Title of Invention |
METHOD, DEVICE, AND PROGRAM FOR LANGUAGE MODEL GENERATION AND DEVICE AND PROGRAM FOR TEXT ANALYSIS |
Abstract |
PROBLEM TO BE SOLVED: To estimate symbol chain probabilities (a language model) by assigning every possible class to each symbol. SOLUTION: For each symbol in a symbol string read from a text database 140 that stores text data on a storage medium, the corresponding classes are looked up in a symbol-class correspondence table 150, also stored on the storage medium, which maps each symbol to one or more classes; the resulting class list is generated and stored on the storage medium. Next, for every N adjacent symbols in the read symbol string (N an integer ≥ 2), the occurrence frequency of a class chain is counted for every combination obtained by selecting one class from each of the N corresponding class lists. Symbol chain probabilities, which constitute the language model, are then generated from the class chain frequency information obtained by this counting. COPYRIGHT: (C)2004,JPO |
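The counting step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the symbol-class table `SYMBOL_CLASSES` (standing in for table 150), the `"UNK"` fallback class, the helper names, and the decision to count each class combination with weight 1 are all hypothetical. The abstract does not say how class chain counts are turned into symbol chain probabilities, so the sketch stops at a maximum-likelihood class chain estimate.

```python
from collections import Counter
from itertools import product

# Hypothetical symbol-class table (stands in for table 150): each symbol maps
# to one or more classes. The entries are illustrative, not from the patent.
SYMBOL_CLASSES = {
    "I": ["PRON"],
    "saw": ["VERB", "NOUN"],
    "a": ["DET"],
    "bat": ["NOUN", "VERB"],
}


def class_lists(symbols, table):
    """Look up the class list for each symbol (unknown symbols get ["UNK"])."""
    return [table.get(s, ["UNK"]) for s in symbols]


def count_class_chains(symbols, table, n=2):
    """Count class N-grams over every combination of classes.

    For each window of n adjacent symbols, one class is picked from each
    symbol's class list in every possible way, and each resulting class
    chain is counted once (weight 1 is an assumption of this sketch).
    """
    if n < 2:
        raise ValueError("n must be an integer >= 2")
    lists = class_lists(symbols, table)
    counts = Counter()
    for i in range(len(lists) - n + 1):
        for chain in product(*lists[i:i + n]):
            counts[chain] += 1
    return counts


def class_chain_prob(counts, chain):
    """Maximum-likelihood P(last class | preceding classes) from chain counts."""
    history = chain[:-1]
    total = sum(c for ch, c in counts.items() if ch[:-1] == history)
    return counts[chain] / total if total else 0.0


if __name__ == "__main__":
    counts = count_class_chains(["I", "saw", "a", "bat"], SYMBOL_CLASSES, n=2)
    for chain, c in sorted(counts.items()):
        print(chain, c)
    print(class_chain_prob(counts, ("PRON", "VERB")))
```

Because "saw" and "bat" each carry two classes, every window that contains one of them contributes one count per class combination instead of forcing a single class decision per symbol. An alternative design would split each window's unit count evenly across its combinations; the abstract does not state which weighting is used.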
Publication Number |
JP2004069858(A) |
Publication Date |
2004.03.04 |
Application Number |
JP20020226575 |
Application Date |
2002.08.02 |
Applicant |
NIPPON TELEGR & TELEPH CORP <NTT> |
Inventors |
HORI TAKAAKI;OFU KATSUTOSHI;MATSUNAGA SHOICHI |
Classification Codes |
G06F17/28;G10L15/06;G10L15/18 |
Main Classification Code |
G06F17/28 |
Patent Agency |
|
Patent Agent |
|
Principal Claim |
|
Address |
|