Title of invention: TEXT ANALYSIS DEVICE, METHOD, AND PROGRAM COPING WITH WRONG LETTER AND OMITTED LETTER
Abstract:
PROBLEM TO BE SOLVED: To perform highly accurate morphological analysis of text documents that contain spelling variations such as wrong letters and omitted letters.
SOLUTION: An input text is morphologically analyzed to output word string data. In addition, words of the input text that have a prescribed character length are approximately matched (collated) against a dictionary to output approximate-dictionary-matched word string data. Both word strings are then used to correct wrong and omitted letters. During correction, prescribed weights are assigned according to the matching type of each word in the two word strings, and further weights are assigned to each word string according to how closely the characters of corresponding words in the two strings approximate each other; the result is output as weighted word string data. A statistical language model storage means is then consulted for the word candidates at each position of the weighted word string data, and the maximum-likelihood word string, namely the one that maximizes the joint probability P_weight(F, T) of the description string F and the part-of-speech string T while taking into account the weights assigned to each word string, is output as corrected word string data.
COPYRIGHT: (C)2011,JPO&INPIT
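The abstract does not disclose the actual weights, the exact form of P_weight(F, T), or the language model used. The following Python sketch is therefore only a toy illustration of the general idea, built on several assumptions of our own: whitespace tokenization stands in for Japanese morphological analysis; approximate matching is modeled as a Levenshtein distance of at most 1 (enough for one wrong or one omitted letter); the two word strings are assumed to align position by position; the values MIN_LEN, W_EXACT, W_UNKNOWN and W_APPROX, along with the toy probability tables, are hypothetical; and P_weight is modeled as the product of the candidate weights with an HMM-style factorization P(t_i | t_(i-1)) P(w_i | t_i), decoded with the Viterbi algorithm.

```python
import math

# Toy dictionary (hypothetical): surface form -> part-of-speech tag.
DICTIONARY = {"the": "DET", "text": "NOUN", "analysis": "NOUN", "device": "NOUN"}

MIN_LEN = 4        # hypothetical "prescribed character length" for approximate matching
W_EXACT = 1.0      # hypothetical weight: analyzer output found in the dictionary
W_UNKNOWN = 0.3    # hypothetical weight: analyzer output not in the dictionary
W_APPROX = 0.6     # hypothetical per-edit weight for approximate matches

# Toy tag transitions P(t_i | t_(i-1)) and emissions P(w_i | t_i) (hypothetical values).
TRANS = {("<s>", "DET"): 0.6, ("<s>", "NOUN"): 0.4, ("DET", "NOUN"): 0.9, ("NOUN", "NOUN"): 0.5}
EMIT = {("the", "DET"): 0.9, ("text", "NOUN"): 0.3, ("analysis", "NOUN"): 0.3, ("device", "NOUN"): 0.3}
FLOOR = 1e-6       # probability for unseen events, so math.log never receives zero


def trans(prev_tag, tag):
    return TRANS.get((prev_tag, tag), FLOOR)


def emit(word, tag):
    return EMIT.get((word, tag), FLOOR)


def edit_distance(a, b):
    """Levenshtein distance: one wrong letter = 1 substitution, one omitted letter = 1 insertion."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # delete ca
                           cur[j - 1] + 1,               # insert cb
                           prev[j - 1] + (ca != cb)))    # substitute (free if equal)
        prev = cur
    return prev[-1]


def approximate_collate(word, max_dist=1):
    """Dictionary entries within max_dist edits of word; only words of at least MIN_LEN characters."""
    if len(word) < MIN_LEN:
        return []
    matches = []
    for entry, tag in DICTIONARY.items():
        if abs(len(entry) - len(word)) > max_dist:
            continue  # the length gap alone already exceeds max_dist
        dist = edit_distance(word, entry)
        if 0 < dist <= max_dist:  # dist == 0 is an exact match, already covered by the analyzer
            matches.append((entry, tag, dist))
    return matches


def build_lattice(tokens):
    """Merge analyzer output and approximate matches into weighted candidates per position."""
    lattice = []
    for tok in tokens:
        tag = DICTIONARY.get(tok)
        cands = [(tok, tag, W_EXACT)] if tag else [(tok, "UNK", W_UNKNOWN)]
        for entry, entry_tag, dist in approximate_collate(tok):
            cands.append((entry, entry_tag, W_APPROX ** dist))
        lattice.append(cands)
    return lattice


def best_path(lattice):
    """Viterbi search maximizing the sum of log(weight) + log P(w|t) + log P(t|t_prev)."""
    scores = [math.log(w) + math.log(emit(wd, t)) + math.log(trans("<s>", t))
              for wd, t, w in lattice[0]]
    backs = []  # backs[i - 1][k] = best predecessor index for candidate k at position i
    for i in range(1, len(lattice)):
        new_scores, new_back = [], []
        for wd, t, w in lattice[i]:
            via = [scores[k] + math.log(trans(lattice[i - 1][k][1], t))
                   for k in range(len(lattice[i - 1]))]
            k_best = max(range(len(via)), key=via.__getitem__)
            new_scores.append(via[k_best] + math.log(w) + math.log(emit(wd, t)))
            new_back.append(k_best)
        scores = new_scores
        backs.append(new_back)
    # Backtrack from the best final candidate to recover the maximum-likelihood word string.
    k = max(range(len(scores)), key=scores.__getitem__)
    path = [k]
    for bp in reversed(backs):
        k = bp[k]
        path.append(k)
    path.reverse()
    return [lattice[i][k][0] for i, k in enumerate(path)]


def correct(tokens):
    return best_path(build_lattice(tokens))
```

For example, `correct("the text analysys devce".split())` returns `['the', 'text', 'analysis', 'device']`: the wrong letter in *analysys* and the omitted letter in *devce* are repaired, while *the*, which is shorter than MIN_LEN, is never approximately matched. Scores are summed in log space so that long word strings do not underflow to zero.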
Publication number: JP2011065384(A) Publication date: 2011.03.31
Application number: JP20090214959 Filing date: 2009.09.16
Applicant: NIPPON TELEGR & TELEPH CORP Inventors: SAITO KUNIKO;IMAMURA KENJI
Classification: G06F17/21;G06F17/27 Main classification: G06F17/21
Patent agency: Agent:
Main claim:
Address: