Abstract |
<p>PROBLEM TO BE SOLVED: To perform highly accurate morphological analysis of text documents that contain notational variation such as wrong letters and omitted letters.</p><p>SOLUTION: An input text is subjected to morphological analysis to output word string data. Separately, words of a prescribed character length in the input text are approximately collated against a dictionary to output approximation-dictionary-collated word string data. The two pieces of word string data are then used together to correct wrong letters and omitted letters. In this correction, prescribed weights are first assigned on the basis of the collation types of the words in the word string data and in the approximation-dictionary-collated word string data; further weights are assigned according to the approximate-character states of the words in the two pieces of word string data, and the result is output as weighted word string data. For the word candidates present at each position of the weighted word string data, a statistical language model storage means is consulted, and the maximum-likelihood word string, namely the one that maximizes the joint probability P<sub>weight</sub>(F, T) of a description string and a part-of-speech string, taking the per-word weights into account, is output as corrected word string data.</p><p>COPYRIGHT: (C)2011, JPO&INPIT</p> |
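The final selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of P<sub>weight</sub>(F, T), so the factorization used here (per-word weight × part-of-speech bigram probability × word emission probability) is an assumption, as are the function name `best_word_string`, the lattice layout, the toy English data, and all probability values. It is not the patented method, only one plausible reading of "maximize a weighted joint probability over candidates at each position".

```python
import math


def best_word_string(lattice, trans, emit, floor=1e-9):
    """Viterbi search for the highest-scoring (surface, pos) sequence.

    ASSUMED scoring (not stated in the abstract):
        score = prod_i  weight_i * P(pos_i | pos_{i-1}) * P(surface_i | pos_i)

    lattice: list of positions; each position is a non-empty list of
             (surface, pos, weight) candidates with weight > 0.
    trans:   dict (prev_pos, pos) -> P(pos | prev_pos); "<s>" marks the start.
    emit:    dict (surface, pos) -> P(surface | pos).
    floor:   probability used for events missing from the model.
    """
    best = [(0.0, [])]      # (log score, path) for each candidate at the previous position
    prev_tags = ["<s>"]     # part-of-speech tag of each entry in `best`
    for candidates in lattice:
        new_best = []
        for surface, pos, weight in candidates:
            # Candidate-local term: weight and emission probability.
            local = math.log(weight) + math.log(emit.get((surface, pos), floor))
            # Best predecessor under the part-of-speech bigram model.
            score, path = max(
                (
                    (s + math.log(trans.get((pt, pos), floor)), p)
                    for (s, p), pt in zip(best, prev_tags)
                ),
                key=lambda item: item[0],
            )
            new_best.append((score + local, path + [(surface, pos)]))
        best = new_best
        prev_tags = [pos for _, pos, _ in candidates]
    return max(best, key=lambda item: item[0])[1]


# Hypothetical toy data: position 0 holds the word as analyzed ("recieve")
# and a candidate from approximate dictionary collation ("receive").
lattice = [
    [("recieve", "UNK", 1.0), ("receive", "VERB", 0.8)],
    [("mail", "NOUN", 1.0)],
]
trans = {
    ("<s>", "VERB"): 0.3,
    ("<s>", "UNK"): 0.05,
    ("VERB", "NOUN"): 0.5,
    ("UNK", "NOUN"): 0.2,
}
emit = {
    ("receive", "VERB"): 0.01,
    ("recieve", "UNK"): 0.0001,
    ("mail", "NOUN"): 0.02,
}

corrected = best_word_string(lattice, trans, emit)
print(corrected)  # [('receive', 'VERB'), ('mail', 'NOUN')]
```

Working in log space keeps long products from underflowing. In this sketch the per-word weight acts as a multiplicative prior on each candidate, so a sufficiently low weight on an approximately collated word lets the original analysis win.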