Abstract |
PROBLEM TO BE SOLVED: To eliminate ambiguity that cannot be resolved by a minimum-cost method, by calculating the certainty of a morphological analysis result according to the normalized frequency of an input character string in a corpus, and by extracting an analysis result using a word dictionary, independent word retrieval, adjunct retrieval, a connection table, connection examination, unknown word segmentation, and analysis table generation. SOLUTION: A normalized frequency calculation part 11 calculates, for an input character string 13, the normalized frequency in a corpus consisting of a large amount of electronic document information. According to this normalized frequency, a cost calculation part 2 calculates the certainty of a morphological analysis result. An independent word retrieval part 3 obtains grammatical information on independent words by referring to a word dictionary 4, and an adjunct retrieval part 5 likewise obtains grammatical information on adjuncts by referring to the word dictionary 4. A connection examination part 6 examines the connections between morphemes by referring to a connection table 7, and an unknown word segmentation part 9 segments unknown word candidate character strings and adds them to the independent word candidates. An analysis table is generated through these processes, and an analysis result is extracted and output. |
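
The idea of scoring candidate analyses by corpus frequency can be illustrated with a minimal sketch. The abstract does not say how the normalized frequency or the certainty is computed, so everything here is an assumption made for illustration: the normalized frequency is taken to be the occurrence count of a string divided by the number of same-length substrings in the corpus, the certainty is taken to be the sum of log normalized frequencies, and the names `normalized_frequency`, `certainty`, `pick_analysis`, and the `floor` parameter are hypothetical.

```python
import math


def normalized_frequency(s, corpus):
    """Assumed definition: occurrences of s in corpus divided by the
    number of substrings of the same length that the corpus contains."""
    n = len(s)
    windows = len(corpus) - n + 1
    if n == 0 or windows <= 0:
        return 0.0
    # Count overlapping occurrences by sliding a window of length n.
    count = sum(1 for i in range(windows) if corpus[i:i + n] == s)
    return count / windows


def certainty(segmentation, corpus, floor=1e-9):
    """Assumed scoring: sum of log normalized frequencies of the morphemes.
    Unseen strings are clamped to `floor` so the logarithm stays defined."""
    return sum(
        math.log(max(normalized_frequency(m, corpus), floor))
        for m in segmentation
    )


def pick_analysis(candidates, corpus):
    """Return the candidate segmentation with the highest certainty."""
    return max(candidates, key=lambda seg: certainty(seg, corpus))


if __name__ == "__main__":
    corpus = "the cat sat on the mat"
    candidates = [["the", "cat"], ["th", "ecat"]]
    print(pick_analysis(candidates, corpus))
```

In this sketch the string "ecat" never occurs in the toy corpus, so the second candidate receives the floor penalty and the first is preferred. A fuller system along the lines of the abstract would combine such a score with dictionary lookup and connection checks, which are not modeled here.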