Abstract |
PROBLEM TO BE SOLVED: To make a computer recognize the individual words in a sentence, such as prose that includes colloquial expressions. SOLUTION: A database is prepared that stores many sample word sets derived from many sample sentences. A subject-constituting word, that is, a word forming the subject of the sentence to be processed, is extracted from that sentence, which is a character string of mixed kana and kanji. The database is searched with the subject-constituting word as the keyword, and each word set containing that word is extracted as a subject-related sample word set. The character string being processed is then searched for the words in the subject-related sample word sets; each word found is recognized as a word, and breaks are inserted before and after it. |
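
The procedure in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not say how the subject-constituting word is extracted, so this sketch assumes the caller supplies it. The names `SAMPLE_WORD_SETS`, `find_related_word_sets`, and `segment`, the sample data, the `/` break marker, and the longest-match-first rule are all assumptions made for illustration.

```python
# Hypothetical sample word sets, standing in for the database built from
# many sample sentences (the contents are invented for this sketch).
SAMPLE_WORD_SETS = [
    {"猫", "魚", "食べる"},
    {"犬", "散歩", "公園"},
]


def find_related_word_sets(database, subject_word):
    """Return every sample word set that contains the subject word."""
    return [word_set for word_set in database if subject_word in word_set]


def segment(text, database, subject_word, sep="/"):
    """Insert breaks before and after each known word found in `text`.

    `subject_word` is assumed to have been extracted already; the abstract
    does not describe that step, so it is passed in by the caller.
    """
    related = find_related_word_sets(database, subject_word)
    vocabulary = set().union(*related) if related else set()
    # Assumption: try longer words first so a short word never splits a long one.
    candidates = sorted(vocabulary, key=len, reverse=True)

    pieces = []
    pending = ""  # characters not yet recognized as part of a known word
    i = 0
    while i < len(text):
        match = next((w for w in candidates if w and text.startswith(w, i)), None)
        if match is None:
            pending += text[i]
            i += 1
            continue
        if pending:
            pieces.append(pending)
            pending = ""
        pieces.append(match)
        i += len(match)
    if pending:
        pieces.append(pending)
    return sep.join(pieces)


print(segment("猫は魚を食べる", SAMPLE_WORD_SETS, "猫"))  # 猫/は/魚/を/食べる
```

Restricting the lookup to word sets that share the subject word keeps the candidate vocabulary small and topical, which appears to be the point of searching the database by subject before scanning the character string.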