Abstract |
<p><P>PROBLEM TO BE SOLVED: To provide a compound word break estimation device, method, and program that estimate whether a word is a compound word, regardless of whether the word is registered in a dictionary in advance, and that estimate the proper break position when the word is a compound word. <P>SOLUTION: The compound word break estimation device comprises: a learning data storage part that stores, for each of a plurality of known words, information indicating whether the word is a compound word composed of a plurality of morphemes and, if it is, the break position between those morphemes; a similarity calculation part that calculates the similarity between the vector of an unknown word, which a vectorization processing part produces from the feature value of each character in the word, and the vector of each of the known words stored in the learning data storage part; and an estimation part that estimates, on the basis of the similarity, whether the unknown word is a compound word and, if it is, the break position between its morphemes. <P>COPYRIGHT: (C)2010,JPO&INPIT</p> |
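The flow described under SOLUTION (vectorize a word from per-character features, compare it with stored known words, then estimate compound status and break position from the most similar one) can be illustrated with a minimal sketch. The abstract does not specify the feature values, the similarity measure, or the estimation rule, so everything concrete here is an assumption made for illustration: the script-class feature in `char_feature`, cosine similarity, the nearest-neighbour rule in `estimate`, the `MAX_LEN` padding, and the two sample entries in `LEARNING_DATA`.

```python
import math

MAX_LEN = 8  # assumed maximum word length (in characters) covered by a vector


def char_feature(ch):
    """Hypothetical per-character feature value: a one-hot script class."""
    cp = ord(ch)
    if 0x4E00 <= cp <= 0x9FFF:   # CJK unified ideographs (kanji)
        return [1.0, 0.0, 0.0, 0.0]
    if 0x3040 <= cp <= 0x309F:   # hiragana
        return [0.0, 1.0, 0.0, 0.0]
    if 0x30A0 <= cp <= 0x30FF:   # katakana
        return [0.0, 0.0, 1.0, 0.0]
    return [0.0, 0.0, 0.0, 1.0]  # any other character


def vectorize(word):
    """Concatenate per-character features and zero-pad to a fixed length."""
    vec = []
    for ch in word[:MAX_LEN]:
        vec.extend(char_feature(ch))
    vec.extend([0.0] * (4 * MAX_LEN - len(vec)))
    return vec


def cosine(a, b):
    """Cosine similarity (an assumed choice of similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Assumed learning data: (known word, break positions as character offsets).
# An empty tuple marks a word that is not a compound word.
LEARNING_DATA = [
    ("東京タワー", (2,)),  # 東京 | タワー
    ("さくら", ()),        # single morpheme
]


def estimate(word, learning_data=LEARNING_DATA):
    """Copy the compound flag and break positions of the most similar known word."""
    target = vectorize(word)
    _, breaks = max(learning_data, key=lambda e: cosine(target, vectorize(e[0])))
    return (len(breaks) > 0, breaks)
```

With this sample data, `estimate("京都ホテル")` returns `(True, (2,))` because the word has the same kanji-kanji-katakana pattern as 東京タワー, while `estimate("みかん")` returns `(False, ())`. Copying the nearest neighbour's break position is the simplest possible rule; a device as described could equally weight several similar known words.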