Abstract |
PROBLEM TO BE SOLVED: Existing methods for addressing the low accuracy of language likelihoods for word strings with little training data each have a drawback. A method that groups morpheme strings into classes imposes weaker language constraints than a word N-gram. A hierarchical method that applies a word N-gram at the lower level and a class N-gram at the upper level cannot reliably estimate the linkage statistics between a word in the lower level and a word in the upper level, because the lower-level word N-gram is absorbed into the upper-level class N-gram. SOLUTION: An N-gram language model creation device for creating an N-gram language model over morphemes and classes from a corpus includes a first corpus that is partially classed, so that it consists of morphemes and classes; a second corpus in which linkage examples of the morphemes belonging to each class are described as morpheme strings; and word sequence development means for embedding and expanding the morpheme strings of the second corpus into the class strings of the first corpus. COPYRIGHT: (C)2009,JPO&INPIT |
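
The word sequence development step described in the SOLUTION can be illustrated with a minimal Python sketch. The abstract does not say how expansions are enumerated or weighted, so everything here is an assumption made for illustration: the function names `expand_class_corpus` and `count_bigrams`, the `<CITY>` class token, the toy sentences, and the choice to expand every class occurrence into every listed morpheme string with equal weight. The point it shows is that, after expansion, a morpheme inside a class (such as "Osaka") and a morpheme outside it (such as "ni") appear next to each other, so their linkage can be counted directly.

```python
from collections import Counter
from itertools import product


def expand_class_corpus(first_corpus, second_corpus):
    """Embed morpheme strings from second_corpus into class tokens of first_corpus.

    first_corpus: list of sentences, each a list of morphemes and class tokens.
    second_corpus: dict mapping a class token to a list of morpheme strings.
    Assumption: every class occurrence is expanded into every listed string.
    """
    expanded = []
    for sentence in first_corpus:
        # Each position offers its alternatives; a plain morpheme has exactly one.
        slots = [second_corpus.get(token, [[token]]) for token in sentence]
        for choice in product(*slots):
            # Flatten the chosen morpheme strings into one morpheme sequence.
            expanded.append([m for part in choice for m in part])
    return expanded


def count_bigrams(sentences):
    """Count morpheme bigrams, with sentence boundary markers added."""
    counts = Counter()
    for sentence in sentences:
        padded = ["<s>"] + sentence + ["</s>"]
        counts.update(zip(padded, padded[1:]))
    return counts


# Hypothetical toy data: one partially classed sentence and one class definition.
first_corpus = [["<CITY>", "ni", "iku"]]
second_corpus = {"<CITY>": [["Tokyo"], ["Shin", "Osaka"]]}

expanded = expand_class_corpus(first_corpus, second_corpus)
bigrams = count_bigrams(expanded)
```

In this toy run, `expanded` holds two morpheme-only sentences, and `bigrams` contains the cross-boundary pair `("Osaka", "ni")`, a statistic that a class-level model would only see as `("<CITY>", "ni")`. Exhaustive expansion grows multiplicatively with the number of classes per sentence, so a practical system would likely need sampling or weighting, which the abstract does not describe.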