Abstract |
PROBLEM TO BE SOLVED: To provide a method for generating a morphological analyzer by unsupervised learning that extends to part-of-speech estimation. SOLUTION: The method includes: a step of generating and storing an NPYLM, which represents the probability that a substring appears conditioned on a preceding substring, by using a plurality of sentences stored in a learning data storage unit; and a step of repeating the following processing until a convergence condition is satisfied: reading a sentence from the learning data storage unit; estimating the most probable word boundaries by using a CRF that incorporates a feature function taking as its argument a latent variable representing the part of speech of each substring, together with the appearance probability of the substring calculated by the NPYLM; updating the parameters of the CRF by using, as training data, the word boundaries obtained by blocked Gibbs sampling from the end of the sentence toward its beginning; and updating the NPYLM based on those word boundaries. A sentence whose word boundaries and CRF parameters have already been updated is learned again only after the substrings constituting its previously obtained word boundaries, and their connection information, have been removed from the NPYLM. |
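The remove, resample, and re-add cycle described above can be illustrated with a minimal sketch. It is not the claimed method: a smoothed unigram count model stands in for the NPYLM, the CRF and the part-of-speech latent variables are omitted entirely, and every name (`ToyUnigramLM`, `sample_segmentation`, `train`, `MAX_LEN`) is hypothetical. It only shows how a sentence's previous segmentation might be removed from the language model before new word boundaries are sampled from the end of the sentence toward its beginning.

```python
import random
from collections import Counter

MAX_LEN = 4  # assumed maximum word length (not specified in the abstract)


class ToyUnigramLM:
    """Stand-in for the NPYLM: smoothed unigram counts over substrings."""

    def __init__(self, alpha=1.0, n_chars=26):
        self.counts = Counter()
        self.total = 0
        self.alpha = alpha
        self.n_chars = n_chars

    def base(self, w):
        # Character-level base measure with a geometric length penalty.
        return (0.5 / self.n_chars) ** len(w)

    def prob(self, w):
        return (self.counts[w] + self.alpha * self.base(w)) / (self.total + self.alpha)

    def add(self, words):
        for w in words:
            self.counts[w] += 1
            self.total += 1

    def remove(self, words):
        for w in words:
            self.counts[w] -= 1
            self.total -= 1
            if self.counts[w] == 0:
                del self.counts[w]


def sample_segmentation(sentence, lm, rng):
    """Forward filtering, then backward sampling from the sentence end."""
    n = len(sentence)
    fwd = [0.0] * (n + 1)  # fwd[t]: marginal probability of the first t characters
    fwd[0] = 1.0
    for t in range(1, n + 1):
        for k in range(1, min(MAX_LEN, t) + 1):
            fwd[t] += lm.prob(sentence[t - k:t]) * fwd[t - k]
    words, t = [], n
    while t > 0:  # draw the last word first, then move toward the beginning
        ks = list(range(1, min(MAX_LEN, t) + 1))
        weights = [lm.prob(sentence[t - k:t]) * fwd[t - k] for k in ks]
        k = rng.choices(ks, weights=weights)[0]
        words.append(sentence[t - k:t])
        t -= k
    return words[::-1]


def train(sentences, iterations=20, seed=0):
    rng = random.Random(seed)
    lm = ToyUnigramLM()
    current = {}  # sentence index -> most recent segmentation
    for _ in range(iterations):
        for i, s in enumerate(sentences):
            if i in current:
                lm.remove(current[i])  # forget the previous segmentation first
            current[i] = sample_segmentation(s, lm, rng)
            lm.add(current[i])  # update the model with the new boundaries
    return lm, current
```

A fixed iteration count replaces the convergence condition here. In the method as described, the sampled boundaries would additionally serve as training data for updating the CRF parameters, a step this sketch leaves out.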