Abstract |
A token is attached to each word class sequence whose probability of appearing in the text data is equal to or greater than a predetermined value. The set of words and tokens included in a word/token sequence for the text data is separated so that the probability of generating that word/token sequence is maximized. Each token is then replaced with a phrase included in the text data.
|
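
The three steps summarized above can be illustrated with a minimal Python sketch. It is one possible reading of the abstract, not the claimed method. Several simplifications are assumptions made for illustration only: word class sequences are limited to pairs of adjacent classes, the "probability of appearance" is taken as a relative frequency among all adjacent class pairs, and the scoring function `toy_logprob` is a hypothetical stand-in for a trained language model. The function names `frequent_class_bigrams` and `best_segmentation` are likewise invented here.

```python
import math
from collections import Counter


def frequent_class_bigrams(tagged, threshold):
    """Step 1 (assumed form): return the class pairs whose relative
    frequency among all adjacent class pairs is >= threshold.
    Each returned pair plays the role of a token."""
    classes = [cls for _, cls in tagged]
    bigrams = list(zip(classes, classes[1:]))
    if not bigrams:
        return set()
    counts = Counter(bigrams)
    total = len(bigrams)
    return {bg for bg, n in counts.items() if n / total >= threshold}


def toy_logprob(unit):
    """Hypothetical scores: a token (class pair, a tuple) is more
    probable than a single word (a string). A real system would
    estimate these values from the text data."""
    return math.log(0.5) if isinstance(unit, tuple) else math.log(0.1)


def best_segmentation(tagged, token_classes, unit_logprob):
    """Steps 2 and 3 (assumed form): dynamic programming over all ways
    to split the sequence into single words and two-word tokens,
    keeping the split with the highest total log-probability. Each
    chosen token is then replaced with the phrase it covers."""
    n = len(tagged)
    best = [-math.inf] * (n + 1)  # best[i]: best score for first i words
    back = [0] * (n + 1)          # back[i]: length of the last unit
    best[0] = 0.0
    for i in range(1, n + 1):
        # Option A: the last unit is a single word.
        score = best[i - 1] + unit_logprob(tagged[i - 1][0])
        if score > best[i]:
            best[i], back[i] = score, 1
        # Option B: the last unit is a token spanning two words.
        if i >= 2:
            pair = (tagged[i - 2][1], tagged[i - 1][1])
            if pair in token_classes:
                score = best[i - 2] + unit_logprob(pair)
                if score > best[i]:
                    best[i], back[i] = score, 2
    # Trace back; a two-word unit is emitted as its surface phrase.
    units = []
    i = n
    while i > 0:
        step = back[i]
        units.append(" ".join(word for word, _ in tagged[i - step:i]))
        i -= step
    return units[::-1]
```

As a usage illustration, given the tagged input `[("the", "DET"), ("cat", "NOUN"), ("sees", "VERB"), ("the", "DET"), ("dog", "NOUN")]` and a threshold of 0.5, only the class pair `("DET", "NOUN")` qualifies as a token, and the highest-scoring separation groups each determiner with its noun as a phrase. Restricting tokens to pairs keeps the sketch short; longer class sequences would add further options inside the same loop.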