Abstract |
<P>PROBLEM TO BE SOLVED: To solve the problems that, in document sorting by conventional machine learning, morphological analysis is time-consuming and sorting precision deteriorates due to frequent false detection of personal names. <P>SOLUTION: An information retrieval system comprises: a characteristic token extraction means that associates comparison conditions between a character string and a keyword with a characteristic token, and extracts the characteristic token from a character string in a document; a non-characteristic token extraction means that extracts non-characteristic tokens by dividing into character units any character string from which no characteristic token has been extracted; a learning means that calculates, as a learning frequency associated with a category, the appearance frequency of a first token train composed of first characteristic tokens and first non-characteristic tokens in a document for learning; and a sorting means that sorts a document to be sorted by calculating, for each category, a sorting probability indicating the similarity between the learning frequency and the appearance frequency of a second token train composed of second characteristic tokens and second non-characteristic tokens in the document to be sorted. <P>COPYRIGHT: (C)2009,JPO&INPIT |
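The four means described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the `RULES` keyword table, the use of a plain prefix match as the "comparison condition", the function names `tokenize`, `learn`, and `classify`, and the use of a Laplace-smoothed log-likelihood (in the style of naive Bayes) as a stand-in for the sorting probability.

```python
import math
from collections import Counter, defaultdict

# Hypothetical comparison conditions: each keyword maps to a characteristic token.
# A simple prefix match at the current position stands in for the condition.
RULES = [("invoice", "<INVOICE>"), ("meeting", "<MEETING>")]


def tokenize(text, rules=RULES):
    """Build a token train: characteristic tokens where a keyword matches,
    single characters (non-characteristic tokens) everywhere else."""
    tokens, i = [], 0
    while i < len(text):
        for keyword, token in rules:
            if text.startswith(keyword, i):
                tokens.append(token)
                i += len(keyword)
                break
        else:
            # No keyword matched here: fall back to a character-unit token.
            # Whitespace is skipped (an assumption of this sketch).
            if not text[i].isspace():
                tokens.append(text[i])
            i += 1
    return tokens


def learn(labeled_docs, rules=RULES):
    """Count token appearance frequencies per category (the learning frequency)."""
    freq = defaultdict(Counter)
    for text, category in labeled_docs:
        freq[category].update(tokenize(text, rules))
    return freq


def classify(text, freq, rules=RULES):
    """Score each category with a Laplace-smoothed log-likelihood and
    return the best category together with all scores."""
    vocab = {t for counts in freq.values() for t in counts}
    tokens = tokenize(text, rules)
    scores = {}
    for category, counts in freq.items():
        total = sum(counts.values())
        # +1 in the denominator reserves probability mass for unseen tokens.
        scores[category] = sum(
            math.log((counts[t] + 1) / (total + len(vocab) + 1)) for t in tokens
        )
    return max(scores, key=scores.get), scores
```

Because unmatched text is split into single characters, the sketch needs no morphological analyzer, which reflects the speed concern named in the abstract. A real system would presumably use richer comparison conditions than a prefix match, and could also count sequences of adjacent tokens rather than individual tokens.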