Abstract
<p><P>PROBLEM TO BE SOLVED: When a corpus holds a large amount of information, relevant information is difficult to find. Assigning tags to documents makes relevant information easier to search for, but conventional document tag-assignment methods are not always effective for finding information.
<P>SOLUTION: In one embodiment, topic modeling includes accessing a corpus of documents, each of which contains words. Words of each document are selected as that document's keywords. The documents are then clustered according to their keywords, with each cluster corresponding to a topic. For each cluster, a statistical distribution is generated from the words of the documents in that cluster, and the topic corresponding to the cluster is modeled using that distribution.
<P>COPYRIGHT: (C)2009,JPO&INPIT</p>
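The abstract leaves open how keywords are selected, which clustering method is used, and what form the statistical distribution takes. The Python sketch below fills those gaps with assumptions of its own, not the patent's. Keywords are each document's top-k words by TF-IDF, documents are clustered greedily by Jaccard overlap of their keyword sets, and each topic is a Laplace-smoothed unigram distribution over the words of its cluster. All function names, parameters (`k`, `threshold`, `alpha`) and the toy corpus `DOCS` are hypothetical.

```python
"""Keyword-based topic modeling: a hypothetical sketch of the abstract's pipeline.

Assumptions (not specified by the source):
  * keywords = top-k words of each document by TF-IDF score;
  * clustering = greedy single pass, joining the cluster whose keyword set
    has the highest Jaccard similarity (at least `threshold`);
  * statistical distribution = Laplace-smoothed unigram distribution.
"""
import math
from collections import Counter


def select_keywords(docs, k=2):
    """Return, for each document, the set of its k highest TF-IDF words."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))  # document frequency
    keywords = []
    for doc in docs:
        if not doc:
            keywords.append(set())
            continue
        tf = Counter(doc)
        scores = {w: (c / len(doc)) * math.log(n / df[w]) for w, c in tf.items()}
        ranked = sorted(scores, key=lambda w: (-scores[w], w))  # ties: alphabetical
        keywords.append(set(ranked[:k]))
    return keywords


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_keywords(keywords, threshold=0.25):
    """Greedily assign each document to the most similar cluster, or open a new one."""
    clusters = []  # each entry: [keyword set, list of document indices]
    for i, kws in enumerate(keywords):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = jaccard(kws, cluster[0])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([set(kws), [i]])
        else:
            best[0] |= kws  # grow the cluster's keyword set
            best[1].append(i)
    return [members for _, members in clusters]


def word_distribution(docs, members, vocab, alpha=1.0):
    """P(word | topic): smoothed word frequencies over the cluster's documents."""
    counts = Counter(w for i in members for w in docs[i])
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def model_topics(docs, k=2, threshold=0.25):
    """Return one (member indices, distribution) pair per cluster, i.e. per topic."""
    vocab = sorted({w for doc in docs for w in doc})
    clusters = cluster_by_keywords(select_keywords(docs, k), threshold)
    return [(m, word_distribution(docs, m, vocab)) for m in clusters]


def most_likely_topic(topics, doc):
    """Index of the topic whose distribution gives `doc` the highest log-likelihood."""
    def loglik(dist):
        return sum(math.log(dist[w]) for w in doc if w in dist)  # skip unseen words
    return max(range(len(topics)), key=lambda t: loglik(topics[t][1]))


# Hypothetical toy corpus: two finance documents and two sports documents.
DOCS = [
    "stock market shares trading".split(),
    "market shares investors trading".split(),
    "soccer match goal players".split(),
    "players goal league soccer".split(),
]

if __name__ == "__main__":
    topics = model_topics(DOCS)
    print([members for members, _ in topics])  # → [[0, 1], [2, 3]]
    print(most_likely_topic(topics, "shares trading news".split()))  # → 0
```

Representing each topic as a word distribution is what lets a new document be assigned to a topic by likelihood, as `most_likely_topic` does here. The greedy Jaccard clustering is only a simple stand-in; a practical system might instead run a standard method such as k-means over keyword vectors.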