Title of invention: SYSTEM AND METHOD FOR AUTOMATICALLY DISCOVERING A HIERARCHY OF CONCEPTS FROM A CORPUS OF DOCUMENTS
Abstract: The invention is a method, system, and computer program for automatically discovering concepts in a corpus of documents and automatically generating a labeled concept hierarchy. The method extracts signatures from the corpus of documents and computes the similarity between signatures using a statistical measure. The frequency distribution of the signatures is then refined to reduce inaccuracies in the similarity measure, and the signatures are disambiguated to address the polysemy problem (a single signature carrying several meanings). The similarity measure is recomputed from the refined frequency distribution and the disambiguated signatures, so that it reflects the actual similarity between signatures. The recomputed similarity measure is then used to cluster related signatures into concepts, and the concepts are arranged in a concept hierarchy. For a particular concept, the concept hierarchy is used to automatically generate a query and retrieve the relevant documents associated with that concept.
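The abstract names the pipeline stages but does not specify the statistical similarity measure, the clustering algorithm, or the form of the generated query. The following is only a minimal sketch under assumptions that are not taken from the patent: signatures are single lowercase words that occur in at least `min_df` documents, similarity is cosine similarity over binary document-occurrence vectors, concepts are formed by average-link agglomerative clustering (whose merge history serves as the hierarchy), and the query for a concept is a simple OR over its signatures. The frequency-refinement and disambiguation steps are not modeled, and every function name here is hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def extract_signatures(docs, min_df=2, stopwords=frozenset()):
    """Hypothetical signature extractor (assumption): keep lowercase words,
    minus stopwords, that occur in at least `min_df` documents."""
    doc_terms, df = [], Counter()
    for text in docs:
        terms = {w.strip(".,;:!?").lower() for w in text.split()}
        terms -= set(stopwords) | {""}
        doc_terms.append(terms)
        df.update(terms)
    keep = {t for t, n in df.items() if n >= min_df}
    return [terms & keep for terms in doc_terms], sorted(keep)


def cooccurrence_cosine(doc_terms, sigs):
    """Cosine similarity between the binary document-occurrence vectors of each
    pair of signatures (an assumed stand-in for the unspecified measure)."""
    postings = defaultdict(set)
    for i, terms in enumerate(doc_terms):
        for t in terms:
            postings[t].add(i)
    return {
        (a, b): len(postings[a] & postings[b]) / math.sqrt(len(postings[a]) * len(postings[b]))
        for a, b in combinations(sigs, 2)  # sigs is sorted, so a < b in every key
    }


def cluster(sigs, sim, threshold):
    """Average-link agglomerative clustering. Returns the remaining clusters
    (concepts) and the merge history, which forms a binary hierarchy."""
    clusters = [frozenset([s]) for s in sigs]
    merges = []

    def link(c1, c2):
        # Mean pairwise similarity between the members of two clusters.
        total = sum(sim[(min(a, b), max(a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]))
        score = link(clusters[i], clusters[j])
        if score < threshold:
            break  # no remaining pair is similar enough to merge
        merges.append((clusters[i], clusters[j], score))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges


def retrieve(concept, doc_terms):
    """Hypothetical query: OR over the concept's signatures; returns doc indices."""
    return [i for i, terms in enumerate(doc_terms) if terms & concept]


docs = [
    "Stock market prices rose",
    "market prices fell on the stock exchange",
    "the river bank flooded",
    "river water flooded the bank",
]
doc_terms, sigs = extract_signatures(docs, stopwords={"the", "on"})
sim = cooccurrence_cosine(doc_terms, sigs)
concepts, merges = cluster(sigs, sim, threshold=0.5)
for c in concepts:
    print(sorted(c), "->", retrieve(c, doc_terms))
```

On this toy corpus the financial words and the river words never co-occur, so the sketch separates them into two concepts; a real system would also need the refinement and disambiguation steps the abstract describes, for example to handle a polysemous word such as "bank" appearing in both contexts.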
Application publication number: WO03098396(A2)  Publication date: 2003.11.27
Application number: WO2003US15563  Filing date: 2003.05.15
Applicant: VERITY, INC.  Inventors: CHUNG, CHRISTINA; LIU, JINHUI; LUK, ALPHA; MAO, JIANCHANG; TAANK, SUMIT; VUTUKURU, VAMSI
Classification: G06F17/27; G06F17/30  Main classification: G06F17/27