Title of invention: DOCUMENT CLUSTER EXTRACTION DEVICE AND METHOD
Abstract: PROBLEM TO BE SOLVED: To minimize the variation in the sizes of extracted clusters when extracting document clusters. SOLUTION: A user designates the target document group to be input through a target document input part 1. A word extraction part 2 performs morphological analysis on the text data of each input document and calculates the word appearance frequency and the document appearance frequency. An inter-document relevance calculation part 3 calculates the relevance between documents using their word vectors. A hierarchical cluster analysis part 4 applies a general hierarchical cluster analysis technique to the relevance values to build cluster hierarchies of the documents. A cluster extraction part 5 evaluates and selects a cluster hierarchy using a predetermined rule, and extracts the desired number of clusters from the selected cluster hierarchy. COPYRIGHT: (C)2005,JPO&NCIPI
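The pipeline described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented method: whitespace tokenization stands in for morphological analysis, TF-IDF weighting and cosine similarity are assumed for the word vectors and the relevance measure, and because the abstract does not state the "predetermined rule", the sketch assumes one (build hierarchies with three linkage criteria, cut each at the desired number of clusters, and keep the cut whose cluster sizes vary least). All function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pvariance

# Linkage criteria expressed on similarities (assumption: three candidates).
LINKAGES = {"single": max, "complete": min, "average": mean}


def word_stats(docs):
    """Word extraction: whitespace split stands in for morphological analysis."""
    tf = [Counter(d.lower().split()) for d in docs]  # word frequency per document
    df = Counter(w for counts in tf for w in counts)  # document frequency per word
    return tf, df


def word_vectors(tf, df):
    """Assumed weighting: term frequency times (log inverse document frequency + 1)."""
    n = len(tf)
    return [{w: f * (math.log(n / df[w]) + 1.0) for w, f in c.items()} for c in tf]


def cosine(a, b):
    """Assumed relevance measure between two sparse word vectors."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cut_hierarchy(sim, k, link):
    """Agglomerative clustering: merge the most similar pair until k clusters remain."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        pairs = ((a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters)))
        a, b = max(pairs, key=lambda p: link(
            [sim[i][j] for i in clusters[p[0]] for j in clusters[p[1]]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def extract_clusters(docs, k):
    """Hypothetical selection rule: keep the cut with the smallest size variance."""
    tf, df = word_stats(docs)
    vecs = word_vectors(tf, df)
    sim = [[cosine(u, v) for v in vecs] for u in vecs]
    candidates = {name: cut_hierarchy(sim, k, link) for name, link in LINKAGES.items()}
    best = min(candidates, key=lambda n: pvariance([len(c) for c in candidates[n]]))
    return best, candidates[best]


if __name__ == "__main__":
    sample = [
        "apple banana apple",
        "banana apple fruit",
        "car engine wheel",
        "engine car road",
    ]
    linkage_name, clusters = extract_clusters(sample, 2)
    print(linkage_name, clusters)
```

Comparing several hierarchies and choosing by size variance is only one way to pursue the stated goal of even cluster sizes; the rule actually used by the cluster extraction part 5 may differ.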
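Publication number: JP2005063157(A) Publication date: 2005.03.10
Application number: JP20030292776 Filing date: 2003.08.13
Applicant: FUJI XEROX CO LTD Inventors: UMEKI HIROSHI; KOYAMA TAKEHIRO
Classification: G06F17/30; (IPC1-7): G06F17/30 Main classification: G06F17/30
Agency: Agent:
Main claim:
Address: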