Title of invention: APPARATUS, METHOD AND PROGRAM FOR EXTRACTING FEATURE WORD
Abstract: PROBLEM TO BE SOLVED: To efficiently extract suitable feature words for a specific category. SOLUTION: For each pair of words contained in a plurality of document data, two frequencies are calculated: a first appearance frequency, which is the number of document data in which the pair co-occurs, and a second appearance frequency, which is the number of document data associated with a specified category in which the pair co-occurs. The first appearance frequency divided by the second appearance frequency is taken as the degree of co-occurrence. Network data with words as nodes and degrees of co-occurrence as edge weights is generated as matrix data in the form of a symmetric N×N matrix. The maximum eigenvalue of the generated matrix data is calculated as a degree of aggregation. A cluster, that is, a set of words determined from the eigenvector corresponding to the calculated degree of aggregation, is extracted, and the degree of attribution of each word to the cluster is calculated. The nodes whose degrees of attribution exceed a threshold are extracted as feature words expressing a feature of the specified category. COPYRIGHT: (C)2011,JPO&INPIT
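Application publication number: JP2011164791(A) | Application publication date: 2011.08.25
Application number: JP20100024718 | Filing date: 2010.02.05
Applicant: NTT DATA CORP | Inventors: MATSUNAGA TSUTOMU; SUENAGA TAKASHI
Classification: G06F17/30 | Main classification: G06F17/30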
Agency: | Agent:
Principal claim:
Address:
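
The steps listed in the abstract can be illustrated with a short sketch. This is not the patented implementation. It is a minimal pure-Python reading of the abstract under several assumptions that the record does not state: documents are given as sets of words, word pairs that never co-occur in the specified category are skipped (to avoid division by zero), the maximum eigenvalue is approximated by shifted power iteration, and the degree of attribution is taken to be the absolute eigenvector component scaled so that the largest is 1. All function names and the default threshold of 0.5 are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_degrees(docs, labels, category):
    """Return {(word_a, word_b): first_frequency / second_frequency}."""
    first = Counter()   # documents (any category) in which the pair co-occurs
    second = Counter()  # documents of `category` in which the pair co-occurs
    for words, label in zip(docs, labels):
        for pair in combinations(sorted(set(words)), 2):
            first[pair] += 1
            if label == category:
                second[pair] += 1
    # Assumption: pairs absent from the category are dropped (no division by zero).
    return {p: first[p] / second[p] for p in first if second[p] > 0}


def build_matrix(degrees):
    """Build the symmetric N x N matrix: words are nodes, degrees are edge weights."""
    vocab = sorted({w for pair in degrees for w in pair})
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    m = [[0.0] * n for _ in range(n)]
    for (a, b), d in degrees.items():
        m[index[a]][index[b]] = d
        m[index[b]][index[a]] = d
    return vocab, m


def dominant_eigenpair(m, iters=200):
    """Approximate the maximum eigenvalue and its eigenvector.

    Power iteration is run on (M + I); for a non-negative matrix this shift
    keeps the iteration from oscillating. The eigenvalue of M itself is then
    recovered with a Rayleigh quotient.
    """
    n = len(m)
    v = [1.0] * n
    for _ in range(iters):
        w = [v[i] + sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        scale = max(abs(x) for x in w)
        v = [x / scale for x in w]
    mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(a * b for a, b in zip(v, mv)) / sum(a * a for a in v)
    return lam, v


def extract_feature_words(docs, labels, category, threshold=0.5):
    """Return (degree_of_aggregation, {word: degree_of_attribution})."""
    vocab, m = build_matrix(cooccurrence_degrees(docs, labels, category))
    if not vocab:
        return 0.0, {}
    lam, v = dominant_eigenpair(m)
    peak = max(abs(x) for x in v)
    # Assumption: attribution = |eigenvector component| scaled to a maximum of 1.
    attribution = {w: abs(x) / peak for w, x in zip(vocab, v)}
    return lam, {w: s for w, s in attribution.items() if s > threshold}
```

Calling `extract_feature_words(docs, labels, "X")` with a list of word sets and a parallel list of category labels returns the approximated degree of aggregation together with the words whose degree of attribution exceeds the threshold. For large vocabularies a sparse eigensolver would be a more practical choice than the dense power iteration used here.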