Title of invention TEXT CLASSIFICATION PROGRAM
Abstract PROBLEM TO BE SOLVED: To classify a target text precisely. SOLUTION: A function word/content word splitting means 1a splits a text A2 to be classified into function words and content words. An N-gram means 1b extracts N-grams from the function words and from the content words separately, changing N step by step. A feature vector generation means 1c generates a function word feature vector and a content word feature vector for each N-gram. An area determining means 1f determines whether each feature vector belongs to the procedure area or the non-procedure area of a classification model 1e. A classification means 1g classifies whether each feature vector indicates that the text A2 describes a procedure, using an evaluation criterion defined with respect to increasing N. The criterion takes a high value when the classification performance of the function word feature vector improves or that of the content word feature vector deteriorates as N is increased, and a low value when the classification performance of the function word feature vector deteriorates or that of the content word feature vector improves. COPYRIGHT: (C)2005,JPO&NCIPI
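The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function-word list `FUNCTION_WORDS`, the count-based feature vectors, and the formula in `evaluation_value` are all assumptions chosen only to match the behaviour the abstract describes (the value rises when function-word performance improves or content-word performance falls as N grows). The classification model 1e and the area determination step are left out, because the abstract does not say how they are built.

```python
from collections import Counter

# Hypothetical function-word list; the abstract does not specify one.
FUNCTION_WORDS = {"the", "a", "an", "and", "then", "to", "of", "in", "on", "first", "next"}


def split_words(tokens):
    """Split tokens into (function_words, content_words), preserving order."""
    function = [t for t in tokens if t in FUNCTION_WORDS]
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return function, content


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def feature_vectors(tokens, max_n):
    """Return {n: (function_vector, content_vector)} for n = 1..max_n.

    Each vector is a Counter of n-gram counts (an assumed representation).
    """
    function, content = split_words(tokens)
    return {
        n: (Counter(ngrams(function, n)), Counter(ngrams(content, n)))
        for n in range(1, max_n + 1)
    }


def evaluation_value(function_perf, content_perf):
    """Assumed evaluation criterion over performance measured at N = 1, 2, ...

    Adds each step's gain in function-word performance and subtracts each
    step's gain in content-word performance, so the value is high when
    function-word performance improves or content-word performance drops
    as N increases.
    """
    score = 0.0
    for i in range(1, len(function_perf)):
        score += function_perf[i] - function_perf[i - 1]
        score -= content_perf[i] - content_perf[i - 1]
    return score
```

For example, `feature_vectors("first open the file and then press the button".split(), 2)` gives, for N = 2, function-word bigrams such as `("first", "the")` and content-word bigrams such as `("open", "file")`. The performance lists passed to `evaluation_value` would come from evaluating a trained classifier at each N, which this sketch does not cover.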
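Publication number JP2004348239(A) Publication date 2004.12.09
Application number JP20030142007 Filing date 2003.05.20
Applicant FUJITSU LTD Inventor TAKECHI MINEKI
Classification G06F17/30;(IPC1-7):G06F17/30 Main classification G06F17/30
Agency Agent
Principal claim
Address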