Title of invention: DOCUMENT PROCESSING PROGRAM AND DOCUMENT PROCESSOR
Abstract: PROBLEM TO BE SOLVED: To improve the precision of document classification. SOLUTION: For each document stored in a document storage part 22, a keyword extraction part 311 extracts a keyword based on the appearance frequency of character strings in that document. An object sentence extraction part 312 extracts, from each document from which a keyword has been extracted, a summary sentence that includes the extracted keyword. A paraphrastic sentence generation part 322 generates a paraphrastic sentence including the keyword, based on the keyword included in the extracted summary sentence and the modification analysis result of the summary sentence. An identity extraction part 42 extracts a set of identities including the keyword contained in the paraphrastic sentence generated by the paraphrastic sentence generation part 322, and stores the set of identities in an identity storage part 26. A document vector generation part 442 generates a document vector based on document vector component values that indicate the appearance frequency, in the summary sentence extracted from each document stored in the document storage part 22, of the set of identities stored in the identity storage part 26. COPYRIGHT: (C)2010, JPO&INPIT
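Publication number: JP2010160645(A) Publication date: 2010.07.22
Application number: JP20090001851 Filing date: 2009.01.07
Applicant: TOSHIBA CORP; TOSHIBA SOLUTIONS CORP Inventor: KURATA SAORI; SAITO YOSHIMI; KANO TOSHIYUKI
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Main claim:
Address: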
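
The processing flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the whitespace/regex tokenizer, the stopword list, and the sample text are all assumptions made for illustration. The paraphrase generation step (part 322) and the identity extraction step (part 42) depend on modification analysis that the abstract does not detail, so the sketch stands in for them with a hand-supplied list of terms.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase word tokens; an assumed stand-in for real morphological analysis."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_keywords(text, top_n=2, stopwords=frozenset()):
    """Pick the top_n most frequent non-stopword tokens as keywords."""
    counts = Counter(t for t in tokenize(text) if t not in stopwords)
    # Sort by descending frequency, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


def extract_summary_sentences(text, keywords):
    """Keep only the sentences that contain at least one keyword."""
    wanted = set(keywords)
    return [s for s in split_sentences(text) if wanted & set(tokenize(s))]


def document_vector(summary_sentences, terms):
    """Count how often each term appears in the extracted summary sentences."""
    counts = Counter(t for s in summary_sentences for t in tokenize(s))
    return [counts[term] for term in terms]


# Hypothetical sample data, not taken from the patent.
STOPWORDS = frozenset({"the", "a", "is", "was", "it", "on"})
doc = "The printer jammed. The printer was reset. It is sunny."

keywords = extract_keywords(doc, top_n=1, stopwords=STOPWORDS)
summary = extract_summary_sentences(doc, keywords)
# The term list stands in for the stored set of identities.
vector = document_vector(summary, ["printer", "jammed", "sunny"])
```

Because the vector is computed only over sentences that contain a keyword, terms that occur solely in unrelated sentences (here, "sunny") contribute nothing, which is one plausible reading of how restricting the count to summary sentences could sharpen classification.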