Title of invention DOCUMENT CLASSIFICATION DEVICE AND PROGRAM
Abstract PROBLEM TO BE SOLVED: To classify documents so as to identify those that are useful for analysis and retrieval. SOLUTION: A similar document group detecting unit 12 extracts, from each of a plurality of pieces of document data, the words that are characteristic of its sentences, calculates a document-specific feature value, groups documents having the same feature value into a similar document group, and records the group in a similar document group recording unit 22. For each similar document group recorded in the similar document group recording unit 22, a blacklist determination unit 14 checks the authors of the documents in the group against an author blacklist DB 24, in which authors with a history of having published a plurality of similar documents are registered. If even one author matches, every document in the group is recorded in a sorting DB 26, which accumulates documents that are not used for analysis. A sorting determination unit 16 determines whether each document is accumulated in the sorting DB 26 or in an analysis DB 28, which accumulates documents used for analysis, based on the number of words extracted, the number of overlapping documents, the number of authors, and the number of blog services.
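The flow described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the abstract does not say how feature words are chosen, how the feature value is computed, or how the four counts are combined. The token-length filter, the SHA-1 hash, the decision rule, and every name and threshold here (Document, feature_words, feature_value, group_similar, classify, min_words, max_copies) are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    author: str
    service: str  # blog service hosting the document
    text: str


def feature_words(text, min_len=4):
    """Assumed stand-in for feature-word extraction: distinct tokens of min_len or more characters."""
    tokens = (w.strip(".,!?;:").lower() for w in text.split())
    return sorted({t for t in tokens if len(t) >= min_len})


def feature_value(words):
    """Assumed feature value: a SHA-1 hash over the sorted feature words."""
    return hashlib.sha1(" ".join(words).encode("utf-8")).hexdigest()


def group_similar(docs):
    """Unit 12: documents with the same feature value form one similar document group."""
    groups = defaultdict(list)
    for doc in docs:
        groups[feature_value(feature_words(doc.text))].append(doc)
    return list(groups.values())


def classify(docs, blacklist, min_words=3, max_copies=2):
    """Units 14 and 16: route each group to the sorting DB or the analysis DB.

    The thresholds and the way the counts are combined are hypothetical.
    """
    sorting_db, analysis_db = [], []
    for group in group_similar(docs):
        authors = {d.author for d in group}
        services = {d.service for d in group}
        n_words = len(feature_words(group[0].text))
        if authors & blacklist:
            # Unit 14: a single blacklisted author excludes the whole group.
            target = sorting_db
        elif n_words < min_words:
            # Too few feature words to be useful for analysis (assumed rule).
            target = sorting_db
        elif len(group) > max_copies and (len(authors) > 1 or len(services) > 1):
            # Many copies spread over several authors or services (assumed rule).
            target = sorting_db
        else:
            target = analysis_db
        target.extend(group)
    return sorting_db, analysis_db
```

Calling classify(docs, {"bob"}) on a list of Document objects returns the pair (sorting_db, analysis_db). A whole group lands in sorting_db as soon as one of its authors is on the blacklist, which mirrors the "even one author" condition in the abstract.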
Publication number JP2013235369(A) Publication date 2013.11.21
Application number JP20120106682 Filing date 2012.05.08
Applicant NIPPON TELEGR & TELEPH CORP <NTT> Inventors TAKAHASHI YAMATO;SUGIZAKI MASAYUKI;UCHIYAMA MASASHI
Classification G06F17/30 Main classification G06F17/30
Agency Agent
Main claim
Address