Abstract |
PROBLEM TO BE SOLVED: To classify documents that are useful for analysis and retrieval. SOLUTION: A similar document group detecting unit 12 extracts, from each of a plurality of pieces of document data, the words that are characteristic of the sentences it contains, calculates a document-specific feature value, groups documents having the same feature value into a similar document group, and records each group in a similar document group recording unit 22. For each similar document group recorded in the similar document group recording unit 22, a blacklist determination unit 14 checks the authors of the documents in the group against an author blacklist DB 24, in which authors having a history of publishing a plurality of similar documents are registered. If even one author in the group is registered there, the unit records every document in the group in a sorting DB 26, which accumulates documents that are not used for analysis. A sorting determination unit 16 determines whether each document is accumulated in the sorting DB 26 or in an analysis DB 28, which accumulates documents used for analysis, based on the number of words extracted, the number of overlapping documents, the number of authors, and the number of blog services. |
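
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract does not say how feature words are selected, how the feature value is computed, or how the four counts are combined, so the stop-word list, the SHA-1 hash of the sorted feature words, the `Document` fields, and the thresholds `min_words` and `max_copies` are all assumptions made for illustration only.

```python
import hashlib
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    author: str
    service: str  # blog service that hosts the document
    text: str


# Hypothetical stop-word list; the abstract does not define feature words.
STOP_WORDS = {"a", "an", "the", "is", "of", "to", "and", "in"}


def feature_words(text):
    """Return the distinct non-stop-words of a text, sorted (assumed scheme)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return sorted(set(tokens) - STOP_WORDS)


def feature_value(words):
    """Hash the feature words into one document-specific value (assumed scheme)."""
    return hashlib.sha1(" ".join(words).encode("utf-8")).hexdigest()


def detect_similar_groups(docs):
    """Role of unit 12: group documents that share a feature value."""
    groups = defaultdict(list)
    for doc in docs:
        groups[feature_value(feature_words(doc.text))].append(doc)
    return list(groups.values())


def sort_documents(docs, blacklist, min_words=3, max_copies=2):
    """Roles of units 14 and 16: route each group to one of two stores.

    Returns (sorting_db, analysis_db). The threshold rules are hypothetical.
    """
    sorting_db, analysis_db = [], []
    for group in detect_similar_groups(docs):
        authors = {d.author for d in group}
        services = {d.service for d in group}
        n_words = len(feature_words(group[0].text))
        if authors & blacklist:
            # Unit 14: one blacklisted author excludes the whole group.
            target = sorting_db
        elif n_words < min_words:
            # Assumed rule: too few feature words to be useful.
            target = sorting_db
        elif len(group) > max_copies and (len(authors) > 1 or len(services) > 1):
            # Assumed rule: many copies spread over authors or services.
            target = sorting_db
        else:
            target = analysis_db
        target.extend(group)
    return sorting_db, analysis_db


docs = [
    Document("d1", "alice", "blogA", "Buy cheap watches online today"),
    Document("d2", "bob", "blogB", "buy cheap watches online today!"),
    Document("d3", "carol", "blogA", "My trip to the mountains was wonderful"),
    Document("d4", "dave", "blogC", "Hi"),
]
sorting_db, analysis_db = sort_documents(docs, blacklist={"bob"})
```

In this example, `d1` and `d2` share a feature value and form one group. Because `bob` is on the blacklist, both documents are excluded from analysis, including the one written by `alice`, which mirrors the "even one author" condition in the abstract.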