Abstract |
PROBLEM TO BE SOLVED: To provide a similar document retrieval device, a similar document retrieval method, and a program that, when retrieving similar documents from a large number of documents stored in text format, can find not only documents that match the input search string but also documents to which similar tags are attached or in which similar keywords are used. SOLUTION: The similar document retrieval device normalizes the words contained in an input document to produce a normalized document, stores the words to which tags are attached, and generates tags in the document according to a setting file. It generates an index that associates each word serving as a clue for similar document retrieval with a document ID, generates statistical information on the names of the attributes contained in the documents stored in the storage means, and retrieves documents on the basis of that statistical information. COPYRIGHT: (C)2009,JPO&INPIT
|
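The flow described in the abstract (normalize, tag according to a setting file, index words against document IDs, collect statistics on attribute names, retrieve) can be illustrated with a minimal Python sketch. This is not the patented implementation. The `TAG_SETTINGS` table, the `SimilarDocumentIndex` class, the lowercase-and-split normalization, and the logarithmic tag weighting are all assumptions introduced only for illustration; the abstract does not specify any of them.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical setting file contents: normalized word -> tag (attribute name).
TAG_SETTINGS = {"tokyo": "PLACE", "osaka": "PLACE", "toyota": "COMPANY"}


def normalize(text):
    """Lowercase and split into alphanumeric words (a stand-in for real normalization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tags_of(words):
    """Return the set of tag names that the setting table assigns to these words."""
    return {TAG_SETTINGS[w] for w in words if w in TAG_SETTINGS}


class SimilarDocumentIndex:
    """Illustrative index only; names and scoring are assumptions, not the patent's."""

    def __init__(self):
        self.word_index = defaultdict(set)  # word -> document IDs
        self.doc_tags = {}                  # document ID -> tag names
        self.tag_stats = Counter()          # tag name -> number of documents carrying it

    def add(self, doc_id, text):
        words = normalize(text)
        for word in set(words):
            self.word_index[word].add(doc_id)
        tags = tags_of(words)
        self.doc_tags[doc_id] = tags
        self.tag_stats.update(tags)

    def search(self, text):
        """Rank documents by shared words plus shared tags, rarer tags weighing more."""
        words = normalize(text)
        scores = Counter()
        for word in set(words):
            for doc_id in self.word_index.get(word, ()):
                scores[doc_id] += 1.0
        total_docs = len(self.doc_tags)
        for tag in tags_of(words):
            count = self.tag_stats[tag]
            if count == 0:
                continue  # no stored document carries this tag
            weight = math.log(1 + total_docs / count)
            for doc_id, tags in self.doc_tags.items():
                if tag in tags:
                    scores[doc_id] += weight
        return [doc_id for doc_id, _ in scores.most_common()]
```

With this sketch, a query containing "Tokyo" can still surface a stored document that mentions only "Osaka", because both words carry the same assumed `PLACE` tag even though the strings do not match.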