Abstract
<p>A unique character string is extracted from an input document 907, and a similarity search is performed using that unique character string. To extract it, the feature value of each character string is calculated and evaluated by comparing its appearance frequency in the input document 907 with its appearance frequency in the set of documents 909 to be searched. The extracted unique character strings are then used for the search. The documents found by the search are evaluated and ranked in order of their evaluation. The similarity of each document is evaluated using the appearance frequency of each unique character string in the input document, so that a document in which heavily weighted unique character strings appear many times receives a higher evaluation. The system and method require neither vocabulary nor grammatical information, both of which run into difficulties with new words and phrases, and they allow a document search to be performed even when the user's search request is vague.</p>
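The extraction and ranking steps described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented method: the abstract does not define the feature formula, so the code assumes character n-grams as the candidate strings, a smoothed log-ratio of input-document frequency to collection frequency as the feature value, and a document score equal to the sum of feature value × input-document frequency × frequency in the candidate document. The function names `char_ngrams`, `extract_unique_strings`, and `search`, and the parameters `n` and `top_k`, are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams; no tokenizer or dictionary is used."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def extract_unique_strings(input_doc, corpus, n=2, top_k=5):
    """Return {string: feature value} for the input document's most distinctive n-grams.

    Assumed feature value: log of (relative frequency in the input document) over
    (add-one smoothed relative frequency in the searched collection).
    """
    input_counts = char_ngrams(input_doc, n)
    input_total = sum(input_counts.values()) or 1
    corpus_counts = Counter()
    for doc in corpus:
        corpus_counts.update(char_ngrams(doc, n))
    corpus_total = sum(corpus_counts.values())

    features = {}
    for gram, count in input_counts.items():
        input_freq = count / input_total
        # Add-one smoothing keeps strings absent from the collection finite.
        corpus_freq = (corpus_counts[gram] + 1) / (corpus_total + 1)
        features[gram] = math.log(input_freq / corpus_freq)

    ranked = sorted(features.items(), key=lambda kv: kv[1], reverse=True)
    # Keep only strings that are over-represented in the input document.
    return {gram: feat for gram, feat in ranked[:top_k] if feat > 0}


def search(input_doc, corpus, n=2, top_k=5):
    """Rank corpus documents (by index) using the extracted unique strings."""
    unique = extract_unique_strings(input_doc, corpus, n, top_k)
    input_counts = char_ngrams(input_doc, n)
    results = []
    for idx, doc in enumerate(corpus):
        doc_counts = char_ngrams(doc, n)
        # Weight = feature value x frequency in the input document (assumption);
        # documents where heavily weighted strings occur often score higher.
        score = sum(feat * input_counts[g] * doc_counts[g] for g, feat in unique.items())
        if score > 0:
            results.append((idx, score))
    # Highest score first; ties broken by document index.
    results.sort(key=lambda r: (-r[1], r[0]))
    return results
```

For example, `search("quantum chromodynamics", ["the cat sat on the mat", "quantum chromodynamics lecture", "dogs and cats play"])` ranks the second document first. Because the candidate strings are raw character n-grams, no dictionary or grammar is consulted, which mirrors the abstract's point that new words and phrases need no special handling.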