Title of invention Document search system
Abstract <p>A unique character string is extracted from an input document 907 and used to perform a similarity search. To extract it, the system computes a feature value for each candidate character string by comparing its appearance frequency in the input document 907 with its appearance frequency in the set of documents 909 to be searched. The documents found by the search are then evaluated and arranged in order of that evaluation. The similarity of each document is evaluated using the appearance frequency of each unique character string in the input document, so that a document in which highly weighted unique character strings appear many times receives a higher evaluation. Such a system and method require no vocabulary or grammatical information, which runs into difficulties with new words or phrases, and they allow a document search to be performed even when the user's search request is vague.</p>
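The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the patented method. The abstract gives no formula, so the use of character trigrams as candidate strings, the add-one smoothing, the log-ratio weight, and the names `ngrams`, `unique_strings`, and `rank` are all assumptions made for the example.

```python
from collections import Counter
import math


def ngrams(text, n=3):
    """Return all overlapping character n-grams of text (assumed candidate strings)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def unique_strings(query, corpus, n=3, top_k=20):
    """Weight each n-gram of the query by comparing its relative frequency
    in the query with its relative frequency in the corpus.

    The weight formula is an assumption; the abstract only says the two
    frequencies are compared.
    """
    q_counts = Counter(ngrams(query, n))
    c_counts = Counter(g for doc in corpus for g in ngrams(doc, n))
    q_total = sum(q_counts.values()) or 1
    c_total = sum(c_counts.values()) or 1

    weights = {}
    for gram, freq in q_counts.items():
        q_rate = freq / q_total
        # Add-one smoothing (assumption) so unseen n-grams do not divide by zero.
        c_rate = (c_counts[gram] + 1) / (c_total + 1)
        weight = q_rate * math.log(q_rate / c_rate)
        if weight > 0:  # keep only strings over-represented in the query
            weights[gram] = weight

    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:top_k])


def rank(query, corpus, n=3, top_k=20):
    """Return corpus indices ordered by descending similarity to the query.

    A document scores higher when highly weighted unique strings
    appear in it many times. Documents with no match are omitted.
    """
    weights = unique_strings(query, corpus, n, top_k)
    scored = []
    for idx, doc in enumerate(corpus):
        d_counts = Counter(ngrams(doc, n))
        score = sum(w * d_counts[g] for g, w in weights.items())
        if score > 0:
            scored.append((score, idx))
    return [idx for score, idx in sorted(scored, key=lambda s: (-s[0], s[1]))]
```

Because the candidates are raw character sequences, the sketch needs no dictionary or grammar, which matches the motivation stated in the abstract. Calling `rank(query_text, list_of_documents)` returns the indices of matching documents, best match first.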
Publication number EP0802492(A1) Publication date 1997.10.22
Application number EP19970302600 Filing date 1997.04.16
Applicant INTERNATIONAL BUSINESS MACHINES CORPORATION Inventor KUBOTA, RIE
Classification number G06F17/30;(IPC1-7):G06F17/30 Main classification number G06F17/30
Agency Agent
Principal claim
Address