Title of invention: METHOD AND SYSTEM FOR EXTRACTING CHARACTERISTIC STRING, METHOD AND SYSTEM FOR SEARCHING FOR RELEVANT DOCUMENT USING THE SAME, STORAGE MEDIUM FOR STORING CHARACTERISTIC STRING EXTRACTION PROGRAM, AND STORAGE MEDIUM FOR STORING RELEVANT DOCUMENT SEARCHING PROGRAM
Abstract: A method for extracting features from the contents of a document without using a word dictionary, and a system that uses the method to search for a relevant document or documents accurately and at high speed. The method includes the steps of storing, in an occurrence probability file, the probabilities that character strings present in a text of a text database appear at word boundaries in the text; storing the occurrence frequencies of the character strings in the text as an occurrence frequency file; extracting characteristic strings from a text specified by a user by using the occurrence probability file; and counting the occurrence frequencies of those characteristic strings in the user-specified text. The method then calculates similarities to the user-specified text by using the occurrence frequency file and the occurrence frequencies in the user-specified text.
Publication number: US6473754(B1) Publication date: 2002.10.29
Application number: US19990320558 Filing date: 1999.05.27
Applicant: HITACHI, LTD. Inventors: MATSUBAYASHI TADATAKA;TADA KATSUMI;OKAMOTO TAKUYA;SUGAYA NATSUKO;KAWASHIMO YASUSHI
Classification: G06F17/27;G06F17/30;(IPC1-7):G06F17/30 Main classification: G06F17/27
Agency: Agent:
Principal claim:
Address:
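
The steps summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names are hypothetical, the occurrence probability file is assumed to be a plain mapping from a pair of adjacent characters to the probability that a word boundary falls between them, the fixed threshold is an assumption, and cosine similarity is swapped in as a stand-in because the record does not state the similarity measure.

```python
from collections import Counter
import math


def split_by_boundary(text, boundary_prob, threshold=0.5):
    """Cut `text` wherever the assumed boundary probability between two
    adjacent characters reaches `threshold` (no word dictionary involved)."""
    if not text:
        return []
    pieces, start = [], 0
    for i in range(1, len(text)):
        pair = text[i - 1:i + 1]  # the two characters around position i
        if boundary_prob.get(pair, 0.0) >= threshold:
            pieces.append(text[start:i])
            start = i
    pieces.append(text[start:])
    return pieces


def count_strings(text, boundary_prob, threshold=0.5):
    """Occurrence frequencies of the extracted strings in one text."""
    return Counter(split_by_boundary(text, boundary_prob, threshold))


def similarity(query_counts, doc_counts):
    """Cosine similarity between two frequency tables (an assumed measure)."""
    dot = sum(c * doc_counts.get(s, 0) for s, c in query_counts.items())
    norm_q = math.sqrt(sum(c * c for c in query_counts.values()))
    norm_d = math.sqrt(sum(c * c for c in doc_counts.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0


# Toy stand-in for the occurrence probability file (hypothetical values).
boundary_prob = {"td": 0.9, "gc": 0.8}

# Toy stand-in for the occurrence frequency file: one table per stored text.
database = {"doc1": "dogcat", "doc2": "catcat"}
frequency_file = {
    name: count_strings(text, boundary_prob) for name, text in database.items()
}

# Extract and count strings in the user-specified text, then rank the database.
query_counts = count_strings("catdogcat", boundary_prob)
ranking = sorted(
    frequency_file,
    key=lambda name: similarity(query_counts, frequency_file[name]),
    reverse=True,
)
print(ranking)
```

Precomputing one frequency table per stored text mirrors the role the abstract gives to the occurrence frequency file: at query time only the user-specified text has to be segmented and counted.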