发明名称 FAST APPROXIMATE STRING MATCHING ALGORITHMS FOR MULTIPLE ERRORS SPELLING CORRECTION
摘要 2076526 9212493 PCTABS00014 A data string processing system uses fast algorithms for approximate string matching in a dictionary (23). Multiple spelling errors of insert, delete, change and transpose operations on character strings are considered in the disclosed fault model. S-trace, the fault model is used in formulating the algorithms and, a four-step reduction procedure improves the efficiency of an approximate string matching algorithm. These approaches to spelling correction, (i.e., using the upper bound, the string length partition criterion and the cut-off criterion) represent three improvements from the basic exhaustive comparison approach. Each can be naturally incorporated into the next step. In the fourth-step, a hashing scheme avoids comparing the given string with words at large distances when searching in the neighborhood of a small distance. An algorithm that is sub-linear to the number of words in dictionary (23) results. An application of the algorithms to a library information system uses original text files (21), information description files (22) and a negative dictionary (23) stored on disks (12).
申请公布号 CA2076526(A1) 申请公布日期 1992.07.01
申请号 CA19912076526 申请日期 1991.12.30
申请人 GTE LABORATORIES INCORPORATED 发明人 DU, MIN-WEN;CHANG, SHIH-CHIO
分类号 G06F17/22;G06F7/04;G06F17/27;G06F17/30;G06K9/62;G06K9/72;(IPC1-7):G06F7/04;G06F15/417 主分类号 G06F17/22
代理机构 代理人
主权项
地址