摘要 |
2076526 9212493 PCTABS00014 A data string processing system uses fast algorithms for approximate string matching in a dictionary (23). Multiple spelling errors of insert, delete, change and transpose operations on character strings are considered in the disclosed fault model. S-trace, the fault model is used in formulating the algorithms and, a four-step reduction procedure improves the efficiency of an approximate string matching algorithm. These approaches to spelling correction, (i.e., using the upper bound, the string length partition criterion and the cut-off criterion) represent three improvements from the basic exhaustive comparison approach. Each can be naturally incorporated into the next step. In the fourth-step, a hashing scheme avoids comparing the given string with words at large distances when searching in the neighborhood of a small distance. An algorithm that is sub-linear to the number of words in dictionary (23) results. An application of the algorithms to a library information system uses original text files (21), information description files (22) and a negative dictionary (23) stored on disks (12).
|