发明名称 Method and system for approximate string matching
摘要 Approximate string matching of a target string to a trie data structure in which the trie data structure has a root node and generations of child nodes, each node representing at least one character in an alphabet to provide a lexicon of words and word fragments. Traversing the trie data structure includes starting from the root node by comparing each node of a branch of the trie data structure to characters in the target string and adding characters traversed in a branch of the trie data structure to a gathered string to provide suggestions of approximate matches. If a node is reached that is flagged as a node for a word or a word fragment and, and if the target string is longer than the gathered string, the method loops back to the root node, and continues the traverse from the root node. At each node, the system determines if there is a correction rule for one or more characters in the remainder of the target string from the current node, and if so, applies the correction rule to the target string to obtain a modified target string.
申请公布号 US9251294(B2) 申请公布日期 2016.02.02
申请号 US201012830345 申请日期 2010.07.04
申请人 International Business Machines Corporation 发明人 Nevidomski Alexei;Volkov Pavel
分类号 G06F17/30 主分类号 G06F17/30
代理机构 BainwoodHuang 代理人 BainwoodHuang
主权项 1. A method of approximate string matching of a target string to a trie data structure, the trie data structure having a root node and generations of child nodes, each node representing at least one character in an alphabet, the method comprising: traversing the trie data structure starting from the root node by comparing each node of a branch of the trie data structure to at least one character in the target string; determining, at each node, if there is a correction rule for one or more characters in the remainder of the target string from the current node, and, if so, applying the correction rule to the target string to modify the target string to obtain a modified target string, wherein applying the correction rule includes performing a sequence to sequence character substitution on the target string to obtain the modified target string, and continuing to traverse the trie data structure from the current node using the modified target string, wherein no additional modifications of the modified target string are allowed within its modified parts; and providing, responsive to the traversing, at least one suggestion from the trie data structure.
地址 Armonk NY US