Title of invention |
DISTRIBUTED FUZZY SEARCH AND JOIN WITH EDIT DISTANCE GUARANTEES |
Abstract |
Methods and arrangements for performing fuzzy search. A contemplated method includes: establishing an edit distance threshold for the fuzzy search; generating an index of items to be searched, via: storing at least one string; and creating substrings corresponding to the at least one string; providing a query string for use in searching; creating substrings corresponding to the query string; comparing substrings of the query string with substrings in the index; designating at least one candidate string based on said comparing; verifying whether each candidate string satisfies the edit distance threshold; and outputting at least one matching string for each candidate string that satisfies the edit distance threshold. Other variants and embodiments are broadly contemplated herein. |
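The index, compare, designate and verify steps named in the abstract can be illustrated with a small sketch. The abstract does not say how substrings are formed, so the scheme below is an assumption chosen for illustration: each indexed string is split into `tau + 1` disjoint segments, which by the pigeonhole principle ensures that any string within `tau` edits of the query shares at least one unmodified segment with it. The names `segments`, `build_index`, `query_substrings` and `fuzzy_search` are hypothetical and do not come from the patent.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def segments(s: str, tau: int) -> list:
    """Split s into tau + 1 disjoint contiguous pieces (some may be empty)."""
    k = tau + 1
    base, extra = divmod(len(s), k)
    pieces, pos = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        pieces.append(s[pos:pos + size])
        pos += size
    return pieces


def build_index(strings, tau: int):
    """Map every segment (assumed substring scheme) to the strings containing it."""
    index = defaultdict(set)
    for s in strings:
        for seg in segments(s, tau):
            index[seg].add(s)
    return index


def query_substrings(query: str) -> set:
    """All contiguous substrings of the query, plus the empty string."""
    subs = {""}
    for i in range(len(query)):
        for j in range(i + 1, len(query) + 1):
            subs.add(query[i:j])
    return subs


def fuzzy_search(query: str, index, tau: int) -> list:
    """Collect candidates by substring lookup, then verify each one."""
    candidates = set()
    for sub in query_substrings(query):
        candidates |= index.get(sub, set())
    return sorted(s for s in candidates if edit_distance(query, s) <= tau)


# Usage: the index is built for the same threshold that the search uses.
names = ["kitten", "sitting", "mitten", "bitten", "written"]
index = build_index(names, tau=2)
print(fuzzy_search("kitten", index, tau=2))
# prints ['bitten', 'kitten', 'mitten', 'written']
```

Under this assumed segmentation the comparison step can only over-generate candidates, never miss a true match, so the final verification is what enforces the threshold. Enumerating every query substring is quadratic in the query length and is kept here only for clarity.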
Publication number |
US2016217186(A1) |
Publication date |
2016.07.28 |
Application number |
US201514603200 |
Filing date |
2015.01.22 |
Applicant |
International Business Machines Corporation |
Inventors |
Agarwal Manoj Kumar;Gupta Rajeev |
Classification |
G06F17/30 |
Main classification |
G06F17/30 |
Agency |
|
Agent |
|
Principal claim |
1. A method of performing a fuzzy search, said method comprising:
utilizing at least one processor to execute computer code configured to perform the steps of: providing a query string for searching against an index of items to be searched, the query string comprising a continuous grouping of characters; establishing an edit distance threshold for the searching, to guide a return of search results wherein an edit distance between the query string and each search result is less than or equal to the edit distance threshold; generating the index of items to be searched, via:
providing strings to be searched against, each of the strings comprising a contiguous grouping of characters; and creating substrings corresponding to the strings to be searched against, the substrings comprising portions of the strings to be searched against; creating substrings corresponding to the query string, the substrings of the query string comprising portions of the query string; comparing substrings of the query string with substrings in the index; designating at least one candidate string based on said comparing; verifying whether each candidate string satisfies the edit distance threshold; and outputting at least one matching string for each candidate string that satisfies the edit distance threshold. |
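The verifying step of the claim only needs a yes or no answer on whether a candidate lies within the edit distance threshold, so the exact distance is not required. The sketch below shows one common way to do this, a banded dynamic program with early termination. It is an illustrative assumption and is not taken from the claim, and the function name `within_edit_distance` is hypothetical.

```python
def within_edit_distance(a: str, b: str, tau: int) -> bool:
    """Return True iff the Levenshtein distance between a and b is <= tau."""
    if abs(len(a) - len(b)) > tau:
        return False  # the length gap alone already exceeds the threshold
    big = tau + 1  # sentinel value meaning "more than tau"
    # Row 0 of the table: turning the empty prefix of a into b[:j] costs j.
    prev = [j if j <= tau else big for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Only cells with |i - j| <= tau can hold a value <= tau.
        lo = max(1, i - tau)
        hi = min(len(b), i + tau)
        curr = [big] * (len(b) + 1)
        if i <= tau:
            curr[0] = i
        for j in range(lo, hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete a[i - 1]
                          curr[j - 1] + 1,     # insert b[j - 1]
                          prev[j - 1] + cost,  # match or substitute
                          big)                 # cap at the sentinel
        if min(curr) >= big:
            return False  # no cell in this row can lead to a result <= tau
        prev = curr
    return prev[len(b)] <= tau


# Usage: "kitten" and "sitting" are three edits apart.
print(within_edit_distance("kitten", "sitting", 3))  # prints True
print(within_edit_distance("kitten", "sitting", 2))  # prints False
```

Restricting the computation to a band of width `2 * tau + 1` around the diagonal makes each check cost roughly the string length times `tau` rather than the product of both lengths, which matters when many candidates must be verified.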
Address |
Armonk NY US |