Title: DISTRIBUTED FUZZY SEARCH AND JOIN WITH EDIT DISTANCE GUARANTEES
Abstract: Methods and arrangements for performing fuzzy search. A contemplated method includes: establishing an edit distance threshold for the fuzzy search; generating an index of items to be searched by storing at least one string and creating substrings corresponding to the at least one string; providing a query string for use in searching; creating substrings corresponding to the query string; comparing substrings of the query string with substrings in the index; designating at least one candidate string based on said comparing; verifying whether each candidate string satisfies the edit distance threshold; and outputting at least one matching string for each candidate string that satisfies the edit distance threshold. Other variants and embodiments are broadly contemplated herein.
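The abstract does not say how the substrings are chosen or why comparing substrings is guaranteed not to miss a match within the threshold. One well-known way to obtain that guarantee, assumed here and not taken from the patent, is pigeonhole partitioning. Each indexed string is split into k + 1 non-overlapping segments. If the edit distance between a string s and the query q is at most k, the edits can touch at most k segments, so at least one segment of s appears verbatim in q. The sketch below indexes those segments, enumerates query substrings of the indexed lengths, collects candidates and verifies each one with a plain dynamic-programming edit distance. The names `FuzzyIndex`, `segments` and `levenshtein` are hypothetical.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def segments(s: str, k: int) -> list[str]:
    """Split s into k + 1 contiguous, non-overlapping segments of near-equal length.

    Short strings may yield empty segments. The empty string is a substring of
    every query, so such strings always become candidates (safe, never lossy).
    """
    base, extra = divmod(len(s), k + 1)
    out, start = [], 0
    for p in range(k + 1):
        length = base + (1 if p < extra else 0)
        out.append(s[start:start + length])
        start += length
    return out


class FuzzyIndex:
    """Hypothetical segment index: segment text -> ids of strings that own it."""

    def __init__(self, strings, k: int):
        self.k = k
        self.strings = list(strings)
        self.postings = defaultdict(set)
        for sid, s in enumerate(self.strings):
            for seg in segments(s, k):
                self.postings[seg].add(sid)
        # Only query substrings of these lengths can hit the index.
        self.seg_lengths = {len(seg) for seg in self.postings}

    def query_substrings(self, q: str):
        """Yield every substring of q whose length matches some indexed segment."""
        for length in self.seg_lengths:
            for start in range(len(q) - length + 1):
                yield q[start:start + length]

    def candidates(self, q: str) -> set:
        """Ids of strings sharing at least one segment with a substring of q."""
        found = set()
        for sub in self.query_substrings(q):
            found |= self.postings.get(sub, set())
        return found

    def search(self, q: str) -> list[str]:
        """Verify each candidate against the threshold and return the matches."""
        return sorted(self.strings[sid] for sid in self.candidates(q)
                      if levenshtein(self.strings[sid], q) <= self.k)
```

For example, with `k=2`, `FuzzyIndex(["kitten", "sitting", "mitten", "bitter", "fuzzy"], k=2).search("kitten")` returns `["bitter", "kitten", "mitten"]`. "sitting" (edit distance 3) is not returned. The candidate step may admit false positives, which verification removes. It never admits false negatives, which is the kind of guarantee the title refers to. A production system would add length and position filters, which are omitted here.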
Publication No.: US2016217186(A1); Publication Date: 2016.07.28
Application No.: US201514603200; Filing Date: 2015.01.22
Applicant: International Business Machines Corporation; Inventors: Agarwal Manoj Kumar; Gupta Rajeev
Classification: G06F17/30; Main Classification: G06F17/30
Main Claim: 1. A method of performing a fuzzy search, said method comprising: utilizing at least one processor to execute computer code configured to perform the steps of: providing a query string for searching against an index of items to be searched, the query string comprising a continuous grouping of characters; establishing an edit distance threshold for the searching, to guide a return of search results wherein an edit distance between the query string and each search result is less than or equal to the edit distance threshold; generating the index of items to be searched, via: providing strings to be searched against, each of the strings comprising a contiguous grouping of characters; and creating substrings corresponding to the strings to be searched against, the substrings comprising portions of the strings to be searched against; creating substrings corresponding to the query string, the substrings of the query string comprising portions of the query string; comparing substrings of the query string with substrings in the index; designating at least one candidate string based on said comparing; verifying whether each candidate string satisfies the edit distance threshold; and outputting at least one matching string for each candidate string that satisfies the edit distance threshold.
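The verifying step only needs a yes/no answer, "is the edit distance at most k?", and not the exact distance. A common optimization for that check, assumed here and not described in the claim, is the banded dynamic program usually attributed to Ukkonen. Any cell with |i - j| > k already has distance greater than k, so the recurrence is evaluated only inside that diagonal band. Every value is capped at k + 1, and the check stops early once a whole row exceeds k. The function name `within_edit_distance` is hypothetical.

```python
def within_edit_distance(a: str, b: str, k: int) -> bool:
    """Return True iff the Levenshtein distance between a and b is <= k.

    Cells outside the band |i - j| <= k, and any value above k, are stored as
    k + 1 ("too far"). That cap does not change whether the final distance
    is <= k.
    """
    if abs(len(a) - len(b)) > k:
        return False  # length filter: each edit changes the length by at most 1
    too_far = k + 1
    n, m = len(a), len(b)
    prev = [j if j <= k else too_far for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [too_far] * (m + 1)
        if i <= k:
            cur[0] = i
        # Evaluate the recurrence only for columns inside the diagonal band.
        for j in range(max(1, i - k), min(m, i + k) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost, too_far)
        if min(cur) > k:
            return False  # every alignment path crosses this row above the threshold
        prev = cur
    return prev[m] <= k
```

For example, `within_edit_distance("kitten", "sitting", 3)` is `True` and `within_edit_distance("kitten", "sitting", 2)` is `False`. For readability the rows are allocated at full width. A tuned version would store only the 2k + 1 band cells per row, which makes the check roughly linear in the string length for a fixed threshold.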
Address: Armonk, NY, US