Title of Invention: APPROXIMATE STRING MATCHING OPTIMIZATION FOR A DATABASE
Abstract: Software for processing a database query that includes: (i) receiving a query of a database including a search value; (ii) determining a distance between the search value and at least one reference value; (iii) determining a maximum distance from the search value to be used in searching a plurality of datasets of the database, wherein the maximum distance from the search value defines a search range and is based, at least in part, on the determined distance between the search value and the at least one reference value; (iv) determining a subset of datasets from the plurality of datasets that includes datasets for which a data range with respect to each reference value overlaps with the search range; and (v) performing approximate string matching for the search value on the subset of datasets.
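The abstract does not name the distance function or say why datasets outside the search range can be skipped. One plausible reading, offered here only as a sketch, assumes the distance d is a true metric (for example Levenshtein edit distance), so that the triangle inequality holds. The symbols s (search value), r (reference value), v (a stored value), k (largest distance still counted as a match), and d_min, d_max (a dataset's stored distance bounds) are illustrative names, not taken from the record.

```latex
% Assumption: d is a metric, so the reverse triangle inequality applies.
\[
  \lvert d(s,r) - d(v,r) \rvert \;\le\; d(s,v)
\]
% If v is a match, i.e. d(s,v) \le k, then its distance to r is confined to:
\[
  d(s,r) - k \;\le\; d(v,r) \;\le\; d(s,r) + k
\]
% A dataset with stored bounds [d_min, d_max] can therefore be skipped when
\[
  d_{\max} < d(s,r) - k
  \quad\text{or}\quad
  d_{\min} > d(s,r) + k .
\]
```

Under this reading, the interval [d(s,r) - k, d(s,r) + k] plays the role of the search range, and a dataset needs to be scanned only if its data range intersects that interval for every reference value.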
Publication No.: US2017124147(A1)  Publication Date: 2017.05.04
Application No.: US201514926119  Filing Date: 2015.10.29
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION  Inventors: BODZIONY MICHAL;GAZA LUKASZ;GRUSZECKI ARTUR M.;KAZALSKI TOMASZ;SKIBSKI KONRAD K.;STRADOMSKI TOMASZ
Classification: G06F17/30  Main Classification: G06F17/30
Agency:  Agent:
Main Claim: 1. A computer-implemented method comprising: receiving, by one or more processors, a query of a database, wherein the query includes a search value, and wherein the database includes a plurality of datasets; determining, by one or more processors, a distance between the search value and at least one reference value; determining, by one or more processors, a maximum distance from the search value to be used in searching the database, wherein the maximum distance from the search value defines a search range and is based, at least in part, on the determined distance between the search value and the at least one reference value; determining, by one or more processors, a subset of datasets from the plurality of datasets that includes datasets for which a data range with respect to each reference value overlaps with the search range; and performing, by one or more processors, approximate string matching for the search value on the subset of datasets; wherein: each dataset of the plurality of datasets is assigned a minimum distance and a maximum distance between values of dataset entries and the at least one reference value; and the minimum distance and the maximum distance for each dataset define the data range for the respective dataset with respect to the at least one reference value.
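The steps of the claim can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes Levenshtein edit distance as the distance function, treats the search range as the interval from d(search, reference) - k to d(search, reference) + k, and assumes every dataset is non-empty. The names `Dataset`, `build_dataset`, `candidate_datasets`, `fuzzy_search`, and the parameter `k` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


@dataclass
class Dataset:
    values: List[str]
    min_dist: List[int]  # smallest distance to each reference value
    max_dist: List[int]  # largest distance to each reference value


def build_dataset(values: List[str], references: List[str]) -> Dataset:
    """Precompute the per-reference data range (assumes values is non-empty)."""
    dists = [[levenshtein(v, r) for v in values] for r in references]
    return Dataset(values, [min(d) for d in dists], [max(d) for d in dists])


def candidate_datasets(datasets: List[Dataset], search: str,
                       references: List[str], k: int) -> List[Dataset]:
    """Keep datasets whose data range overlaps the search range for every reference."""
    q = [levenshtein(search, r) for r in references]
    return [
        ds for ds in datasets
        if all(ds.min_dist[i] <= q[i] + k and ds.max_dist[i] >= q[i] - k
               for i in range(len(references)))
    ]


def fuzzy_search(datasets: List[Dataset], search: str,
                 references: List[str], k: int) -> List[str]:
    """Run approximate matching only on the datasets that survive pruning."""
    hits: List[str] = []
    for ds in candidate_datasets(datasets, search, references, k):
        hits.extend(v for v in ds.values if levenshtein(v, search) <= k)
    return hits


refs = ["apple"]
fruit = build_dataset(["apple", "apply", "ample"], refs)
other = build_dataset(["watermelon", "pomegranate"], refs)
print(fuzzy_search([fruit, other], "appel", refs, k=2))
```

In this usage example the second dataset is never scanned: both of its values are at least 5 edits away from the reference "apple", while the search range around "appel" only reaches up to 4. The pruning step costs one distance computation per reference value, independent of dataset size, which is where the saving would come from on large tables.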
Address: ARMONK NY US