Title of invention: Name search using multiple bitmap distributions
Abstract: Provided are a computer-implemented method, computer program product, and system for matching names. For a first bitmap distribution, it is determined whether a first bitmap signature of a query name and a second bitmap signature of a target name have a number of overlapping character n-grams that meets or exceeds a threshold, to generate a first preliminary value. For a second bitmap distribution that is different from the first bitmap distribution, it is determined whether a third bitmap signature of the query name and a fourth bitmap signature of the target name have a number of overlapping character n-grams that meets or exceeds a threshold, to generate a second preliminary value. The first preliminary value and the second preliminary value are combined, and, if the combination results in a value of true, it is determined that the query name and the target name are to be further processed.
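The two-distribution filter in the abstract can be illustrated with a minimal sketch. It assumes character bigrams, counts shared set bits as a stand-in for "overlapping character n-grams", and assumes logical AND as the combining operation (the record only says the values are "combined"). The names `signature`, `preliminary_value`, and `is_candidate`, and the toy distributions, are hypothetical and are not taken from the patent.

```python
def signature(name: str, distribution: dict[str, int], n: int = 2) -> int:
    """Build a bitmap signature: set the bit assigned to each of the name's n-grams."""
    bits = 0
    s = name.lower()
    for i in range(len(s) - n + 1):
        pos = distribution.get(s[i:i + n])
        if pos is not None:  # n-grams absent from the distribution are ignored
            bits |= 1 << pos
    return bits


def preliminary_value(query: str, target: str,
                      distribution: dict[str, int], threshold: int) -> bool:
    """True when the two signatures share at least `threshold` set bits."""
    shared = signature(query, distribution) & signature(target, distribution)
    return bin(shared).count("1") >= threshold


def is_candidate(query: str, target: str,
                 dist1: dict[str, int], dist2: dict[str, int],
                 threshold1: int, threshold2: int) -> bool:
    """Combine both preliminary values; logical AND is an assumption here."""
    first = preliminary_value(query, target, dist1, threshold1)
    second = preliminary_value(query, target, dist2, threshold2)
    return first and second


# Toy distributions: "ab" and "cd" collide at bit 0 in dist1 but not in dist2.
dist1 = {"ab": 0, "cd": 0, "xy": 1}
dist2 = {"ab": 0, "cd": 1, "xy": 2}
print(preliminary_value("abxy", "cdxy", dist1, 2))  # True: the collision inflates the overlap
print(preliminary_value("abxy", "cdxy", dist2, 2))  # False
print(is_candidate("abxy", "cdxy", dist1, dist2, 2, 2))  # False
```

In the toy example, "abxy" and "cdxy" share only the bigram "xy", but because "ab" and "cd" map to the same bit in the first distribution, that distribution alone reports two overlaps. The second distribution separates the colliding bigrams, so the combined check screens the pair out.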
Publication number: US9020911 (B2)
Publication date: 2015.04.28
Application number: US201213353252
Filing date: 2012.01.18
Applicant: International Business Machines Corporation
Inventors: Biesenbach David E.; Liddle Steven J.; Watjen Stephen J.; Williams Charles K.
Classification: G06F7/00; G06F17/30
Main classification: G06F7/00
Agency: Konrad, Raynes, Davda and Victor LLP
Agents: Davda Janaki K.; Konrad, Raynes, Davda and Victor LLP
Principal claim: 1. A computer program product for matching names, the computer program product comprising: a non-transitory computer readable storage medium having computer readable program code embodied therein, wherein the computer readable program code, when executed by a processor of a computer, is configured to perform operations of: creating a first bitmap distribution of character n-grams distributed into bitmap positions in descending order of frequency of occurrence of the character n-grams in a set of names based on bitmap positions with a lowest cumulative frequency, wherein at least two distinct character n-grams are assigned to a same bitmap position of the bitmap positions; creating a second bitmap distribution of the character n-grams distributed into the bitmap positions so that the at least two distinct character n-grams are assigned to different bitmap positions and so that any overlapping character n-grams in the first bitmap distribution do not overlap in the second bitmap distribution; using the first bitmap distribution, determining whether a first bitmap signature of a query name and a second bitmap signature of a target name in a set of names have a number of overlapping character n-grams that meets or exceeds a first configurable threshold, to generate a first preliminary value; using the second bitmap distribution, determining whether a third bitmap signature of the query name and a fourth bitmap signature of the target name have a number of overlapping character n-grams that meets or exceeds a second configurable threshold, to generate a second preliminary value; and in response to determining that a logical operation applied to the first preliminary value and the second preliminary value results in a value of true, determining that the query name and the target name are to be processed for further comparisons.
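The first two steps of the claim, building the two bitmap distributions, can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: it assumes character bigrams, reads "based on bitmap positions with a lowest cumulative frequency" as a greedy rule (each n-gram, most frequent first, goes to the position whose running frequency total is currently lowest), and builds the second distribution with the same greedy rule restricted so that n-grams sharing a position in the first distribution cannot share one again. The claim does not say how the second distribution is constructed, and every function name here is hypothetical.

```python
import heapq
from collections import Counter


def ngrams(name: str, n: int = 2) -> list[str]:
    """Character n-grams of a name (hypothetical helper; bigrams by default)."""
    s = name.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def ngram_frequencies(names: list[str], n: int = 2) -> Counter:
    """Count how often each character n-gram occurs across the set of names."""
    counts: Counter = Counter()
    for name in names:
        counts.update(ngrams(name, n))
    return counts


def by_descending_frequency(freqs: Counter) -> list[tuple[str, int]]:
    """Most frequent n-grams first; ties broken alphabetically for determinism."""
    return sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))


def first_distribution(freqs: Counter, num_positions: int) -> dict[str, int]:
    """Assign each n-gram to the position with the lowest cumulative frequency."""
    heap = [(0, pos) for pos in range(num_positions)]  # (cumulative frequency, position)
    heapq.heapify(heap)
    mapping: dict[str, int] = {}
    for gram, freq in by_descending_frequency(freqs):
        total, pos = heapq.heappop(heap)
        mapping[gram] = pos
        heapq.heappush(heap, (total + freq, pos))
    return mapping


def second_distribution(freqs: Counter, first: dict[str, int],
                        num_positions: int) -> dict[str, int]:
    """Same greedy rule, but n-grams that collide in `first` must land apart."""
    totals = [0] * num_positions
    # For each position, the first-distribution positions of n-grams already placed there.
    occupants: list[set[int]] = [set() for _ in range(num_positions)]
    mapping: dict[str, int] = {}
    for gram, freq in by_descending_frequency(freqs):
        bucket = first[gram]
        allowed = [p for p in range(num_positions) if bucket not in occupants[p]]
        if not allowed:
            raise ValueError("too many n-grams share a position; use more positions")
        pos = min(allowed, key=lambda p: (totals[p], p))
        mapping[gram] = pos
        totals[pos] += freq
        occupants[pos].add(bucket)
    return mapping


names = ["john smith", "jon smyth", "joan smithe", "johnny smit"]
freqs = ngram_frequencies(names)
d1 = first_distribution(freqs, 8)
d2 = second_distribution(freqs, d1, 8)
print(d1)
print(d2)
```

The heap keeps each "lowest cumulative frequency" lookup at O(log m) for the first distribution; the second distribution scans all m positions per n-gram because the set of allowed positions changes with each placement. The `ValueError` branch fires only if some first-distribution position holds more n-grams than there are positions, in which case no separating assignment exists.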
Address: Armonk, NY, US