Title: Scoring unfielded personal names without prior parsing
Abstract: A system for determining a similarity between a name phrase and a comparison name phrase scores each name in the name phrase. The scoring is based on the field frequency of the name in a name database, where the field frequency indicates a given name frequency and/or a surname frequency in the database. The system uses the scoring to determine a transition from a given name to a surname in the name phrase. The system determines a primary given name and a primary surname in the name phrase based on the scoring and the transition. The system uses the primary given name and primary surname to determine a similarity between the name phrase and a comparison name phrase, where the comparison name phrase comprises a comparison given name and a comparison surname.
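The scoring and transition steps summarized in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and not taken from the patent: the toy counts in `TOY_COUNTS`, the helper names `given_ratio`, `find_transition`, and `parse_name_phrase`, the rule of choosing the split that maximizes the summed field-frequency scores, and the choice of the first token on each side of the transition as the primary given name and primary surname.

```python
# Hypothetical counts: name -> (given-name count, surname count).
TOY_COUNTS = {
    "maria": (900, 100),
    "jose": (800, 200),
    "garcia": (50, 950),
    "lopez": (20, 980),
}


def given_ratio(name):
    """Share of a name's occurrences that are given-name uses (0.5 if unknown)."""
    given, surname = TOY_COUNTS.get(name.lower(), (1, 1))
    return given / (given + surname)


def find_transition(tokens):
    """Return i so that tokens[:i] read as given names and tokens[i:] as surnames.

    Assumed rule: pick the split with the highest total of given-name scores
    on the left plus surname scores on the right.
    """
    if len(tokens) < 2:
        raise ValueError("need at least two name tokens to locate a transition")
    scores = [given_ratio(t) for t in tokens]
    best_i, best_score = 1, float("-inf")
    for i in range(1, len(tokens)):
        split_score = sum(scores[:i]) + sum(1 - s for s in scores[i:])
        if split_score > best_score:
            best_i, best_score = i, split_score
    return best_i


def parse_name_phrase(phrase):
    """Split an unfielded name phrase and pick primary names (assumed: first of each side)."""
    tokens = phrase.split()
    i = find_transition(tokens)
    return {
        "given_names": tokens[:i],
        "surnames": tokens[i:],
        "primary_given": tokens[0],
        "primary_surname": tokens[i],
    }
```

With the toy counts above, `parse_name_phrase("Maria Jose Garcia Lopez")` places the transition between "Jose" and "Garcia". A real name database and a different selection rule could give different results.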
Publication No.: US9535903(B2)  Publication date: 2017.01.03
Application No.: US201514685587  Filing date: 2015.04.13
Applicant: International Business Machines Corporation  Inventor: Patman Maguire Frankie E.
Classification: G06F17/27; G06F17/28  Main classification: G06F17/27
Agency: North Shore Patents, P.C.  Agent: North Shore Patents, P.C.; Leonessa Lesley A.
Principal claim: 1. A system comprising: a computing processor; and a computer readable storage medium operationally coupled to the processor, the computer readable storage medium having computer readable program code embodied therewith to be executed by the computing processor, the computer readable program code configured to: obtain, by a data object, from a name database comprising a plurality of names, a ratio of given name frequency to surname frequency for each of the plurality of names in the name database; store in the data object, a ratio of the given name frequency to a total name frequency, and a ratio of the surname frequency to the total name frequency for each of the plurality of names in the name database; for each name in the name phrase, score the name using a scoring engine, wherein the scoring is based on field frequency of the name in the name database, wherein the field frequency indicates at least one of the given name frequency and the surname frequency in the database, wherein the scoring engine obtains at least one of the given name frequency and the surname frequency from the data object; use, by the scoring engine, the scoring to determine a transition from a given name to a surname in the name phrase, wherein the transition is calculated using at least two of the ratio of the given name frequency to the total name frequency and the ratio of the surname frequency to the total name frequency, wherein the transition is calculated by the scoring engine; determine a primary given name in the name phrase based on the scoring and the transition; determine a primary surname in the name phrase based on the scoring and the transition; and use the primary given name and primary surname to assign a similarity score between the name phrase and a comparison name phrase, wherein the comparison name phrase comprises a comparison given name and a comparison surname.
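The data object and similarity score recited in the claim might look roughly like the sketch below. It is not the patented implementation. The class `FieldFrequency`, the functions `build_data_object` and `similarity`, the input format of (given name, surname) pairs, and the use of Python's standard `difflib.SequenceMatcher` averaged over the two fields are all assumptions made for illustration. The claim does not specify how the similarity score is computed.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class FieldFrequency:
    """Hypothetical per-name entry of the data object."""
    given_count: int
    surname_count: int

    @property
    def total(self):
        return self.given_count + self.surname_count

    @property
    def given_to_total(self):
        return self.given_count / self.total

    @property
    def surname_to_total(self):
        return self.surname_count / self.total

    @property
    def given_to_surname(self):
        # Guard against names never seen as a surname.
        if self.surname_count == 0:
            return float("inf")
        return self.given_count / self.surname_count


def build_data_object(records):
    """Build name -> FieldFrequency from (given_name, surname) pairs (assumed input)."""
    counts = {}
    for given, surname in records:
        counts.setdefault(given.lower(), [0, 0])[0] += 1
        counts.setdefault(surname.lower(), [0, 0])[1] += 1
    return {name: FieldFrequency(g, s) for name, (g, s) in counts.items()}


def similarity(primary_given, primary_surname, comp_given, comp_surname):
    """Assumed scoring: mean of per-field string similarity ratios in [0, 1]."""
    given_sim = SequenceMatcher(None, primary_given.lower(), comp_given.lower()).ratio()
    surname_sim = SequenceMatcher(None, primary_surname.lower(), comp_surname.lower()).ratio()
    return (given_sim + surname_sim) / 2
```

Storing counts and deriving the ratios on demand keeps the three ratios consistent with each other. A production system could equally precompute and store them, as the claim language suggests.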
Address: Armonk, NY, US