Title of Invention: Method and Apparatus for Matching Misspellings Caused by Phonetic Variations
Abstract: A method and apparatus for matching equivalent words across languages takes advantage of a set of rules that are built from a user-defined language specification (UDLS), which may be open source and customizable by a language expert. The UDLS is used to build a compiled language library (CLL) that includes a list of consonants, a list of vowels, and rules defining phoneme equivalencies across two languages. The CLL is used to match equivalent words by both two-set and three-set matching to not only increase the number of true matches (i.e., overall accuracy), but also improve recognition of variations in a manner that is not language specific.
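Publication No.: US2015066474(A1)  Publication Date: 2015.03.05
Application No.: US201414333578  Filing Date: 2014.07.17
Applicant: Acxiom Corporation  Inventors: Yi Gon; Miyahira Aaron; Marupally Pavan
Classification No.: G06F17/27  Main Classification No.: G06F17/27
Agency:  Agent: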
Main Claim: 1. A computer-implemented method for matching terms, comprising the steps of:
a. receiving at a processor in communication with a computer-readable medium a first term and a second term, wherein each of the first term and second term comprises a character string stored on the computer-readable medium;
b. tokenizing at the processor the first term and the second term to create a first tokenized set comprising a plurality of first tokens from the first term and a second tokenized set comprising a plurality of second tokens from the second term, wherein each of the first tokens and second tokens comprises at least one consonant or consonant placeholder, and at least one vowel or vowel placeholder;
c. comparing at the processor each first token from the first tokenized set with a corresponding second token from the second tokenized set to determine if the first tokenized set comprises an equal number of tokens as the second tokenized set;
d. if the first tokenized set comprises an equal number of tokens as the second tokenized set, comparing the characters in each of the first tokens in the first tokenized set to the characters in the corresponding second token from the second tokenized set to determine if a match exists between the first term and the second term, wherein said comparison step is performed using a first compiled language library (CLL) comprising a set of consonants, a set of vowels, and a plurality of consonant equivalencies and vowel equivalencies whereby a match exists if the characters in each of the first tokens in the first tokenized set are identical to the characters in the corresponding second token from the second tokenized set or if the first tokens in the first tokenized set are equivalent to the characters in the corresponding second token from the second tokenized set; and
e. outputting from the processor an indicator of whether a match has occurred.
Address: Little Rock AR US
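
Illustrative sketch (not part of the patent record): steps b through e of the main claim can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The record does not show the UDLS or CLL format, so the vowel set, the equivalency groups, the "_" placeholder, and the rule "a token is a run of consonants followed by a run of vowels" are all hypothetical choices made here for illustration. Input is assumed to be plain alphabetic text, and every non-vowel character is treated as a consonant.

```python
import re

# Hypothetical stand-in for a compiled language library (CLL).
# None of these values come from the patent record.
VOWELS = "aeiou"
CONSONANT_EQUIV = [{"f", "ph"}, {"c", "k"}]
VOWEL_EQUIV = [{"i", "ee"}, {"o", "ou"}]
PLACEHOLDER = "_"  # assumed placeholder for a missing consonant or vowel part

# One token = a run of non-vowels followed by a run of vowels (assumed scheme).
_TOKEN_RE = re.compile(rf"([^{VOWELS}]*)([{VOWELS}]*)")


def tokenize(term):
    """Split a term into (consonant_part, vowel_part) tokens (claim step b)."""
    tokens = []
    for cons, vow in _TOKEN_RE.findall(term.lower()):
        if not cons and not vow:
            continue  # skip the empty match the regex yields at end of string
        tokens.append((cons or PLACEHOLDER, vow or PLACEHOLDER))
    return tokens


def equivalent(a, b, groups):
    """True if the parts are identical or share an equivalency group."""
    return a == b or any(a in g and b in g for g in groups)


def terms_match(first, second):
    """Return a match indicator for two terms (claim steps c through e)."""
    first_tokens, second_tokens = tokenize(first), tokenize(second)
    # Step c: the token counts must be equal. The unequal case is not
    # covered by this claim and is simply reported as no match here.
    if len(first_tokens) != len(second_tokens):
        return False
    # Step d: compare corresponding tokens, allowing CLL equivalencies.
    return all(
        equivalent(c1, c2, CONSONANT_EQUIV) and equivalent(v1, v2, VOWEL_EQUIV)
        for (c1, v1), (c2, v2) in zip(first_tokens, second_tokens)
    )


print(tokenize("philip"))
print(terms_match("philip", "filip"))
```

With these assumed rules, "philip" and "filip" tokenize to the same number of tokens and differ only in the "ph"/"f" consonant part, which the hypothetical equivalency list treats as a match. A real CLL would be generated from a language expert's UDLS rather than hard-coded.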