Title of invention: System and method for utilizing multiple encodings to identify similar language characters
Abstract: Described herein are systems and methods for identifying the similarity between language characters. A pair of language characters is received at a language character match engine. The language character match engine is adapted to receive encoding configuration information from each of a plurality of encoding components. For each encoding component, it encodes both language characters based on the unique structure of each character, producing a pair of strings of identification characters. The two strings in each pair are then compared with one another to generate a similarity score, and the similarity scores from all encoding components are combined into a composite similarity score. The composite similarity score represents the similarity between the pair of language characters and is used to identify how similar they are.
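The encode, compare, and combine flow described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The two encoding tables are small illustrative samples (a stroke-order string using the common 1 to 5 stroke-type numbering, and a short shape code), not data from the patent. The comparison step, Levenshtein edit distance normalized by the longer string, and the combination step, a weighted average with hypothetical weights 0.6 and 0.4, are both assumptions, because the abstract does not specify how strings are compared or how scores are combined.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical encoding components: each maps a character to a string of
# identification characters describing its structure (illustrative data only).
STROKE_CODES: Dict[str, str] = {"木": "1234", "本": "12341", "未": "11234"}
SHAPE_CODES: Dict[str, str] = {"木": "D", "本": "DM", "未": "JD"}

Encoder = Callable[[str], str]


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def string_similarity(a: str, b: str) -> float:
    """Map edit distance onto [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def composite_similarity(pair: Tuple[str, str],
                         encoders: List[Tuple[Encoder, float]]) -> float:
    """Encode both characters with every component, score each pair of
    strings, then combine the scores as a weighted average (an assumption)."""
    first, second = pair
    total_weight = sum(weight for _, weight in encoders)
    combined = 0.0
    for encode, weight in encoders:
        combined += weight * string_similarity(encode(first), encode(second))
    return combined / total_weight


encoders = [(STROKE_CODES.__getitem__, 0.6), (SHAPE_CODES.__getitem__, 0.4)]
print(round(composite_similarity(("木", "本"), encoders), 3))
```

With this sample data the stroke strings score 0.8 and the shape codes score 0.5, so the call prints 0.68. Keeping each encoder as a plain function paired with a weight makes it easy to plug in or re-weight encoding components without touching the comparison logic.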
Publication number: US9128915 (B2)
Publication date: 2015.09.08
Application number: US201213566385
Filing date: 2012.08.03
Applicant: ORACLE INTERNATIONAL CORPORATION
Inventors: Qian Jun; Ouaguenouni Sofiane
Classification codes: G06F17/28; G06F17/20; G06K9/00; G06K9/18; G06F17/22
Main classification: G06F17/28
Agency: Tucker Ellis LLP
Agent: Tucker Ellis LLP
Main claim: 1. A method for improving accuracy of data matching in a middleware machine environment by identifying a similarity between language characters of a character set of a language, wherein each language character has a unique structure, the method comprising:
providing a language character match engine, wherein the language character match engine executes on one or more microprocessors, wherein the language character match engine comprises a plurality of encoding components, including at least a first encoding component and a second encoding component and a third encoding component;
using the language character match engine to generate a composite similarity score set for the character set of the language, wherein said similarity index comprises a composite similarity score for each of a plurality of pairs of language characters of the character set of the language;
wherein the composite similarity score for each of the plurality of pairs of language characters is prepared by:
receiving the pair of language characters with the language character match engine,
using the first encoding component to encode each language character of the pair of language characters based on the unique structure of each language character and generate, for each language character, a first-encoded string of identification characters representing the unique structure of the language character,
comparing the first-encoded strings of identification characters for each of the pair of language characters to one another to generate a first-encoding similarity score for the pair of language characters,
using the second encoding component to encode each language character of the pair of language characters based on the unique structure of each language character and generate, for each language character, a second-encoded string of identification characters representing the unique structure of the language character,
comparing the second-encoded strings of identification characters for each of the pair of language characters to one another to generate a second-encoding similarity score for the pair of language characters,
using the third encoding component to encode each language character of the pair of language characters based on the unique structure of each language character and generate, for each language character, a third-encoded string of identification characters representing the unique structure of the language character,
comparing the third-encoded strings of identification characters for each of the pair of language characters to one another to generate a third-encoding similarity score for the pair of language characters, and
combining the first-encoding similarity score, the second-encoding similarity score, and the third-encoding similarity score for the pair of language characters to generate a composite similarity score for the pair of language characters.
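Claim 1 goes beyond a single pair: it builds a composite similarity score set covering pairs drawn from the character set, using exactly three encoding components. The sketch below illustrates that structure under stated assumptions. The three tables (stroke sequence, shape code, and stroke count) are hypothetical sample data. Python's standard difflib.SequenceMatcher.ratio() stands in for the string comparison, which the claim leaves unspecified. The composite score is taken as a plain arithmetic mean of the three per-encoding scores, which is also an assumption rather than anything the claim states.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, Dict, List, Tuple

# Three hypothetical encoding components (illustrative tables only).
FIRST_ENCODING: Dict[str, str] = {"木": "1234", "本": "12341", "未": "11234"}  # stroke order
SECOND_ENCODING: Dict[str, str] = {"木": "D", "本": "DM", "未": "JD"}          # shape code
THIRD_ENCODING: Dict[str, str] = {"木": "4", "本": "5", "未": "5"}             # stroke count

ENCODERS: List[Callable[[str], str]] = [
    FIRST_ENCODING.__getitem__,
    SECOND_ENCODING.__getitem__,
    THIRD_ENCODING.__getitem__,
]


def encoding_similarity(a: str, b: str) -> float:
    """Compare two identification strings; ratio() returns a value in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def composite_score(first: str, second: str) -> float:
    """Score the pair once per encoding component, then average the scores."""
    scores = [encoding_similarity(enc(first), enc(second)) for enc in ENCODERS]
    return sum(scores) / len(scores)


def build_score_set(charset: List[str]) -> Dict[Tuple[str, str], float]:
    """Composite similarity score for every unordered pair in the character set."""
    return {(a, b): composite_score(a, b) for a, b in combinations(charset, 2)}


index = build_score_set(["木", "本", "未"])
for pair, score in sorted(index.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

With this sample data the pair 本/未 ranks highest, because the two characters share a stroke count and most of their stroke sequence. Precomputing the score set once per character set means later matching only needs a dictionary lookup; for a large character set the number of pairs grows quadratically, so a real system would likely restrict which pairs are scored.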
Address: Redwood Shores, CA, US