Title of invention: System and method for performing Unicode matching
Abstract: System and method for performing Unicode matching, for comparing and merging similar data objects whose Unicode strings are equivalent but not exact matches. Unicode characters are characterized by number of strokes, stroke order, radicals, geometry, and phonemes, in association with input method editor (IME) and keyboard characteristics such as the location of a character on an IME or keyboard, or the number of GUI interactions used in entering the character (e.g., via tapping, where “a” on a mobile device keyboard takes 1 tap of a key and “b” takes 2 taps). These characteristics associated with code points and IMEs/keyboards are utilized to create subdomains for matching and for determining the “distance” to other Unicode code points (e.g., number of keyboard keys away). This allows for determining whether close, yet incorrect, data entry may have taken place, and enables merging duplicate data objects into a master data object where minor differences or spelling errors mask what is actually duplicate data.
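The "number of keyboard keys away" distance mentioned in the abstract can be illustrated with a minimal Python sketch. The record does not specify a layout or a metric, so everything here is an assumption made for illustration: the `QWERTY_ROWS` layout, the treatment of rows as an unstaggered grid, the choice of Chebyshev distance, and the name `key_distance`.

```python
# Hypothetical layout: three QWERTY letter rows, treated as an unstaggered grid.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

# Map each letter to its (row, column) position on the assumed grid.
KEY_POS = {
    ch: (r, c)
    for r, row in enumerate(QWERTY_ROWS)
    for c, ch in enumerate(row)
}


def key_distance(a: str, b: str) -> int:
    """Return how many keys apart two letters are (Chebyshev distance).

    This metric is an assumption; the abstract only says "number of
    keyboard keys away". Raises KeyError for characters not in the layout.
    """
    r1, c1 = KEY_POS[a.lower()]
    r2, c2 = KEY_POS[b.lower()]
    return max(abs(r1 - r2), abs(c1 - c2))
```

Under these assumptions, adjacent keys such as "a" and "s" are at distance 1, so a string differing only in such a pair could be flagged as a likely data-entry slip rather than a distinct value.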
Publication number: US9275019(B2) Publication date: 2016.03.01
Application number: US200711963682 Filing date: 2007.12.21
Applicant: SAP SE Inventors: Weinberg Paul N.;Endo Richard T.;Zheng Xidong;Yospe Nathan F.;Hazi Ariel
Classification: G06F7/00;G06F17/30;G06T11/00;G06F17/22 Main classification: G06F7/00
Agency: Buckley, Maschoff & Talwalkar LLC Agent: Buckley, Maschoff & Talwalkar LLC
Main claim: 1. A method of performing Unicode matching, comprising: receiving, at a computer processor, a first data object from a first database; receiving, at the computer processor, a second data object from a second database; determining by the computer processor a first Unicode string associated with a language from a field in said first data object; determining by the computer processor a second Unicode string associated with said language from said field in said second data object; comparing by the computer processor said first Unicode string and said second Unicode string; obtaining by the computer processor a non-empty set of non-exact match code points associated with said first Unicode string and said second Unicode string; automatically comparing by the computer processor entries from said non-empty set of said non-exact match code points comprising a first code point from said first Unicode string to a second code point in said second Unicode string wherein a first characteristic associated with said first code point is utilized to compare against a second characteristic associated with said second code point to obtain a distance between said first code point and said second code point, wherein said characteristics are associated with numbers of graphical user interface interactions required to enter a single-character glyph and said distance is set to a mathematical difference of (i) graphical user interface interactions to select a first single character glyph representing said first code point and (ii) graphical user interface interactions to select a second single character glyph representing said second code point; and, reporting from the computer processor a match if said distance is zero and reporting from the computer processor a tentative match if said distance is within a non-zero threshold.
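The comparison recited in claim 1 can be sketched in Python as follows. This is an illustrative reading, not the patented implementation. The multi-tap keypad layout in `MULTITAP_KEYS`, the use of an absolute difference, the position-by-position pairing of code points, the aggregation by maximum over mismatched positions, the default threshold of 1, and the names `tap_distance` and `classify` are all assumptions introduced here; the claim itself only fixes the per-code-point distance as a difference of GUI interaction counts and the match / tentative-match reporting rule.

```python
# Hypothetical multi-tap keypad groups (keys 2 through 9 on a phone keypad).
MULTITAP_KEYS = ["abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"]

# Taps needed to enter each letter: 1 for the first letter on a key, 2 for the second, ...
TAPS = {ch: i + 1 for group in MULTITAP_KEYS for i, ch in enumerate(group)}


def tap_distance(a: str, b: str) -> int:
    """Absolute difference in tap counts between two single-character glyphs."""
    return abs(TAPS[a.lower()] - TAPS[b.lower()])


def classify(first: str, second: str, threshold: int = 1) -> str:
    """Report 'match', 'tentative match', or 'no match' for two strings.

    Assumption: strings are compared position by position, and the string-level
    distance is the largest per-position tap distance among mismatched positions.
    """
    if len(first) != len(second):
        return "no match"
    # The set of non-exact-match code point pairs.
    mismatches = [(x, y) for x, y in zip(first, second) if x != y]
    if not mismatches:
        return "match"  # exact match; no distance computation needed
    worst = max(tap_distance(x, y) for x, y in mismatches)
    if worst == 0:
        return "match"
    if worst <= threshold:
        return "tentative match"
    return "no match"
```

With these assumptions, `classify("cat", "cbt")` yields a tentative match because "a" takes 1 tap and "b" takes 2, matching the example given in the abstract. Note that two different letters with the same tap count (such as "a" and "d") have distance zero under this characteristic alone, which is why the abstract describes combining several characteristics into subdomains.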
Address: Walldorf DE