Title of invention Supplier deduplication engine
Abstract Disclosed herein is a method of grouping similar supplier names together in a database. Syntactical errors in the supplier names are corrected, and the names are grouped after the correction. Abbreviations in the supplier names are captured, and ordering, pronunciation, and stemming errors are corrected. A matching algorithm that compares two supplier names is then applied. It comprises the steps of grouping supplier names based on a first set of characters in the names and calculating a matching score between the two supplier names using the Levenshtein distance between them, along with the names' sound codes obtained from a modified metaphone algorithm, the length of each word, the positions of matching and mismatching characters, and the stems of the words in the names. The matching scores are compared with set thresholds in order to further group the supplier names into clusters.
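The grouping and scoring steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented scoring function. It assumes a score of one minus the Levenshtein distance divided by the longer name's length, and it leaves out the sound codes, word lengths, character positions, and stems that the abstract also lists. The function names, the prefix length of 3, and the threshold of 0.8 are hypothetical choices made for illustration only.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace.

    Assumption: a simple stand-in for the syntactic clean-up step.
    """
    cleaned = "".join(c if c.isalnum() or c.isspace() else " "
                      for c in name.lower())
    return " ".join(cleaned.split())


def match_score(a: str, b: str) -> float:
    """Assumed score: 1.0 for identical names, lower as edits accumulate."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def cluster_suppliers(names, prefix_len=3, threshold=0.8):
    """Group names by leading characters, then cluster within each group."""
    # Step 1: block on the first characters so only likely matches are compared.
    blocks = defaultdict(list)
    for name in names:
        blocks[normalize(name)[:prefix_len]].append(name)

    # Step 2: within a block, join the first cluster whose representative
    # (its first member) scores at or above the threshold.
    clusters = []
    for block in blocks.values():
        block_clusters = []
        for name in block:
            for cluster in block_clusters:
                if match_score(name, cluster[0]) >= threshold:
                    cluster.append(name)
                    break
            else:
                block_clusters.append([name])
        clusters.extend(block_clusters)
    return clusters


suppliers = ["Acme Corp.", "ACME Corp", "Acme Corpp", "Globex Inc", "Globex Inc."]
clusters = cluster_suppliers(suppliers)
```

Grouping on the leading characters keeps the number of pairwise comparisons small, at the cost of missing names whose first characters differ. Under these assumptions the sample list falls into two clusters, one for the Acme variants and one for the Globex variants.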
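Application publication number US8234107(B2) Application publication date 2012.07.31
Application number US20080029519 Filing date 2008.02.12
Applicant GOYAL RAM DAYAL;KETERA TECHNOLOGIES, INC. Inventor GOYAL RAM DAYAL
Classification number G06F17/28 Main classification number G06F17/28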
Agency Agent
Main claim
Address