Title |
Automated name standardization for big data |
Abstract |
Distinct names of merchant entities in a transaction processing database are automatically corrected to standard names of entities. This is done by identifying non-standard features in the distinct names that do not uniquely identify the standard names of entities, and by processing each distinct name with a selected regular expression tailored to remove the non-standard features and convert the names to a standard name format. Fuzzy matching is used to identify standard names of entities corresponding to the standard name formats. |
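The cleansing step described in the abstract can be illustrated with a minimal Python sketch. The feature names (`processor_prefix`, `store_number`, `punctuation`), their regular expressions, and the helpers `characteristic_features`, `select_rule`, and `to_standard_format` are hypothetical and chosen only for illustration, because the patent does not disclose its actual patterns. In this sketch the rule is assembled from the per-feature patterns of the detected feature set, not looked up from a stored rule table.

```python
import re

# HYPOTHETICAL non-standard features. Each pattern both detects the feature
# and matches the text to strip. These are illustrative assumptions, not the
# patent's rules. Insertion order matters: it fixes the alternation order.
FEATURE_PATTERNS = {
    "processor_prefix": r"^[A-Z]{2,3}\s*\*\s*",  # e.g. "SQ *" at the start
    "store_number": r"#\s*\d+",                  # e.g. "#1234"
    "punctuation": r"[^\w\s]",                   # any leftover symbol
}


def characteristic_features(name: str) -> frozenset:
    """Return the set of non-standard features present in a distinct name."""
    upper = name.upper()
    return frozenset(
        feature for feature, pattern in FEATURE_PATTERNS.items()
        if re.search(pattern, upper)
    )


def select_rule(features: frozenset) -> "re.Pattern[str]":
    """Build one alternation regex covering exactly the detected features."""
    parts = [p for f, p in FEATURE_PATTERNS.items() if f in features]
    if not parts:
        return re.compile(r"(?!)")  # matches nothing: name is already clean
    return re.compile("|".join(f"(?:{p})" for p in parts))


def to_standard_format(name: str) -> str:
    """Strip the detected non-standard features and normalize whitespace."""
    upper = name.upper()
    rule = select_rule(characteristic_features(upper))
    cleaned = rule.sub("", upper)
    return " ".join(cleaned.split())
```

For example, `to_standard_format("SQ *JOE'S COFFEE #1234")` yields `"JOES COFFEE"`. Because the alternation is tried left to right at each position, `#1234` is consumed as a store number before the generic punctuation pattern can claim the `#`.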
Publication number
US9542456(B1) |
Publication date
2017.01.10 |
Application number
US201314145859 |
Filing date
2013.12.31 |
Applicant
EMC Corporation |
Inventors
Das Kaushik;Vawdrey Jarrod J.;Eckhardt Robert J.;Zhang Yu |
Classification
G06F17/30;G06F11/18 |
Main classification
G06F17/30 |
Agency
|
Agent
Young Barry N. |
Main claim
1. A computer-implemented method of automated standardization of distinct non-standard names in a transactions processing database to associate said distinct non-standard names with standard names of particular entities, comprising:
identifying features of a distinct name that are non-standard features of a standard name of an entity stored in said database, the non-standard features creating ambiguity between the distinct name and the standard name, and creating a characteristic feature set for said distinct name containing said identified non-standard features;
processing said distinct name using a regular expression rule selected based upon the characteristic feature set of said distinct name to cleanse said distinct name by removing the non-standard features of the characteristic feature set from the distinct name to convert the distinct name to a standard name format;
comparing the standard name format to standard names of entities in the database to determine possible matches; and
identifying based upon said comparing the distinct name as corresponding to the standard name of the entity. |
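The comparing and identifying steps of claim 1 can be sketched with fuzzy string matching. The claim does not name a similarity measure, so this sketch assumes Python's `difflib.SequenceMatcher` ratio and a hypothetical cutoff of 0.8. The function names `possible_matches` and `identify` are illustrative only.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def possible_matches(
    name_format: str, standard_names: List[str], cutoff: float = 0.8
) -> List[Tuple[float, str]]:
    """Score every standard name against the cleansed name format and keep
    those at or above the (assumed) cutoff, best candidate first."""
    scored = []
    for standard in standard_names:
        # ratio() returns 2*M/T, where M is the number of matched characters
        # and T is the combined length of both strings.
        ratio = SequenceMatcher(None, name_format.upper(), standard.upper()).ratio()
        if ratio >= cutoff:
            scored.append((ratio, standard))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def identify(
    name_format: str, standard_names: List[str], cutoff: float = 0.8
) -> Optional[str]:
    """Return the best-matching standard name, or None if nothing qualifies."""
    matches = possible_matches(name_format, standard_names, cutoff)
    return matches[0][1] if matches else None
```

With standard names `["JOE'S COFFEE", "STARBUCKS", "WALMART"]`, `identify("JOES COFFEE", ...)` returns `"JOE'S COFFEE"` (a ratio of 22/23, about 0.96), while a name with no candidate at or above the cutoff returns `None`.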
Address
Hopkinton MA US |