Title: Automated name standardization for big data
Abstract: Distinct names of merchant entities in a transaction processing database are automatically corrected to standard names of entities by identifying non-standard features from the distinct names that do not uniquely identify the standard names of entities, and processing each distinct name with a selected regular expression tailored to remove the non-standard features and convert the names to a standard name format. Fuzzy matching is used to identify standard names of entities corresponding to the standard name formats.
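Publication number: US9542456(B1) Publication date: 2017.01.10
Application number: US201314145859 Application date: 2013.12.31
Applicant: EMC Corporation Inventors: Das Kaushik; Vawdrey Jarrod J.; Eckhardt Robert J.; Zhang Yu
Classification: G06F17/30; G06F11/18 Main classification: G06F17/30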
Agent: Young Barry N.
Main claim: 1. A computer-implemented method of automated standardization of distinct non-standard names in a transactions processing database to associate said distinct non-standard names with standard names of particular entities, comprising: identifying features of a distinct name that are non-standard features of a standard name of an entity stored in said database, the non-standard features creating ambiguity between the distinct name and the standard name, and creating a characteristic feature set for said distinct name containing said identified non-standard features; processing said distinct name using a regular expression rule selected based upon the characteristic feature set of said distinct name to cleanse said distinct name by removing the non-standard features of the characteristic feature set from the distinct name to convert the distinct name to a standard name format; comparing the standard name format to standard names of entities in the database to determine possible matches; and identifying based upon said comparing the distinct name as corresponding to the standard name of the entity.
Address: Hopkinton MA US
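
The four steps recited in the main claim (build a characteristic feature set, cleanse with a regular expression rule chosen from that set, compare against standard names, identify the match) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and does not come from the patent: the merchant names, the three feature patterns, the way the rule is assembled from the detected features, the helper names, and the use of the standard library's difflib similarity ratio as the fuzzy matcher.

```python
import re
from difflib import get_close_matches

# Hypothetical standard entity names; the record does not list any.
STANDARD_NAMES = ["WALMART", "STARBUCKS", "HOME DEPOT"]

# Hypothetical non-standard features, each described by a regex fragment.
FEATURE_PATTERNS = {
    "store_number": r"#\s*\d+",                    # e.g. "#1234"
    "trailing_digits": r"\s\d{3,}$",               # e.g. " 00123" at the end
    "corporate_suffix": r"\b(?:INC|LLC|CORP)\b\.?",
}


def feature_set(name):
    """Return the set of non-standard features detected in an upper-cased name."""
    return frozenset(
        key for key, pattern in FEATURE_PATTERNS.items()
        if re.search(pattern, name)
    )


def cleanse(name):
    """Remove the detected features to obtain a standard name format."""
    name = name.upper()
    features = feature_set(name)
    if features:
        # Assumption: the rule "selected" for a feature set is simply the
        # alternation of the patterns of the features that were detected.
        rule = re.compile("|".join(FEATURE_PATTERNS[f] for f in sorted(features)))
        name = rule.sub(" ", name)
    return " ".join(name.split())  # collapse leftover whitespace


def standardize(name, cutoff=0.8):
    """Fuzzy-match the cleansed name against the standard names.

    Returns the best match, or None if no candidate reaches the cutoff.
    """
    matches = get_close_matches(cleanse(name), STANDARD_NAMES, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(standardize("Wal-Mart #1234"))        # WALMART
    print(standardize("Starbucks Corp 00123"))  # STARBUCKS
```

The cutoff of 0.8 is an arbitrary choice for this sketch; a real deployment would tune it against labeled data, since a lower cutoff admits more false matches and a higher one leaves more names unmatched.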