Title of Invention: SYSTEM AND METHOD FOR MATCHING DATA USING PROBABILISTIC MODELING TECHNIQUES
Abstract: A system and method for matching data using probabilistic modeling techniques is provided. The system includes a computer system and a data matching model/engine. The present invention precisely and automatically matches and identifies entities from approximately matching short text strings (e.g., company names, product names, addresses) by first pre-processing the datasets with a near-exact matching model and a fingerprint matching model, and then applying a fuzzy text matching model. More specifically, the fuzzy text matching model applies an Inverse Document Frequency (IDF) function to a simple data entry model and combines the result, through a probabilistic model, with one or more unintentional error metrics/measures and/or intentional spelling variation metrics/measures. The system can be autonomous and robust: it tolerates variations and errors in text while appropriately penalizing the similarity score, thus allowing datasets to be linked through their text columns.
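The abstract names the stages of the matcher but not their formulas, so the Python sketch below is only one loose illustration of the overall flow, not the patented method. Every name in it (`normalize`, `fingerprint`, `idf_weights`, `token_similarity`, `fuzzy_score`, `match`) is hypothetical. Several pieces are assumptions. `normalize` and `fingerprint` stand in for the near-exact and fingerprint models. The standard-library `difflib.SequenceMatcher` ratio stands in for an unintentional-error measure. The tokens are combined by a symmetric IDF-weighted average rather than a true probabilistic model. The intentional-spelling-variation measure and the data entry model are omitted, and the thresholds 0.8 and 0.6 are arbitrary.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def normalize(text):
    """Hypothetical near-exact key: lowercase, drop punctuation, collapse spaces."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def fingerprint(text):
    """Hypothetical fingerprint: sorted, de-duplicated normalized tokens."""
    return " ".join(sorted(set(normalize(text).split())))


def idf_weights(corpus):
    """Smoothed IDF per token, plus the weight used for tokens never seen."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(normalize(doc).split()))
    weights = {tok: math.log((n + 1) / (cnt + 1)) + 1.0 for tok, cnt in df.items()}
    unseen = math.log(n + 1) + 1.0  # an unseen token is treated as maximally rare
    return weights, unseen


def token_similarity(a, b, min_ratio=0.8):
    """Assumed error measure: difflib ratio, zeroed below min_ratio."""
    if a == b:
        return 1.0
    ratio = SequenceMatcher(None, a, b).ratio()
    return ratio if ratio >= min_ratio else 0.0


def fuzzy_score(query, candidate, weights, unseen):
    """Symmetric IDF-weighted share of tokens that have a close counterpart."""
    q, c = normalize(query).split(), normalize(candidate).split()
    if not q or not c:
        return 0.0

    def w(tok):
        return weights.get(tok, unseen)

    # Rare tokens (high IDF) dominate the score; typos are penalized via similarity < 1.
    matched = sum(w(t) * max(token_similarity(t, u) for u in c) for t in q)
    matched += sum(w(u) * max(token_similarity(u, t) for t in q) for u in c)
    total = sum(w(t) for t in q) + sum(w(u) for u in c)
    return matched / total


def match(query, candidates, weights, unseen, threshold=0.6):
    """Near-exact pass, then fingerprint pass, then fuzzy scoring."""
    if not candidates:
        return None, 0.0
    key = normalize(query)
    for cand in candidates:
        if normalize(cand) == key:
            return cand, 1.0
    fp = fingerprint(query)
    for cand in candidates:
        if fingerprint(cand) == fp:
            return cand, 1.0
    score, best = max((fuzzy_score(query, cand, weights, unseen), cand) for cand in candidates)
    return (best, score) if score >= threshold else (None, score)


if __name__ == "__main__":
    names = ["Acme Corporation", "Acme Inc", "Beta Industries", "Gamma Corporation"]
    idf, unseen_weight = idf_weights(names)
    print(match("Corporation Acme", names, idf, unseen_weight))  # fingerprint pass
    print(match("ACME Corporaton", names, idf, unseen_weight))   # fuzzy pass (typo)
```

Because the cheap exact and fingerprint passes run first, the fuzzy scorer only handles records that differ in more than case, punctuation, or word order. That ordering matches the pre-processing sequence the abstract describes.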
Publication Number: WO2014028860(A3) Publication Date: 2014.05.01
Application Number: WO2013US55393 Filing Date: 2013.08.16
Applicant: OPERA SOLUTIONS, LLC Inventor: BANSAL, SHUBH
Classification: G06F17/30 Main Classification: G06F17/30