Title: Record linkage sharing using labeled comparison vectors and a machine learning domain classification trainer
Abstract: Herein disclosed is a system and method for record linkage that uses machine learning to link records, so that many users can contribute their training data to a shared repository and employ the accumulated training data without any user having to share their actual data. The system includes a record linkage server, which further includes a record linkage repository, a domain classifier, and a domain classification trainer. The record linkage server is connected with a record linkage client, which includes a field comparator and a manual label prompter. Further disclosed is a method for record linkage, describing how two structured data sets can be matched, including searching domains, loading data sets, loading a domain, matching fields, and iterating record linking over all record pairs, which includes: selecting a record pair, calculating a comparison vector, calculating label probabilities, determining a label, optionally setting the label manually, updating prior probabilities, optionally confirming the selected label, and updating training data.
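The per-pair steps listed in the abstract (calculate a comparison vector, calculate label probabilities, determine a label, update priors, update training data) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the three-level `similarity_level` comparator, the naive-Bayes-style estimate with add-one smoothing, and the names `link_pair` and `label_counts`. Manual labeling and confirmation are omitted.

```python
from collections import Counter
from difflib import SequenceMatcher

LABELS = ("match", "non-match")


def similarity_level(a: str, b: str) -> int:
    """Hypothetical comparator: bucket string similarity into 0, 1, or 2."""
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    if ratio == 1.0:
        return 2
    return 1 if ratio >= 0.8 else 0


def comparison_vector(rec_a: dict, rec_b: dict, fields: list) -> tuple:
    """Compare the two records field by field."""
    return tuple(similarity_level(rec_a[f], rec_b[f]) for f in fields)


def label_probabilities(vector: tuple, training: list, priors: dict) -> dict:
    """Naive-Bayes-style estimate with add-one smoothing (an assumption)."""
    scores = {}
    for label in LABELS:
        examples = [v for v, l in training if l == label]
        score = priors[label]
        for i, level in enumerate(vector):
            hits = sum(1 for v in examples if v[i] == level)
            score *= (hits + 1) / (len(examples) + 3)  # 3 similarity levels
        scores[label] = score
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


def link_pair(rec_a, rec_b, fields, training, label_counts):
    """Run one iteration of the linking loop for a single record pair."""
    vector = comparison_vector(rec_a, rec_b, fields)
    n = sum(label_counts.values())
    priors = {label: label_counts[label] / n for label in LABELS}
    probs = label_probabilities(vector, training, priors)
    label = max(probs, key=probs.get)
    label_counts[label] += 1          # update prior probabilities
    training.append((vector, label))  # update training data
    return vector, label, probs
```

Note that only the comparison vector and its label are appended to `training`; the record values themselves never leave the caller, which is the property the abstract emphasizes.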
Publication number: US9576248(B2); Publication date: 2017.02.21
Application number: US201414203784; Filing date: 2014.03.11
Applicant: Hurwitz Adam M.; Inventor: Hurwitz Adam M.
Classification: G06F15/18; G06N99/00; G06F17/30; Main classification: G06F15/18
Agency: IDP Patent Services; Agent: IDP Patent Services; Underdal Olav M.
Principal claim: 1. A system for sharing record linkage information, comprised of: a) a record linkage server, wherein the record linkage server is further comprised of: a linkage services component, which is configured to process domain classifications; a record linkage repository, wherein the record linkage repository stores in non-transitory server memory: shared record linkage training data from a plurality of users and information for a set of domains, wherein each domain comprises a set of fields, such that each field comprises a comparator function and a set of labeled comparison vectors; wherein the linkage services component communicates with the record linkage repository in order to store, access and process data stored by the linkage services repository; b) a record linkage client, wherein the record linkage client communicates with the record linkage server over a network, to process record linkage functions for a user; wherein the user employs the record linkage system, communicating via the record linkage client, to perform record linking on a first data set and a second data set, wherein both the first and second data sets have fields, which are matched to a domain stored in the record linkage repository; wherein further the record linkage client calls the linkage services component on the record linkage server, such that the linkage services component calculates a comparison vector and via a classification function performs a sensitivity analysis to determine the label of the comparison vector; wherein the linkage services component is further configured with a domain classification trainer, such that the domain classification trainer is configured to train classification functions with the training data for each domain, stored in the record linkage repository, via a method of machine learning; such that the plurality of users share record linkage domain information stored in the record linkage repository, and perform record linking via the domain classifier of the record linkage server, such that the plurality of users do not have access to the shared record linkage training data; whereby a user is enabled to perform record linking and add information for a domain to the training data, such that the training data is used to train the classification functions for the domain.
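The server-side arrangement in the claim (a repository of domains, a trainer, and a classifier that users call without seeing the shared training data) might be modeled roughly as below. This is a sketch under stated assumptions, not the claimed implementation: the class and method names are hypothetical, labeled vectors are stored per domain rather than per field for brevity, and the "trainer" is a simple majority vote per comparison vector standing in for whatever machine learning method is actually used.

```python
from collections import Counter, defaultdict


class Domain:
    """A domain: named fields with comparators, plus private training data."""

    def __init__(self, name: str, comparators: dict):
        self.name = name
        self.comparators = comparators  # field name -> comparator function
        self._training = []             # (vector, label) pairs, never exposed
        self._model = {}                # vector -> majority label

    def add_training(self, vector, label: str) -> None:
        self._training.append((tuple(vector), label))

    def train(self) -> None:
        # Stand-in trainer (an assumption): majority label per vector.
        votes = defaultdict(Counter)
        for vector, label in self._training:
            votes[vector][label] += 1
        self._model = {v: c.most_common(1)[0][0] for v, c in votes.items()}

    def classify(self, vector, default: str = "non-match") -> str:
        return self._model.get(tuple(vector), default)


class RecordLinkageServer:
    """Hypothetical facade: clients see labels, never the training data."""

    def __init__(self):
        self._domains = {}

    def register_domain(self, name: str, comparators: dict) -> None:
        self._domains[name] = Domain(name, comparators)

    def search_domains(self, keyword: str) -> list:
        return sorted(n for n in self._domains if keyword.lower() in n.lower())

    def contribute(self, domain: str, vector, label: str) -> None:
        # A user adds a labeled comparison vector; the domain is retrained.
        self._domains[domain].add_training(vector, label)
        self._domains[domain].train()

    def classify(self, domain: str, vector) -> str:
        return self._domains[domain].classify(vector)
```

A client would compute comparison vectors locally with the domain's comparators, then call `classify` and, after confirming a label, `contribute`; no method returns the stored vectors of other users.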
Address: New York, NY, US