Title of invention: METHOD AND SYSTEM FOR LINKING HETEROGENEOUS DATA SOURCES
Abstract: A method for linking records (related to an entity) from separate databases may include extracting a first record from a first database as a first vector, extracting a second record from a second database as a second vector, generating first and second sub-vectors for the first and second vectors, where each sub-vector includes quality features from the respective vector, pre-processing the first and second sub-vectors using domain knowledge, calculating a distance assessment classifier based on the first and second sub-vectors, and determining whether the distance represented by the distance assessment classifier is greater than a threshold. If the distance is greater than the threshold, the records may be linked; if not, the method extracts additional records and repeats after generating first and second sub-vectors until the distance is greater than the threshold. A system for linking records is also disclosed.
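The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the feature names in `QUALITY_FEATURES`, the string normalization standing in for "domain knowledge", the agreement score standing in for the "distance assessment classifier", and the default threshold of 0.6 are all assumptions made for illustration and do not come from this record.

```python
# Hypothetical quality features; the record does not name any fields.
QUALITY_FEATURES = ("name", "birth_year", "site")


def sub_vector(record, features=QUALITY_FEATURES):
    """Keep only the quality features of a record (a dict standing in for a vector)."""
    return tuple(record.get(f) for f in features)


def preprocess(sub):
    """Assumed stand-in for domain-knowledge pre-processing: normalize strings."""
    return tuple(v.strip().lower() if isinstance(v, str) else v for v in sub)


def score(a, b):
    """Assumed stand-in for the distance assessment classifier.

    Returns the fraction of features on which the two sub-vectors agree,
    so a larger value means a closer match.
    """
    matches = sum(1 for x, y in zip(a, b) if x is not None and x == y)
    return matches / len(a)


def link_records(first_db, second_db, threshold=0.6):
    """Return the first pair whose score exceeds the threshold, else None."""
    for r1 in first_db:
        s1 = preprocess(sub_vector(r1))
        for r2 in second_db:
            s2 = preprocess(sub_vector(r2))
            if score(s1, s2) > threshold:
                return r1, r2  # records may be linked
            # otherwise extract the next record and repeat
    return None
```

Following the wording of the abstract, the pair is linked when the value is greater than the threshold, which is why the sketch uses a score that grows with agreement rather than a geometric distance.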
Publication number: WO2016099578(A1) Publication date: 2016.06.23
Application number: WO2014US72357 Filing date: 2014.12.24
Applicant: MEDIDATA SOLUTIONS, INC. Inventors: TERESHKOV, VLADIMIR;HAIDER, SYED;AIMALE, VALERIO;HARTMAN, JOSHUA;BOUND, CHRISTOPHER;KATRIEL, RON
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Main claim:
Address: