Title of invention: Analysis of a system for matching data records
Abstract: Embodiments disclosed herein provide a system and method for analyzing an identity hub. Particularly, a user can connect to the identity hub, load an initial set of data records, create and/or edit an identity hub configuration locally, analyze and/or validate the configuration via a set of analysis tools, including an entity analysis tool, a data analysis tool, a bucket analysis tool, and a linkage analysis tool, and remotely deploy the validated configuration to an identity hub instance. In some embodiments, through a graphical user interface, these analysis tools enable the user to analyze and modify the configuration of the identity hub in real time while the identity hub is operating to ensure data quality and enhance system performance.
Publication number: US8799282(B2) | Publication date: 2014.08.05
Application number: US200812239448 | Filing date: 2008.09.26
Applicant: International Business Machines Corporation | Inventors: Goldenberg Glenn;Schumacher Scott;Woods Jason
Classification: G06F7/00;G06F17/30 | Main classification: G06F7/00
Agency: Edell, Shapiro & Finnan, LLC | Agent: Carroll Terry;Edell, Shapiro & Finnan, LLC
Principal claim: 1. A computer-implemented method for analyzing a system for matching data records, the method comprising: producing a configuration of said system, the configuration of the system applying a bucketing strategy operable to create buckets by comparing sets of one or more attributes of initial data records with corresponding attributes of candidate data records in said system, wherein each bucket is associated with a corresponding set of attributes; analyzing buckets created according to the bucketing strategy associated with said configuration of said system, wherein said buckets each comprise candidate data records with the corresponding set of attributes similar to those of the initial data records and are used to associate data records with a common entity, and wherein said analyzing said buckets further comprises analyzing statistics associated with said buckets, analyzing a bucket size distribution, analyzing said buckets by size, analyzing said buckets by composition, analyzing a bulk cross match, comparison distribution, analyzing members by bucket count, analyzing member bucket values, analyzing member bucket frequencies, analyzing a member comparison distribution, or a combination thereof; analyzing an effect of said buckets on performance of said system to determine and link data records associated with a common entity; and changing said bucketing strategy accordingly to alter determination of the association of data records with the common entity.
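Illustrative sketch (not part of the patent text): the claim's steps of creating buckets, analyzing the bucket size distribution, and changing the bucketing strategy can be pictured roughly as follows. Everything here is an assumption made for illustration: the record fields (`last`, `first`, `zip`), the key functions in `STRATEGY`, and the helper names `build_buckets`, `bucket_size_distribution`, and `candidate_pair_count` are hypothetical and do not come from the patent or from any identity hub product.

```python
from collections import Counter, defaultdict

# Hypothetical data records; field names and values are illustrative only.
RECORDS = [
    {"id": 1, "last": "Smith", "first": "John", "zip": "10504"},
    {"id": 2, "last": "Smyth", "first": "Jon", "zip": "10504"},
    {"id": 3, "last": "Smith", "first": "Jane", "zip": "78701"},
    {"id": 4, "last": "Woods", "first": "Jason", "zip": "78701"},
]

# Assumed representation of a bucketing strategy: each entry maps a set of
# attributes to a bucket key. Records sharing a key land in the same bucket.
STRATEGY = {
    "name": lambda r: (r["last"][:2].lower(), r["first"][:1].lower()),
    "zip": lambda r: (r["zip"],),
}


def build_buckets(records, strategy):
    """Group record ids by (strategy entry, key) so each bucket holds candidates."""
    buckets = defaultdict(set)
    for rec in records:
        for name, key_fn in strategy.items():
            buckets[(name, key_fn(rec))].add(rec["id"])
    return buckets


def bucket_size_distribution(buckets):
    """Map bucket size -> number of buckets of that size."""
    return Counter(len(members) for members in buckets.values())


def candidate_pair_count(buckets):
    """Count distinct record pairs that share at least one bucket.

    This is a rough proxy for comparison workload: larger buckets mean
    more pairwise comparisons during matching.
    """
    pairs = set()
    for members in buckets.values():
        ids = sorted(members)
        pairs.update((a, b) for i, a in enumerate(ids) for b in ids[i + 1:])
    return len(pairs)


if __name__ == "__main__":
    buckets = build_buckets(RECORDS, STRATEGY)
    print("size distribution:", dict(bucket_size_distribution(buckets)))
    print("candidate pairs:", candidate_pair_count(buckets))

    # Changing the strategy (here: dropping the name-based entry) changes
    # which records are ever compared, and therefore which can be linked.
    narrower = {"zip": STRATEGY["zip"]}
    print("candidate pairs (zip only):",
          candidate_pair_count(build_buckets(RECORDS, narrower)))
```

In this toy data set the zip-only strategy cuts the candidate pairs from 4 to 2, but records 1 and 3 (both "Smith") are then never compared. That trade-off between comparison workload and missed links is the kind of effect the claimed bucket analysis is meant to expose before the strategy is changed.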
Address: Armonk NY US