发明名称 RESOLVING PAIRWISE LINKS TO GROUPS
摘要 The present invention extends to methods, systems, and computer program products for resolving pairwise links to groups. Embodiments of the invention use an iterative algorithm to transform a collection of pairwise links to groups of records that correspond to the same entity. The algorithm can be stopped after any number of iterations for an increasing accurate approximation result. The algorithm essentially guarantees a correct solution for groups of size up to the number of iterations. This algorithm scales linearly on the size of the record set, with little impact from the number of links.
申请公布号 US2014358920(A1) 申请公布日期 2014.12.04
申请号 US201313907723 申请日期 2013.05.31
申请人 Wal-Mart Stores, Inc. 发明人 Ray Andrew Benjamin;Troutman Nathaniel Philip
分类号 G06F17/30 主分类号 G06F17/30
代理机构 代理人
主权项 1. A method for resolving a set of pairwise links to groups, the method comprising for one or more iterations over a plurality of records until a stop condition is satisfied, each iteration including: accessing a plurality of active neighborhood data structures, each active neighborhood data structure corresponding to a different specified record from among the plurality of records, each neighborhood data structure including a record identifier as a label, the record identifier corresponding to a record included in the plurality of records, the neighborhood data structure also including an activity state indicator and a neighborhood, the activity state indicator indicating whether or not the neighborhood data structure is active, the neighborhood including list of record identifiers for other records expressly linked to the specified record by a pairwise link, the other records also included in the plurality of records; for each neighborhood data structure, creating a message for each of the other records expressly linked to the record, the message including the label and the activity indicator for the neighborhood data structure; subsequent to creating the messages, grouping neighborhood data structures and messages by record identifier such that each group includes one neighborhood data structures and one or more messages; for each grouping of a neighborhood data structure and messages, setting the label for the neighborhood data structure to the minimum record identifier included in the neighborhood data structure and messages; and when the minimum record identifier for a grouping of a neighborhood data structure and messages is the record identifier for the specified record, removing the neighborhood data structure from plurality of active neighborhood data structure by setting the activity indicator to indicate that the neighborhood data structure is not active.
地址 Bentonville AR US