发明名称 Automatic taxonomy merge
摘要 A method for merging two taxonomies is disclosed. Top levels of first and second taxonomies are merged. For the second taxonomy nodes are evaluated and selectively merged with nodes of the first taxonomy according to comparison of scores for these nodes with a threshold. The score for a node of the first taxonomy is a combination of one or more of a lineage quality score, Jaccard distance, string edit distance, and category depth score. After an iteration, mergings between nodes of the first and second taxonomies may be reversed if child nodes of the merged nodes were not likewise merged. Iterations may be repeated until no nodes are merged in an iteration.
申请公布号 US9152705(B2) 申请公布日期 2015.10.06
申请号 US201213659844 申请日期 2012.10.24
申请人 Wal-Mart Stores, Inc. 发明人 Lamba Digvijay Singh;Tholiya Namrata PramodKumar;Deshpande Omkar
分类号 G06F7/00;G06F17/30 主分类号 G06F7/00
代理机构 Bryan Cave LLP 代理人 Bryan Cave LLP
主权项 1. A method for merging taxonomies comprising: initializing a merged taxonomy, by a computer system, by merging at least one node of a plurality of nodes of a second taxonomy to at least one node of a plurality of nodes of a first taxonomy, the merged taxonomy comprising the first taxonomy and the second taxonomy; merging, by the computer system, the second taxonomy into the merged taxonomy by, traversing the second taxonomy from below at least one top level of the second taxonomy toward a bottom of the second taxonomy and performing: comparing, by the computer system, one or more identifiers or titles of the at least one node of the plurality of nodes of the second taxonomy to one or more identifiers or titles of the at least one node of the plurality of nodes of the first taxonomy;comparing, by the computer system, one or more lineages of one or more unmerged nodes of the plurality of nodes of the second taxonomy in the merged taxonomy to one or more lineages of one or more nodes of the plurality of nodes of the first taxonomy; andmerging, by the computer system, the at least one node of the plurality of nodes of the second taxonomy and the at least one node of the plurality of nodes of the first taxonomy in the merged taxonomy if the comparison of the one or more identifiers or titles of the at least one node of the plurality of nodes of the second taxonomy to the one or more identifiers or titles of the at least one node of the plurality of nodes of the first taxonomy and the comparison of the one or more lineages of the one or more unmerged nodes of the second taxonomy in the merged taxonomy to the one or more lineages of the one or more nodes of the plurality of nodes of the first taxonomy satisfy a threshold condition; and for each unmerged node of the plurality of nodes of the second taxonomy, hereinafter a current node of the plurality of nodes of the second taxonomy: filtering the plurality of nodes of the first taxonomy according to a Jaccard distance of titles thereof with respect to a title of the current node of the plurality of nodes of the second taxonomy to define a filtered set of nodes;calculating an edit distance between the current node of the plurality of nodes of the second taxonomy and each node of the filtered set of nodes;calculating a lineage score for each node of the filtered set of nodes according to a comparison of a lineage of the current node of the plurality of nodes of the second taxonomy in the merged taxonomy to lineages of nodes of the filtered set of nodes; andmerging the current node of the plurality of nodes of the second taxonomy with a selected node of the filtered set of nodes if a combined score of the edit distance and the lineage score for the selected node is both: a greatest combined score for the nodes of the filtered set of nodes; andthe combined score for the selected node of the filtered set of nodes satisfies a combined score threshold condition.
地址 Bentonville AR US