Title: Method and system for determining sets of variant items
Abstract: Various embodiments of a method and system for determining sets of variant items are described. Various embodiments may include a system configured to generate multiple item pairs each corresponding to a particular item and another item determined to be similar to the particular item. For the particular item and the other item, each item pair may include a respective sequence of text strings (e.g., a title). For each item pair, the system may perform a corresponding text alignment and determine one or more misalignments of the item pair. The system may also assign a similarity score to each item pair; the similarity score may be dependent on the misalignment(s) determined for the particular item pair. Based on each aligned item pair and the similarity score assigned to that aligned item pair, the system may generate an indication specifying that each of a set of items are variants of each other.
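The abstract does not say how candidate pairs are found, how titles are aligned, or how misalignments become a score. The Python sketch below is one hypothetical way to realize those steps on item titles. The function names (`candidate_pairs`, `align_pair`, `similarity_score`), the token-set Jaccard pre-filter, the use of `difflib.SequenceMatcher` as the aligner, and the scoring formula are all assumptions for illustration, not the patented method.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Hypothetical representation: a misalignment is a pair of differing token
# spans, e.g. (("red",), ("blue",)).
Misalignment = Tuple[Tuple[str, ...], Tuple[str, ...]]


def candidate_pairs(titles: Dict[str, str], min_jaccard: float = 0.5) -> List[Tuple[str, str]]:
    """Assumed pre-filter: pair items whose title token sets overlap enough."""
    tokens = {item: set(title.lower().split()) for item, title in titles.items()}
    items = sorted(tokens)
    pairs = []
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            union = tokens[a] | tokens[b]
            if union and len(tokens[a] & tokens[b]) / len(union) >= min_jaccard:
                pairs.append((a, b))
    return pairs


def align_pair(title_a: str, title_b: str) -> Tuple[float, List[Misalignment]]:
    """Align two titles token by token; return the match ratio and misaligned spans."""
    ta, tb = title_a.lower().split(), title_b.lower().split()
    matcher = SequenceMatcher(a=ta, b=tb, autojunk=False)
    misalignments = [
        (tuple(ta[i1:i2]), tuple(tb[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # 'replace', 'delete' or 'insert'
    ]
    return matcher.ratio(), misalignments


def similarity_score(ratio: float, misalignments: List[Misalignment]) -> float:
    """Hypothetical score: high when titles mostly agree but differ in few places."""
    if not misalignments:
        return 0.0  # identical titles look like duplicates, not distinct variants
    return ratio / len(misalignments)


if __name__ == "__main__":
    titles = {
        "p1": "Acme Cotton T-Shirt Red Large",
        "p2": "Acme Cotton T-Shirt Blue Large",
        "p3": "Stainless Steel Kettle",
    }
    for a, b in candidate_pairs(titles):
        ratio, mis = align_pair(titles[a], titles[b])
        print(a, b, mis, round(similarity_score(ratio, mis), 2))
```

With these sample titles only `p1`/`p2` pass the pre-filter; their single misalignment is `red` versus `blue`, and 8 of 10 tokens align, giving a score of 0.8. `SequenceMatcher` is a convenient stand-in; a dynamic-programming aligner such as Needleman-Wunsch over tokens could be swapped in when minimal-edit alignments matter.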
Publication number: US9418138(B2)
Publication date: 2016.08.16
Application number: US201514850934
Filing date: 2015.09.10
Applicant: Amazon Technologies, Inc.
Inventors: Kalinin Alexander Y.; Roy Chowdhury Amber; Kumar Vijay
Classification: G06F17/30; G06F19/24; G06F19/18; G06F19/28
Main classification: G06F17/30
Agency: Meyertons, Hood, Kivlin, Kowert & Goetzel, P.C.
Agents: Kowert Robert C.; Meyertons, Hood, Kivlin, Kowert & Goetzel, P.C.
Main claim: 1. A computer-implemented method, comprising:
performing, by one or more computers having at least one processor and memory:
accessing data that includes, for each individual item of individual ones of a plurality of items, a corresponding one or more text strings that describe, and are distinct from, the individual item;
for each particular item of at least some of the individual ones of the plurality of items:
for each of one or more other items of the individual ones of the plurality of items that are distinct from the particular item, comparing the one or more text strings describing the other item with the one or more text strings describing the particular item;
based at least in part on said comparing, identifying at least one of the one or more other items that are each distinct from, but a potential variant of, the particular item;
subsequent to said identifying, and for each identified other item of at least some of the one or more identified other items, generating an aligned pair, wherein one member of the aligned pair comprises the one or more text strings describing the identified other item, and the other member of the aligned pair comprises the one or more text strings describing the particular item, and wherein said generating comprises aligning the text in the one member with respect to the text in the other member; and
subsequent to said generating, and for each aligned pair of at least some of the one or more aligned pairs:
determining one or more misalignments between the text in one member of the aligned pair and the text in the other member of the aligned pair; and
assigning a similarity score to the aligned pair, wherein the similarity score depends at least in part on the determined one or more misalignments, and indicates a degree of confidence that the particular item that corresponds to the one or more text strings of one member of the aligned pair, and the other item that corresponds to the one or more text strings of the other member of the aligned pair, are distinct variants of each other;
based at least in part on a plurality of the generated aligned pairs, and on the similarity scores assigned to each of those aligned pairs, determining one or more variant sets of items from the plurality of items, wherein each variant set comprises multiple items of the plurality of items such that each item of the variant set is indicated to be a variant of a same item;
generating a network-based page based at least in part on the determined one or more variant sets of items; and
transmitting the generated network-based page over a communication network to a client computing device.
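The claim does not specify how scored pairs are turned into variant sets. One hypothetical reading, sketched below in Python, keeps only pairs whose similarity score meets a threshold and takes connected components with a union-find structure. The `variant_sets` name, the default threshold of 0.5, and component-based grouping are assumptions, not details from the claim.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical input shape: (item_a, item_b, similarity score).
ScoredPair = Tuple[str, str, float]


def variant_sets(scored_pairs: Iterable[ScoredPair], threshold: float = 0.5) -> List[Set[str]]:
    """Group items into variant sets: connected components of pairs scoring >= threshold."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b, score in scored_pairs:
        if score >= threshold:  # low-confidence pairs never link items
            union(a, b)

    groups: Dict[str, Set[str]] = {}
    for item in list(parent):
        groups.setdefault(find(item), set()).add(item)
    return list(groups.values())


if __name__ == "__main__":
    pairs = [("A", "B", 0.8), ("B", "C", 0.7), ("D", "E", 0.2), ("F", "G", 0.9)]
    print(sorted(sorted(group) for group in variant_sets(pairs)))
```

Here `A`, `B`, `C` form one set through the chain A-B-C, `F` and `G` form another, and `D`/`E` are left out because their score falls below the threshold. Connected components can chain items that were never compared directly; a stricter reading of "a variant of a same item" would instead group each anchor item with only its own high-scoring partners.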
Address: Reno, NV, US