Title of invention: Detection of mismatch between book content and description
Abstract: Techniques described herein provide for a method, system, and apparatus for determining whether a mismatch is present between content and a description associated with a literary work. In various embodiments, a matching score may be calculated based on a number of stems identified from the description that match stems identified from the content. The matching score may be adjusted based on a number of name entities identified from the description that match one or more words in the content. The computed matching score may then indicate whether the description and the content correspond to a same literary work. According to one embodiment, the description and the content correspond to the same literary work even where the computed matching score indicates a mismatch if a title is determined to match in the content. Other embodiments may be described and claimed.
Publication number: US9262465(B1) | Publication date: 2016.02.16
Application number: US201314133474 | Filing date: 2013.12.18
Applicant: Amazon Technologies, Inc. | Inventors: Chen Hong; Darmawan Rudy
Classification: G06F7/00; G06F17/00; G06F17/30 | Main classification: G06F7/00
Agency: Schwabe Williamson & Wyatt PC | Agent: Schwabe Williamson & Wyatt PC
Main claim: 1. A computer-implemented method for determining whether a mismatch is present between a description and content associated with a literary work, the method comprising: separately receiving, by a computer system from a publisher system, a title, a description, and a content associated with a literary work; identifying, by the computer system, a first plurality of words included in the description; identifying, by the computer system, a second plurality of words that is included in the content; computing, by the computer system, a matching score based on initializing the matching score to zero, incrementing the matching score for each word in the first plurality of description words that matches at least one word in the second plurality of content words, and decrementing the matching score for each word in the first plurality of words that does not match any words in the second plurality of content words; determining, by the computer system, whether the title is verbatim included in the content; and notifying, by the computer system, the publisher system that the description does not match the content associated with the literary work where the computed matching score is less than zero, thereby indicating that the description does not match the content associated with the literary work, and the title is determined not to be verbatim included in the content.
Address: Reno NV US
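The scoring procedure recited in the main claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`tokenize`, `matching_score`, `title_in_content`, `is_mismatch`) are hypothetical, and several details are assumptions the claim leaves open, namely that words are lowercased, that each distinct description word is counted once, and that the verbatim title check ignores letter case. The stemming and name-entity adjustment mentioned in the abstract are omitted.

```python
import re


def tokenize(text):
    """Return the distinct lowercased words in the text (assumed tokenization)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def matching_score(description, content):
    """Start at zero; add 1 per description word found in the content,
    subtract 1 per description word not found in the content."""
    content_words = tokenize(content)
    score = 0
    for word in tokenize(description):
        if word in content_words:
            score += 1
        else:
            score -= 1
    return score


def title_in_content(title, content):
    """Check whether the title appears verbatim in the content.
    Ignoring letter case is an assumption, not something the claim states."""
    return title.strip().lower() in content.lower()


def is_mismatch(title, description, content):
    """Flag a mismatch only when the score is negative AND the title is absent."""
    return (matching_score(description, content) < 0
            and not title_in_content(title, content))


if __name__ == "__main__":
    content = "The whale hunted the ship across the sea. Moby Dick surfaced."
    print(matching_score("whale ship", content))
    print(is_mismatch("Space Robots", "robots in space", content))
    print(is_mismatch("Moby Dick", "robots in space", content))
```

In this sketch a negative score alone does not trigger a notification: as in the claim, the title check acts as an override, so a description that shares few words with the content is still accepted when the title appears in the content.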