Title of Invention Duplicate entry detection system and method
Abstract A computer system and method for determining whether the subject matter described in a received document is substantially similar to that of other documents in a document corpus, such that the received document can be considered a duplicate. After a first document is received, a set of tokens is generated for it, and a non-fielded relevance search is executed against a token index. The search returns a set of candidate duplicate documents, each with a corresponding score. Each candidate document with a score above a threshold is then filtered to determine whether it is a true duplicate of the first document. The candidate documents that scored above the threshold and were not disqualified by the filtering are then provided.
Publication No. US8046372(B1) Publication Date 2011.10.25
Application No. US20070754237 Filing Date 2007.05.25
Applicant AMAZON TECHNOLOGIES, INC. Inventors THIRUMALAI SRIKANTH;MANOHARAN ASWATH;TOMKO MARK J.;EMERY GRANT M.;MOHAN VIJAI;TERRA EGIDIO
Classification G06F7/00;G06F17/30 Main Classification G06F7/00
Agency Agent
Principal Claim
Address
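
The pipeline summarized in the abstract (tokenize, run a non-fielded relevance search on a token index, keep candidates above a score threshold, then filter them) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the names `TokenIndex`, `find_duplicates` and `jaccard`, the TF-IDF style scoring, the Jaccard token-overlap filter, and both threshold values.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TokenIndex:
    """Hypothetical inverted index over whole documents (no per-field split)."""

    def __init__(self):
        self.postings = defaultdict(dict)  # token -> {doc_id: term frequency}
        self.docs = {}                     # doc_id -> original text

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for token, tf in Counter(tokenize(text)).items():
            self.postings[token][doc_id] = tf

    def search(self, tokens):
        """Return {doc_id: score}; the TF-IDF sum is an assumed scoring rule."""
        scores = defaultdict(float)
        n_docs = len(self.docs)
        for token in set(tokens):
            matches = self.postings.get(token, {})
            if not matches:
                continue
            idf = math.log(1 + n_docs / len(matches))
            for doc_id, tf in matches.items():
                scores[doc_id] += tf * idf
        return dict(scores)


def jaccard(a, b):
    """Token-set overlap, used here as a stand-in for the filtering step."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def find_duplicates(index, text, score_threshold=1.0, overlap_threshold=0.6):
    """Return (doc_id, score) pairs that pass both the threshold and the filter."""
    tokens = tokenize(text)
    candidates = index.search(tokens)
    duplicates = []
    for doc_id, score in candidates.items():
        if score <= score_threshold:
            continue  # only candidates scoring above the threshold are filtered
        if jaccard(tokens, tokenize(index.docs[doc_id])) >= overlap_threshold:
            duplicates.append((doc_id, score))
    return sorted(duplicates, key=lambda pair: pair[1], reverse=True)


index = TokenIndex()
index.add("a", "Apple iPod nano 8GB silver media player")
index.add("b", "Garden hose 50 ft green")
index.add("c", "Apple MacBook laptop silver")
result = find_duplicates(index, "Apple iPod Nano 8GB (Silver) media player")
print([doc_id for doc_id, _ in result])
```

In this toy corpus, document "c" shares two tokens with the query and clears the score threshold, but the overlap filter disqualifies it. That is the role the abstract assigns to the filtering step: separating true duplicates from merely relevant candidates.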