Title of Invention METHOD AND APPARATUS FOR EFFICIENT IDENTIFICATION OF DUPLICATE AND NEAR-DUPLICATE DOCUMENTS AND TEXT SPANS USING HIGH-DISCRIMINABILITY TEXT FRAGMENTS
Abstract <p>Disclosed is a computer-assisted method for finding duplicate or near-duplicate documents or text spans within a document collection (#100) by using high-discriminability text fragments. Distinctive features of the documents or text spans are identified (#110). For each pair of documents or text spans that have at least one distinctive feature in common, the distinctive features of the two are compared to determine whether they are duplicates or near-duplicates (#114).</p>
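The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed method: the abstract does not say how a distinctive feature is chosen or how feature sets are compared, so this sketch assumes word trigrams as text fragments, treats a fragment as distinctive when it occurs in at most `max_df` documents, and compares feature sets with Jaccard similarity. The names `fragments`, `find_near_duplicates`, `max_df`, and `threshold` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def fragments(text, n=3):
    """Return the set of word n-grams in text (assumed stand-in for 'text fragments')."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def find_near_duplicates(docs, n=3, max_df=2, threshold=0.5):
    """docs maps id -> text. Returns a sorted list of (id_a, id_b, similarity)."""
    frags = {doc_id: fragments(text, n) for doc_id, text in docs.items()}

    # Count how many documents contain each fragment.
    df = defaultdict(int)
    for fs in frags.values():
        for f in fs:
            df[f] += 1

    # Assumption: a fragment is "distinctive" if it occurs in at most max_df documents.
    distinctive = {d: {f for f in fs if df[f] <= max_df} for d, fs in frags.items()}

    # Inverted index: distinctive fragment -> ids of documents containing it.
    index = defaultdict(set)
    for d, fs in distinctive.items():
        for f in fs:
            index[f].add(d)

    # Only pairs sharing at least one distinctive fragment become candidates.
    candidates = set()
    for ids in index.values():
        candidates.update(combinations(sorted(ids), 2))

    # Compare the distinctive feature sets of each candidate pair (Jaccard, assumed).
    results = []
    for a, b in sorted(candidates):
        union = distinctive[a] | distinctive[b]
        sim = len(distinctive[a] & distinctive[b]) / len(union)
        if sim >= threshold:
            results.append((a, b, sim))
    return results
```

Because candidate pairs come from the inverted index, documents that share no distinctive fragment are never compared, which avoids scoring every possible pair in the collection.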
Publication No. WO2002041161(A1) Publication Date 2002.05.23
Application No. US2001048124 Filing Date 2001.10.31
Applicant Inventor
Classification No. Main Classification No.
Agency Agent
Main Claim
Address