Title of invention: METHOD AND DEVICE TO DETECT SIMILAR DOCUMENTS
Abstract: A method for detecting similar documents includes extracting an entity from each of a first web document and a second web document; determining an importance contribution element corresponding to each of the web documents; calculating, using a processor, weights for each entity based on the determined importance contribution elements; and determining whether the web documents are similar documents based on the calculated weights. A device to detect similar documents includes a storage device; an entity extractor stored on the storage device and configured to extract an entity from a first web document and a second web document and to determine an importance contribution element from each of the web documents; a weight calculator configured to calculate weights of each entity based on the determined importance contribution elements; and a similar document detection unit configured to determine whether the web documents are similar documents based on the calculated weights.
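Application publication number: US2012284270(A1) Application publication date: 2012.11.08
Application number: US201213462592 Filing date: 2012.05.02
Applicant: LEE CHAE HYUN;SIM DONG YUN;NHN CORPORATION Inventor: LEE CHAE HYUN;SIM DONG YUN
Classification number: G06F17/30 Main classification number: G06F17/30
Agency: Agent:
Principal claim:
Address: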
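
The four steps named in the abstract (entity extraction, importance contribution elements, per-entity weights, similarity decision) can be illustrated with a minimal Python sketch. The abstract does not say what the importance contribution elements are, how weights are computed, or how similarity is decided, so everything concrete here is an assumption made for illustration: the `title`/`body` elements and their scores in `ELEMENT_SCORES`, the token-based `extract_entities`, the cosine comparison in `weighted_cosine`, and the `threshold` default are all hypothetical and are not taken from the patent.

```python
import math
import re
from collections import Counter

# Assumed importance contribution elements: the part of the page in which an
# entity occurs, with an assumed score per part. Not specified by the abstract.
ELEMENT_SCORES = {"title": 3.0, "body": 1.0}


def extract_entities(text):
    """Toy entity extractor (assumption): lowercase alphanumeric tokens, length >= 3."""
    return re.findall(r"[a-z0-9]{3,}", text.lower())


def entity_weights(document):
    """Weight each entity by summing the score of every element it occurs in.

    `document` maps an element name (e.g. "title") to that element's text.
    """
    weights = Counter()
    for element, text in document.items():
        score = ELEMENT_SCORES.get(element, 0.0)
        for entity in extract_entities(text):
            weights[entity] += score
    return weights


def weighted_cosine(w1, w2):
    """Cosine similarity between two entity-weight vectors (assumed measure)."""
    dot = sum(w1[e] * w2[e] for e in w1.keys() & w2.keys())
    norm1 = math.sqrt(sum(v * v for v in w1.values()))
    norm2 = math.sqrt(sum(v * v for v in w2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)


def are_similar(doc1, doc2, threshold=0.8):
    """Decide similarity by comparing the weighted score to an assumed threshold."""
    return weighted_cosine(entity_weights(doc1), entity_weights(doc2)) >= threshold


if __name__ == "__main__":
    first = {"title": "Seoul weather forecast", "body": "Rain expected in Seoul on Monday"}
    second = {"title": "Weather forecast for Seoul", "body": "Monday rain expected in Seoul"}
    print(are_similar(first, second))
```

Scoring title occurrences above body occurrences is one plausible reading of "importance contribution"; a real system could use other signals, and the threshold would need tuning on labeled document pairs.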