Title of invention |
METHOD AND DEVICE TO DETECT SIMILAR DOCUMENTS |
Abstract |
A method for detecting similar documents includes: extracting an entity from each of a first web document and a second web document; determining an importance contribution element corresponding to each of the web documents; calculating, using a processor, a weight for each entity based on the determined importance contribution elements; and determining whether the web documents are similar documents based on the calculated weights. A device to detect similar documents includes: a storage device; an entity extractor stored on the storage device and configured to extract an entity from each of a first web document and a second web document and to determine an importance contribution element from each of the web documents; a weight calculator configured to calculate a weight for each entity based on the determined importance contribution elements; and a similar document detection unit configured to determine whether the web documents are similar documents based on the calculated weights.
|
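The four steps named in the abstract (entity extraction, importance contribution elements, entity weights, similarity decision) can be illustrated with a minimal sketch. The abstract does not say how any step is carried out, so everything concrete here is an assumption made for illustration: entities are approximated by capitalized words, the importance contribution elements are taken to be body frequency and presence in the title, the weight is their sum with a hypothetical `title_boost`, and similarity is decided by cosine similarity against a hypothetical `threshold`. All function names are invented for this sketch and do not come from the patent.

```python
import math
import re
from collections import Counter

# Assumption: an "entity" is approximated by a capitalized word. The abstract
# does not specify how entities are extracted.
ENTITY_RE = re.compile(r"\b[A-Z][a-zA-Z]+\b")


def extract_entities(text):
    """Return a Counter of candidate entities found in text."""
    return Counter(ENTITY_RE.findall(text))


def entity_weights(title, body, title_boost=2.0):
    """Weight each entity from two hypothetical importance contribution
    elements: frequency in the body and presence in the title."""
    body_counts = extract_entities(body)
    title_entities = set(extract_entities(title))
    weights = {}
    for entity in set(body_counts) | title_entities:
        weight = float(body_counts.get(entity, 0))
        if entity in title_entities:
            weight += title_boost  # assumed boost, not taken from the patent
        weights[entity] = weight
    return weights


def cosine_similarity(w1, w2):
    """Cosine similarity between two entity-weight dictionaries."""
    dot = sum(w1[e] * w2[e] for e in w1.keys() & w2.keys())
    norm1 = math.sqrt(sum(v * v for v in w1.values()))
    norm2 = math.sqrt(sum(v * v for v in w2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


def are_similar(doc1, doc2, threshold=0.8):
    """Decide similarity of two (title, body) pairs; threshold is assumed."""
    return cosine_similarity(entity_weights(*doc1), entity_weights(*doc2)) >= threshold


doc_a = ("Seoul Marathon", "Runners gathered in Seoul for the Marathon.")
doc_b = ("Seoul Marathon", "Thousands joined the Marathon in Seoul.")
doc_c = ("Tokyo Budget", "Lawmakers in Tokyo debated the Budget.")

print(are_similar(doc_a, doc_b))
print(are_similar(doc_a, doc_c))
```

In this sketch, two pages that share the heavily weighted title entities compare as similar even when their remaining wording differs, while pages with no entities in common score zero.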
Publication number |
US2012284270(A1) |
Publication date |
2012.11.08 |
Application number |
US201213462592 |
Filing date |
2012.05.02 |
Applicant |
LEE CHAE HYUN;SIM DONG YUN;NHN CORPORATION |
Inventor |
LEE CHAE HYUN;SIM DONG YUN |
Classification |
G06F17/30 |
Main classification |
G06F17/30 |
Agency |
|
Agent |
|
Principal claim |
|
Address |
|