Title of Invention |
DETERMINING NEAR DUPLICATE NOISY DATA OBJECTS |
Abstract |
A system configured to find near-duplicate documents. For each set of two or more documents that are similar to each other, the system identifies which of the differences are likely to have been generated by Optical Character Recognition (OCR) software and which are due to genuine differences between the original documents. As a result, the identification of similarity between documents is improved: the system can identify documents that were originally exact duplicates and differ from one another only because of OCR errors, or it can correct the similarity level between the documents by correcting errors introduced by the OCR tool. |
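The abstract does not say how OCR-induced differences are recognized. Purely as an illustration, the Python sketch assumes a word-level diff (`difflib.SequenceMatcher` from the standard library) and a small, hypothetical table of OCR glyph confusions (`OCR_CONFUSIONS`). Two substituted words whose canonical forms coincide are treated as a likely OCR error, and every other difference is treated as genuine. The names `normalize_ocr`, `classify_differences` and `similarity` are illustrative assumptions, not the patented method.

```python
import difflib

# Hypothetical OCR confusion pairs (an illustrative assumption, not from the patent).
# Each ambiguous glyph sequence is rewritten to one canonical form.
OCR_CONFUSIONS = [("rn", "m"), ("vv", "w"), ("cl", "d"), ("0", "o"), ("1", "l"), ("5", "s")]


def normalize_ocr(token: str) -> str:
    """Return a lowercase canonical form in which commonly confused glyphs coincide."""
    canon = token.lower()
    for confusable, canonical in OCR_CONFUSIONS:
        canon = canon.replace(confusable, canonical)
    return canon


def classify_differences(doc_a: str, doc_b: str):
    """Split word-level differences into likely-OCR pairs and genuine differences."""
    a, b = doc_a.split(), doc_b.split()
    ocr_like, genuine = [], []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        seg_a, seg_b = a[i1:i2], b[j1:j2]
        if tag == "replace" and len(seg_a) == len(seg_b):
            # Compare substituted words one-to-one.
            for x, y in zip(seg_a, seg_b):
                if normalize_ocr(x) == normalize_ocr(y):
                    ocr_like.append((x, y))
                else:
                    genuine.append(([x], [y]))
        else:
            # Insertions, deletions and uneven replacements are counted as genuine.
            genuine.append((seg_a, seg_b))
    return ocr_like, genuine


def similarity(doc_a: str, doc_b: str, ocr_tolerant: bool = True) -> float:
    """Word-level similarity in [0, 1]; if ocr_tolerant, OCR-style differences are ignored."""
    key = normalize_ocr if ocr_tolerant else (lambda t: t)
    a = [key(t) for t in doc_a.split()]
    b = [key(t) for t in doc_b.split()]
    return difflib.SequenceMatcher(a=a, b=b, autojunk=False).ratio()


if __name__ == "__main__":
    original = "Payment of 100 dollars is due in March"
    scanned = "Payrnent of 1O0 dollars is due in March"
    print(classify_differences(original, scanned))  # ([('Payment', 'Payrnent'), ('100', '1O0')], [])
    print(similarity(original, scanned, ocr_tolerant=False), similarity(original, scanned))  # 0.75 1.0
```

Under these assumptions, a scanned copy that differs from its source only by OCR-style substitutions yields no genuine differences and a corrected similarity of 1.0, which corresponds to the abstract's case of documents that were originally exact duplicates. A practical system would need a richer error model (character-level edit costs, words merged or split by the OCR engine), which this sketch does not attempt.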
Application Publication Number |
WO2007086059(A3) |
Publication Date |
2009.02.05 |
Application Number |
WO2007IL00095 |
Application Date |
2007.01.25 |
Applicant |
EQUIVIO LTD.; RAVID, YIFTACH; MILO, AMIR |
Inventors |
RAVID, YIFTACH; MILO, AMIR |
Classification |
G06F17/00 |
Main Classification |
G06F17/00 |