Title: SYSTEMS AND METHODS FOR EXTRACTING INFORMATION FROM STRUCTURED DOCUMENTS
Abstract: Systems and methods for extracting information from structured documents are provided. The systems and methods select a centroid document from a group of structured documents, then select a subset of the group to form a cluster of documents about the centroid document. Selection of the subset is preferably based on the relative similarity between each document of the subset and the centroid document. Next, a data element is marked on the centroid document. The systems and methods then identify, on each document of the subset, the data element that corresponds to the marked data element on the centroid document. Finally, data may be extracted from the subset of documents based on the identifying step.
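The abstract's steps (choose a centroid, cluster similar documents around it, mark one data element, locate the corresponding element elsewhere, extract) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: documents are modeled as a mapping from element path to text, similarity is Jaccard overlap of element paths, the centroid is the document with the highest mean similarity to the others, and "corresponding element" is simplified to "same path". The names `similarity`, `pick_centroid`, `form_cluster`, and `extract` are hypothetical.

```python
from typing import Dict, List

# Hypothetical document model: element path -> text content.
Doc = Dict[str, str]


def similarity(a: Doc, b: Doc) -> float:
    """Jaccard overlap of element paths (an assumed similarity measure)."""
    paths_a, paths_b = set(a), set(b)
    union = paths_a | paths_b
    return len(paths_a & paths_b) / len(union) if union else 1.0


def pick_centroid(docs: List[Doc]) -> int:
    """Index of the document most similar, on average, to all the others."""
    def mean_sim(i: int) -> float:
        others = [similarity(docs[i], d) for j, d in enumerate(docs) if j != i]
        return sum(others) / len(others) if others else 0.0
    # max() returns the first index when several documents tie.
    return max(range(len(docs)), key=mean_sim)


def form_cluster(docs: List[Doc], centroid: int, threshold: float = 0.6) -> List[int]:
    """Indices of documents whose similarity to the centroid meets the threshold."""
    return [
        i for i, d in enumerate(docs)
        if i != centroid and similarity(docs[centroid], d) >= threshold
    ]


def extract(docs: List[Doc], cluster: List[int], marked_path: str) -> Dict[int, str]:
    """Read the element at the marked path from every clustered document that has it."""
    return {i: docs[i][marked_path] for i in cluster if marked_path in docs[i]}


# Made-up sample data: three product pages and one unrelated page.
docs = [
    {"/html/body/h1": "Camera A", "/html/body/span[1]": "$199", "/html/body/p": "desc"},
    {"/html/body/h1": "Camera B", "/html/body/span[1]": "$249", "/html/body/p": "desc"},
    {"/html/body/h1": "Camera C", "/html/body/span[1]": "$99"},
    {"/html/head/title": "About us", "/html/body/div": "text"},
]

centroid = pick_centroid(docs)
cluster = form_cluster(docs, centroid)
# The price element is "marked" once, on the centroid document only.
prices = extract(docs, cluster, "/html/body/span[1]")
print(centroid, cluster, prices)  # 0 [1, 2] {1: '$249', 2: '$99'}
```

The unrelated page shares no element paths with the centroid and so falls outside the cluster, which is why a single marking on the centroid is enough to pull the price from the remaining product pages. A real system would likely need a more tolerant correspondence step than exact path equality.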
Application Publication No.: US2012101979(A1)  Publication Date: 2012.04.26
Application No.: US201113340236  Filing Date: 2011.12.29
Applicant: ASHKENAZI AMIR;GLICKMAN OREN;YEAR ARIEL;SHOPPING.COM  Inventor: ASHKENAZI AMIR;GLICKMAN OREN;YEAR ARIEL
Classification: G06F17/30  Main Classification: G06F17/30
Agency:  Agent:
Principal Claim:
Address: