Title of invention: FINDING MULTIPLE FIELD GROUPINGS IN SEMI-STRUCTURED DOCUMENTS
Abstract: A method is provided for parsing a semi-structured document having a plurality of document lines on which a series of items are listed, the listing of each item spanning one or more document lines. The method includes: obtaining a plurality of candidate records, each candidate record spanning one or more lines of the document; defining a term representing an optimal cost of selecting a number n of candidate records to span the document lines up to a given ending document line i; efficiently evaluating the term over a first range of values for n and a second range of values for i; and selecting a subset of the plurality of candidate records as a global optimal parse of the document, wherein the subset selected is based on the evaluation of the defined term.
Publication No.: US2014281938(A1)  Publication date: 2014.09.18
Application No.: US201313799289  Filing date: 2013.03.13
Applicant: PALO ALTO RESEARCH CENTER INCORPORATED  Inventor: Pavlopoulou Christina
Classification: G06F17/24  Main classification: G06F17/24
Agency:  Agent:
Main claim: 1. A method for parsing a semi-structured document having a plurality of document lines on which a series of items are listed, the listing of each item spanning one or more document lines, said method comprising: obtaining a plurality of candidate records, each candidate record spanning one or more lines of the document; defining a term representing an optimal cost of selecting a number n of candidate records to span the document lines up to a given ending document line i; efficiently evaluating the term over a first range of values for n and a second range of values for i; and selecting a subset of the plurality of candidate records as a global optimal parse of the document, wherein the subset selected is based on the evaluation of the term.
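The claim describes a term indexed by a record count n and an ending line i, which suggests a dynamic-programming table. The following is a minimal sketch of one way such a term could be evaluated, not the patented implementation. It assumes (these details are not in the record) that each candidate carries a numeric cost, that lower cost is better, that the chosen records must tile lines 1..i with no gaps or overlaps, and that the global parse is the cheapest entry over all n at the last line. The names `Candidate` and `best_parse` are hypothetical.

```python
from math import inf
from typing import List, Optional, Tuple

# Hypothetical candidate record: (first_line, last_line, cost), 1-based inclusive.
Candidate = Tuple[int, int, float]


def best_parse(num_lines: int, candidates: List[Candidate]) -> Tuple[float, List[Candidate]]:
    """Return (total_cost, chosen_records) for the cheapest tiling of lines 1..num_lines."""
    max_n = num_lines  # a tiling can contain at most one record per line

    # cost[n][i]: assumed form of the claimed term, i.e. the cheapest way to
    # cover lines 1..i with exactly n non-overlapping candidate records.
    cost = [[inf] * (num_lines + 1) for _ in range(max_n + 1)]
    back: List[List[Optional[Candidate]]] = [
        [None] * (num_lines + 1) for _ in range(max_n + 1)
    ]
    cost[0][0] = 0.0  # zero records cover zero lines at no cost

    # Index candidates by their ending line so each cell only scans relevant ones.
    ending_at: List[List[Candidate]] = [[] for _ in range(num_lines + 1)]
    for cand in candidates:
        first, last, _ = cand
        if 1 <= first <= last <= num_lines:
            ending_at[last].append(cand)

    # Evaluate the term over the range of n and the range of i.
    for n in range(1, max_n + 1):
        for i in range(1, num_lines + 1):
            for cand in ending_at[i]:
                first, _, c = cand
                total = cost[n - 1][first - 1] + c
                if total < cost[n][i]:
                    cost[n][i] = total
                    back[n][i] = cand

    # Pick the record count whose tiling of the whole document is cheapest.
    best_n = min(range(max_n + 1), key=lambda n: cost[n][num_lines])
    if cost[best_n][num_lines] == inf:
        return inf, []  # no combination of candidates covers the document

    # Walk the back-pointers from the last line to recover the chosen subset.
    chosen: List[Candidate] = []
    n, i = best_n, num_lines
    while n > 0:
        cand = back[n][i]
        chosen.append(cand)
        i = cand[0] - 1
        n -= 1
    chosen.reverse()
    return cost[best_n][num_lines], chosen
```

For a four-line document with candidates `[(1, 2, 1.0), (3, 4, 1.0), (1, 1, 0.8), (2, 4, 1.5), (1, 4, 2.5)]`, `best_parse(4, ...)` returns `(2.0, [(1, 2, 1.0), (3, 4, 1.0)])`: the two-record tiling beats both the single record covering all four lines and the alternative split after line 1.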
Address: Palo Alto CA US