Title of Invention: DOCUMENT SEGMENTATION
Abstract: A document to be segmented is first converted into a common representation format, if necessary. Parsing the document yields a document model, which is analyzed using at least one structure-dependent function to identify segments within the document. In one embodiment, the structure-dependent function may comprise a template, or the best-fit template among a plurality of templates, against which the document model is compared. In other embodiments, the structure-dependent function may comprise table-of-contents information, font properties within the document model, and/or an average segment size determined from previously identified segments in one or more additional documents related to the document under consideration. Semantic-content-dependent functions may then be applied to refine the analysis, either by identifying sub-segments within the extracted segments or by identifying segments that may properly be merged because their respective semantic content is similar.
Publication No.: CA2698914(A1)  Publication Date: 2010.10.06
Application No.: CA20102698914  Filing Date: 2010.04.06
Applicant: ACCENTURE GLOBAL SERVICES GMBH  Inventors: PRABHAKARA, JAGADEESH CHANDRA BOSE RANTHAM; RAO, SANGEETHA; CHANDRAN, ANITHA
Classification: G06F17/27  Main Classification: G06F17/27
Agency:  Agent:
Principal Claim:
Address:
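
The two-stage analysis outlined in the abstract (a structure-dependent pass followed by a semantic-content-dependent refinement) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the `Line` and `Segment` types, the use of a font-size threshold as the structure-dependent function, and word-set Jaccard overlap as a stand-in for semantic similarity are all hypothetical choices.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Line:
    """One parsed line of the document model (hypothetical type)."""
    text: str
    font_size: float


@dataclass
class Segment:
    """A heading plus the body lines that follow it (hypothetical type)."""
    heading: str
    body: List[str] = field(default_factory=list)


def split_by_font(lines: List[Line], heading_size: float) -> List[Segment]:
    """Structure-dependent pass (assumed): open a new segment at each line
    whose font size is at least heading_size. If the document does not start
    with such a line, its first line is used as the first heading."""
    segments: List[Segment] = []
    for line in lines:
        if line.font_size >= heading_size or not segments:
            segments.append(Segment(heading=line.text))
        else:
            segments[-1].body.append(line.text)
    return segments


def jaccard(a: str, b: str) -> float:
    """Word-set overlap in [0, 1]; a crude stand-in for semantic similarity."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def merge_similar(segments: List[Segment], threshold: float) -> List[Segment]:
    """Semantic pass (assumed): fold a segment into its predecessor when
    their body texts overlap by at least threshold. Inputs are not mutated."""
    merged: List[Segment] = []
    for seg in segments:
        if merged and jaccard(" ".join(merged[-1].body),
                              " ".join(seg.body)) >= threshold:
            merged[-1].body.extend(seg.body)
        else:
            merged.append(Segment(seg.heading, list(seg.body)))
    return merged
```

A caller would run `split_by_font` on the parsed lines and pass the result to `merge_similar`. A real system would likely replace the Jaccard measure with a stronger similarity function and could substitute template matching or table-of-contents information for the font-size rule, as the abstract mentions.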