Title of invention: Method and apparatus for detecting pagination constructs including a header and a footer in legacy documents
Abstract: A method for identifying the header and footer content of a document, so that text fragments comprising recognizable text blocks derived from the document can be sequenced. The lines of the document, each made up of text blocks, are analyzed for textual variability, taking into account the different kinds of text blocks within each line. Header/footer zones are defined as those whose textual content has low textual variability. An alternative embodiment identifies pagination constructs by comparing selected text boxes for similarity and proximity and clustering the text boxes that satisfy a predetermined similarity value; the clustered text boxes are deemed to comprise pagination constructs.
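The low-variability idea in the abstract can be illustrated with a minimal sketch. This is not the patented method: it assumes each page is already a list of text lines, it measures variability simply as the share of distinct texts found at the same line position across pages, and it collapses digit runs so that page numbers do not count as variation. The names `normalize`, `variability`, `header_footer_sizes`, and the parameters `max_lines` and `threshold` are all hypothetical.

```python
import re
from typing import List, Tuple


def normalize(line: str) -> str:
    """Lowercase the line and collapse digit runs (assumption: page numbers
    should not count as textual variation)."""
    return re.sub(r"\d+", "#", line.strip().lower())


def variability(pages: List[List[str]], position: int) -> float:
    """Distinct normalized texts at a line position, divided by the number of
    pages that have a line there. Negative positions count from the bottom."""
    texts = [normalize(p[position]) for p in pages if -len(p) <= position < len(p)]
    if not texts:
        return 1.0  # no evidence: treat the position as fully variable
    return len(set(texts)) / len(texts)


def header_footer_sizes(
    pages: List[List[str]], max_lines: int = 3, threshold: float = 0.5
) -> Tuple[int, int]:
    """Count consecutive low-variability lines from the top (header zone)
    and from the bottom (footer zone). Thresholds are illustrative only."""
    header = 0
    for i in range(max_lines):
        if variability(pages, i) > threshold:
            break
        header += 1
    footer = 0
    for i in range(1, max_lines + 1):
        if variability(pages, -i) > threshold:
            break
        footer += 1
    return header, footer


pages = [
    ["Annual Report 2005", "Introduction text", "body one", "Page 1"],
    ["Annual Report 2005", "More different text", "body two", "Page 2"],
    ["Annual Report 2005", "Yet another line", "body three", "Page 3"],
]
print(header_footer_sizes(pages))
```

In this toy input the first line is identical on every page and the last line differs only by its page number, so both are flagged as pagination zones, while the body lines are not. The sketch does not guard against header and footer zones overlapping on very short pages.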
Publication number: EP1679613(A2)
Publication date: 2006.07.12
Application number: EP20060100200
Filing date: 2006.01.10
Applicant: XEROX CORPORATION
Inventors: DEJEAN, HERVE; MEUNIER, JEAN-LUC
Classification: G06F17/21
Main classification: G06F17/21
Agency:
Agent:
Principal claim:
Address: