Title of invention DOCUMENT PICTURE STRUCTURE ANALYSIS METHOD
Abstract PROBLEM TO BE SOLVED: To analyze the structure of a document image precisely and efficiently by using table-of-contents information when the document image is converted into an electronic document. SOLUTION: To learn the structure of the whole document, the image of the table-of-contents page is read first, basic rectangles are extracted line by line, and the characters are recognized and analyzed. In this step the chapter/section numbers are analyzed, the headings are extracted, and the page number of each heading is extracted. The images of the body-text pages are then read: several tens of consecutive pages are input, and basic rectangles are extracted and analyzed for each page. The layout elements (header, footer, page number, chapter/section heading, body text, and figure/table) are identified from the layout features of the extracted basic rectangles. Character recognition is performed on all elements except the rectangles identified as figures/tables. As the matching process, for each page that the table-of-contents analysis associates with a heading, that heading is matched against the heading candidates extracted in the body-text analysis, which yields more precise heading information.
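The matching step described in the abstract can be illustrated with a minimal Python sketch. It is not the method claimed in the patent, only one plausible reading of it: the table-of-contents line layout ("number, title, dot leaders, page"), the function names `parse_toc_line` and `match_headings`, the dictionary fields, and the 0.6 similarity threshold are all assumptions made for illustration. Fuzzy string comparison from the standard `difflib` module stands in for whatever matching criterion the patent actually uses, so that small character-recognition errors do not prevent a match.

```python
import re
from difflib import SequenceMatcher

# Assumed layout of one recognized table-of-contents line:
#   "<chapter/section number> <title> ..... <page number>"
TOC_LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)*)\.?\s+(.*?)[\s.]*(\d+)\s*$")


def parse_toc_line(text):
    """Split a recognized TOC line into number, title and page (hypothetical helper)."""
    m = TOC_LINE_RE.match(text)
    if m is None:
        return None
    number, title, page = m.groups()
    return {"number": number, "title": title.strip(), "page": int(page)}


def similarity(a, b):
    """Case-insensitive similarity in [0, 1], tolerant of small OCR errors."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_headings(toc_entries, candidates, threshold=0.6):
    """Pair each TOC entry with the best heading candidate found on its page.

    `candidates` are heading-like text blocks from the body-page analysis,
    each a dict with "page" and "text". The threshold is an assumed value.
    """
    results = []
    for entry in toc_entries:
        expected = f'{entry["number"]} {entry["title"]}'
        same_page = [c for c in candidates if c["page"] == entry["page"]]
        best = max(same_page, key=lambda c: similarity(expected, c["text"]),
                   default=None)
        if best is not None and similarity(expected, best["text"]) < threshold:
            best = None  # nothing on that page resembles the TOC heading
        results.append((entry, best))
    return results


toc = [parse_toc_line(line) for line in (
    "1 Introduction ..... 3",
    "2.1 Layout analysis ..... 14",
    "3 Conclusion ..... 40",
)]
candidates = [
    {"page": 3, "text": "1 lntroduction"},            # OCR confused I and l
    {"page": 3, "text": "This page gives background"},
    {"page": 14, "text": "2.1 Layout anaIysis"},      # OCR confused l and I
]
matches = match_headings(toc, candidates)
```

Restricting the comparison to candidates on the page named in the table of contents keeps the search small and reflects the abstract's statement that matching is done per heading page; an entry with no acceptable candidate on its page is paired with `None` rather than forced onto a poor match.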
Publication No. JPH11232439(A) Publication date 1999.08.27
Application No. JP19980050130 Filing date 1998.02.16
Applicant HAYASHI TOSHINARI Inventor HAYASHI TOSHINARI
Classification G06F17/21;G06F17/30;G06K9/20;G06T1/00;G06T7/00;G06T7/40 Main classification G06F17/21
Agency Agent
Main claim
Address