Title of invention |
EXTRACTION DEVICE FOR COMPOSITE GRAPH IN FIXED LAYOUT DOCUMENT AND EXTRACTION METHOD THEREOF |
Abstract |
An extraction device for a composite graph in a fixed-layout document, comprising: a document parsing unit for parsing the fixed-layout document and determining the primitives of the document and their types; a layer generation unit for extracting the text primitives to form a text layer and using the remaining non-text primitives to form a non-text layer; a page analysis unit for performing page analysis on the text layer and on the non-text layer separately; a block generation unit for generating a text block in the text layer and a graph block in the non-text layer; a correlation block determination unit for determining the text blocks correlated with each graph block and merging each graph block with its correlated text blocks into a composite graph block; and an identifier storage unit for storing the identifiers of all the primitives contained in the composite graph block. |
Application publication number |
US2015046784(A1) |
Application publication date |
2015.02.12 |
Application number |
US201314104064 |
Filing date |
2013.12.12 |
Applicant |
PEKING UNIVERSITY FOUNDER GROUP CO., LTD.; FOUNDER APABI TECHNOLOGY LIMITED; PEKING UNIVERSITY |
Inventor |
Xu Canhui; Tang Zhi; Tao Xin; Shi Cao |
Classification number |
G06F17/21 |
Main classification number |
G06F17/21 |
Agency |
|
Agent |
|
Principal claim |
1. An extraction device for the composite graph in a fixed layout document, the device comprising:
a document parsing unit, for parsing the fixed layout document, and determining the primitives of the fixed layout document and types of said primitives;
a layer generation unit, for extracting text primitives so as to form a text layer, and using the remaining non-text primitives to form a non-text layer;
a page analysis unit, for performing page analysis on the text layer and on the non-text layer respectively;
a block generation unit, for generating a text block in the text layer and a graph block in the non-text layer, based on the processing results of the page analyses conducted by the page analysis unit;
a correlation block determination unit, for determining text blocks correlating to every graph block and merging those correlated text blocks and graph blocks into a composite graph block; and
an identifier storage unit, for storing the identifiers of all the primitives contained in the composite graph block. |
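The units recited in the claim can be read as a pipeline, and the following Python sketch illustrates one possible arrangement of it. It is not the patented implementation. The record does not say how page analysis or correlation is performed, so the sketch assumes that primitives arrive already parsed with bounding boxes, that blocks are formed by greedy bounding-box proximity, and that a text block "correlates" with a graph block when the two lie within a distance threshold. All names (`Primitive`, `Block`, `gap`, `split_layers`, `make_blocks`, `extract_composites`) and both thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed page units


@dataclass
class Primitive:
    pid: int    # identifier assigned by the (assumed) document parsing step
    kind: str   # "text", or any non-text type such as "path" or "image"
    box: Box


@dataclass
class Block:
    kind: str                    # "text" or "graph"
    primitives: List[Primitive]

    @property
    def box(self) -> Box:
        # Bounding box enclosing every primitive in the block.
        x0, y0, x1, y1 = zip(*(p.box for p in self.primitives))
        return (min(x0), min(y0), max(x1), max(y1))


def gap(a: Box, b: Box) -> float:
    """Largest axis-wise separation of two boxes; 0.0 if they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return max(dx, dy)


def split_layers(prims: List[Primitive]):
    # Layer generation: text primitives form one layer, the remainder the other.
    text = [p for p in prims if p.kind == "text"]
    non_text = [p for p in prims if p.kind != "text"]
    return text, non_text


def make_blocks(layer: List[Primitive], kind: str, max_gap: float) -> List[Block]:
    # Assumed stand-in for page analysis and block generation:
    # greedily merge primitives whose boxes lie within max_gap of a block.
    blocks: List[Block] = []
    for p in layer:
        near, far = [], []
        for b in blocks:
            (near if gap(b.box, p.box) <= max_gap else far).append(b)
        merged = Block(kind, [q for b in near for q in b.primitives] + [p])
        blocks = far + [merged]
    return blocks


def extract_composites(prims: List[Primitive],
                       block_gap: float = 5.0,
                       caption_gap: float = 10.0) -> List[List[int]]:
    """Return, per composite graph block, the sorted identifiers it contains."""
    text_layer, non_text_layer = split_layers(prims)
    text_blocks = make_blocks(text_layer, "text", block_gap)
    graph_blocks = make_blocks(non_text_layer, "graph", block_gap)
    stored = []
    for g in graph_blocks:
        # Assumed correlation rule: text blocks close to the graph block.
        related = [t for t in text_blocks if gap(g.box, t.box) <= caption_gap]
        members = g.primitives + [p for t in related for p in t.primitives]
        stored.append(sorted(p.pid for p in members))  # identifier storage
    return stored


if __name__ == "__main__":
    page = [
        Primitive(1, "path", (0, 0, 50, 50)),
        Primitive(2, "image", (52, 0, 100, 50)),
        Primitive(3, "text", (10, 55, 90, 62)),     # caption just below the figure
        Primitive(4, "text", (0, 200, 100, 210)),   # body text far from the figure
    ]
    print(extract_composites(page))  # [[1, 2, 3]]
```

Storing only identifiers, as the claim's last element does, keeps the result small and lets a later step fetch the actual primitives from the parsed document. A real system would likely replace the proximity rule with whatever layout analysis the full specification describes.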
Address |
BEIJING CN |