Title of invention EXTRACTION DEVICE FOR COMPOSITE GRAPH IN FIXED LAYOUT DOCUMENT AND EXTRACTION METHOD THEREOF
Abstract An extraction device for a composite graph in a fixed layout document, comprising: a document parsing unit, for parsing the fixed layout document and determining its primitives and their types; a layer generation unit, for extracting the text primitives to form a text layer and using the remaining non-text primitives to form a non-text layer; a page analysis unit, for performing page analysis on the text layer and on the non-text layer separately; a block generation unit, for generating text blocks in the text layer and graph blocks in the non-text layer; a correlation block determination unit, for determining the text blocks correlated with each graph block and merging each graph block with its correlated text blocks into a composite graph block; and an identifier storage unit, for storing the identifiers of all the primitives contained in the composite graph block.
Publication number US2015046784(A1) Publication date 2015.02.12
Application number US201314104064 Filing date 2013.12.12
Applicant PEKING UNIVERSITY FOUNDER GROUP CO., LTD.; FOUNDER APABI TECHNOLOGY LIMITED; PEKING UNIVERSITY Inventors XU Canhui; Tang Zhi; Tao Xin; Shi Cao
Classification G06F17/21 Main classification G06F17/21
Agency Agent
Principal claim 1. An extraction device for the composite graph in a fixed layout document, the device comprising: a document parsing unit, for parsing the fixed layout document, and determining the primitives of the fixed layout document and types of said primitives; a layer generation unit, for extracting text primitives so as to form a text layer, and using the rest non-text primitives to form a non-text layer; a page analysis unit, for processing the text layer and the non-text layer with page analyses respectively; a block generation unit, for generating a text block in the text layer and a graph block in the non-text layer, based on the processing results of the page analyses conducted by the page analysis unit; a correlation block determination unit, for determining text blocks correlating to every graph block and merging those correlated text blocks and graph blocks into a composite graph block; an identifier storage unit, for storing the identifiers of all the primitives contained in the composite graph block.
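The chain of units recited in the claim can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the record does not say how page analysis, block generation, or correlation are performed, so a simple bounding-box proximity rule stands in for all three. The names `Primitive`, `Block`, `split_layers`, `near`, `make_blocks`, and `extract_composite_graphs`, and the `gap` threshold, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class Primitive:
    ident: int   # identifier later kept by the "identifier storage" step
    kind: str    # "text", "path", "image", ... (assumed type labels)
    bbox: BBox


@dataclass
class Block:
    primitives: List[Primitive]

    @property
    def bbox(self) -> BBox:
        # Union of the member bounding boxes.
        x0, y0, x1, y1 = zip(*(p.bbox for p in self.primitives))
        return (min(x0), min(y0), max(x1), max(y1))


def split_layers(primitives):
    """Layer generation: text primitives vs. all remaining primitives."""
    text = [p for p in primitives if p.kind == "text"]
    non_text = [p for p in primitives if p.kind != "text"]
    return text, non_text


def near(a: BBox, b: BBox, gap: float) -> bool:
    """True if the two boxes overlap or lie within `gap` of each other."""
    return (a[0] - gap <= b[2] and b[0] - gap <= a[2]
            and a[1] - gap <= b[3] and b[1] - gap <= a[3])


def make_blocks(layer, gap):
    """Block generation by greedy proximity grouping.

    Assumption: this stands in for the page analysis the claim refers to.
    """
    blocks = []
    for p in layer:
        hits = [b for b in blocks
                if any(near(p.bbox, q.bbox, gap) for q in b.primitives)]
        merged = Block([p] + [q for b in hits for q in b.primitives])
        blocks = [b for b in blocks if not any(b is h for h in hits)]
        blocks.append(merged)
    return blocks


def extract_composite_graphs(primitives, gap=5.0):
    """Return, per composite graph block, the sorted primitive identifiers."""
    text_layer, non_text_layer = split_layers(primitives)
    text_blocks = make_blocks(text_layer, gap)
    graph_blocks = make_blocks(non_text_layer, gap)
    stored = []
    for g in graph_blocks:
        # Correlation (assumed rule): text blocks adjacent to the graph block.
        related = [t for t in text_blocks if near(g.bbox, t.bbox, gap)]
        members = g.primitives + [p for t in related for p in t.primitives]
        stored.append(sorted(p.ident for p in members))
    return stored


page = [
    Primitive(1, "path", (0, 0, 100, 100)),     # figure frame
    Primitive(2, "image", (20, 20, 50, 50)),    # picture inside the frame
    Primitive(3, "text", (10, 102, 90, 110)),   # caption just below
    Primitive(4, "text", (0, 300, 100, 310)),   # unrelated body text
]
print(extract_composite_graphs(page))  # [[1, 2, 3]]
```

Storing only identifiers, as the last claimed unit does, lets a composite graph be reassembled from the parsed document later without copying the primitives themselves.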
Address BEIJING CN