Title of invention
Abstract PROBLEM TO BE SOLVED: To correctly extract metadata that has no explicit item label, even from a document printed in 2-up or 4-up (N-up) format. SOLUTION: The following are prepared: a metadata word dictionary describing the correspondence between words and metadata identifiers; a complex item dictionary describing the correspondence between combinations of multiple words and metadata identifiers; a metadata feature dictionary listing the features that metadata are likely to have; a metadata ontology describing the parallel and hierarchical relations among metadata; and a processing target metadata designation dictionary specifying which metadata are to be extracted. A first method, taking an item line as the base point, determines the direction in which the corresponding data lie from the alignment of tables or rows formed by ruled lines and from the layout of sections, and extracts the metadata lines. A second method calculates a metadata-likeliness score for each line in the document based on the metadata feature dictionary and, taking a high-scoring line as the base point, extends the metadata region over an appropriate range. COPYRIGHT: (C)2010, JPO&INPIT
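The second method in the abstract (score each line, then grow a region around a high-scoring line) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the contents of the feature dictionary, the weights, the two thresholds, and the names `FEATURES`, `score_line`, and `extract_regions` are all hypothetical.

```python
import re

# Hypothetical stand-in for the metadata feature dictionary:
# each feature is a (pattern, weight) pair. Patterns and weights are assumed.
FEATURES = [
    (re.compile(r"\d{4}[./-]\d{1,2}[./-]\d{1,2}"), 2.0),  # date-like string
    (re.compile(r"(Inc\.|Corp\.|Ltd\.|Dept\.)"), 1.5),    # organization suffix
]


def score_line(line):
    """Sum the weights of all features that match the line."""
    return sum(weight for pattern, weight in FEATURES if pattern.search(line))


def extract_regions(lines, seed_threshold=2.0, extend_threshold=1.0):
    """Return (start, end) line-index pairs of likely metadata regions.

    A line scoring at least seed_threshold is a base point; the region is
    extended up and down while neighbouring lines score at least
    extend_threshold. Both thresholds are assumed values.
    """
    scores = [score_line(line) for line in lines]
    regions = []
    covered = -1  # last line index already assigned to a region
    for i, score in enumerate(scores):
        if score < seed_threshold or i <= covered:
            continue
        start = i
        while start - 1 > covered and scores[start - 1] >= extend_threshold:
            start -= 1
        end = i
        while end + 1 < len(lines) and scores[end + 1] >= extend_threshold:
            end += 1
        regions.append((start, end))
        covered = end
    return regions


sample = [
    "Quarterly Report",
    "2008.10.30",
    "Example Corp.",
    "This paragraph is body text.",
    "More body text.",
]
regions = extract_regions(sample)
```

In this sketch the date line acts as the base point and the region grows to include the adjacent organization line, which alone would not have passed the seed threshold. Using a lower threshold for extension than for seeding is a design choice of the sketch, not something the abstract specifies.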
Publication No. JP5380040(B2) Publication date 2014.01.08
Application No. JP20080279070 Filing date 2008.10.30
Applicant Inventor
Classification G06F17/30;G06F17/21;G06T1/00 Main classification G06F17/30
Agency Agent
Main claim
Address