Title of invention DEVICE AND METHOD FOR ANALYZING TABLE STRUCTURE
Abstract PROBLEM TO BE SOLVED: To stably extract item name-data relations from a large number of unspecified documents, even when the item name words are not known in advance and an item name dictionary cannot be fully prepared. SOLUTION: The form features and character string features of every pair of adjacent frames in a table are compared, and a differential score expressing the difference between them is assigned to the contact between the two frames. Next, for every rule grid in the table, the differential scores assigned to the frame contacts belonging to that rule grid are projected (for example, by taking their sum or their average) to calculate an item name-data boundary score. The item name-data boundary score is a certainty factor indicating whether the rule grid is a ruled line at the boundary between an item name frame and a data frame; it is set on the policy that a contact at which the frame features differ is such a boundary. Finally, the item name frames in the table are determined from the positions of the item name-data boundaries, and the item name-data relations are determined from the adjacency relations with the other frames.
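The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes a regular grid with no merged frames and at least two rows, represents each frame only by its text, uses two made-up character string features (digit ratio and a capped length), projects by averaging, and looks only for a single horizontal item name-data boundary. The names `features`, `diff_score`, `boundary_scores`, and `header_relations` are hypothetical.

```python
from typing import List, Tuple

Cell = str  # assumption: a frame is represented only by its text


def features(text: Cell) -> Tuple[float, float]:
    """Hypothetical character string features: digit ratio and capped length."""
    stripped = text.strip()
    if not stripped:
        return (0.0, 0.0)
    digits = sum(ch.isdigit() for ch in stripped)
    return (digits / len(stripped), min(len(stripped), 20) / 20.0)


def diff_score(a: Cell, b: Cell) -> float:
    """Differential score for one frame contact (L1 distance between features)."""
    return sum(abs(x - y) for x, y in zip(features(a), features(b)))


def boundary_scores(table: List[List[Cell]]) -> Tuple[List[float], List[float]]:
    """Project contact scores onto each interior grid line by averaging."""
    rows, cols = len(table), len(table[0])
    # horizontal[r] is the grid line between row r and row r + 1
    horizontal = [
        sum(diff_score(table[r][c], table[r + 1][c]) for c in range(cols)) / cols
        for r in range(rows - 1)
    ]
    # vertical[c] is the grid line between column c and column c + 1
    vertical = [
        sum(diff_score(table[r][c], table[r][c + 1]) for r in range(rows)) / rows
        for c in range(cols - 1)
    ]
    return horizontal, vertical


def header_relations(table: List[List[Cell]]) -> Tuple[int, List[Tuple[str, str]]]:
    """Pick the best horizontal boundary and pair each data frame with the
    item name frame(s) above it in the same column."""
    horizontal, _ = boundary_scores(table)
    boundary = max(range(len(horizontal)), key=horizontal.__getitem__)
    header_rows = table[: boundary + 1]
    relations = []
    for row in table[boundary + 1:]:
        for c, value in enumerate(row):
            name = " / ".join(h[c] for h in header_rows)
            relations.append((name, value))
    return boundary, relations


if __name__ == "__main__":
    sample = [
        ["Item", "Qty", "Price"],
        ["1001", "3", "250"],
        ["1002", "12", "980"],
    ]
    print(boundary_scores(sample))
    print(header_relations(sample))
```

Averaging rather than summing keeps the scores of grid lines comparable when rows and columns differ in length; a fuller version would also use form features (frame size, shading, ruled line style) and handle item names placed in a left-hand column via the vertical scores.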
Publication number JP2013190993(A) Publication date 2013.09.26
Application number JP20120056656 Filing date 2012.03.14
Applicant HITACHI LTD Inventors HIRAYAMA JUNICHI;FUJIO MASAKAZU;KOBAYASHI YOSHIYUKI;MACHII KIMIYOSHI;KAWABATA KAORU
Classification G06K9/00;G06K9/20;G06K9/68;G06T7/60 Main classification G06K9/00
Agency Agent
Main claim
Address