Title of invention: DOCUMENT PROCESSING DEVICE AND ITEM EXTRACTION METHOD
Abstract: The present invention addresses the problem of correctly extracting an item character row from forms on which conventional techniques fail: forms in which the frame structure is not clearly shown, such as English forms; forms in which the character interval is so large that extraction of an item character row fails; and forms in which a character row has a line break half-way through and continues at a separated position, so that extraction of the item character row fails and the extraction of both the item character row and the value character row is affected. A document processing device comprises an input device, a processor connected to the input device, a storage device connected to the processor, and an output device connected to the processor. The processor comprises: a means for extracting, for each form document input via the input device, (i) junction candidate relationships between characters on the basis of the locations of the characters, (ii) from the extracted character-to-character junction relationships, those having a high likelihood of being part of an item name, and (iii) item name character string region candidates on the basis of the extracted junction relationships; and a means for determining whether each item name character string region candidate is an item name character string.
Publication number: WO2016046988(A1) Publication date: 2016.03.31
Application number: WO2014JP75744 Filing date: 2014.09.26
Applicant: HITACHI, LTD. Inventor: FUJIO MASAKAZU
Classification: G06K9/72 Main classification: G06K9/72
Agency: Agent:
Main claim:
Address:
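
The three extraction steps and the final determination step named in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the `Char` bounding-box type, the same-row and gap thresholds, the gap-based `item_likelihood` score, and the colon-or-vocabulary rule in `is_item_name` are all hypothetical. The sketch handles only horizontal junctions and does not attempt the line-break case.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Char:
    """One recognized character with its bounding box (hypothetical type)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def junction_candidates(chars, max_gap_ratio=3.0):
    """Step 1: pairs of character indices that may be joined, based on location.

    Assumed rule: same row, and horizontal gap of at most
    max_gap_ratio times the character height.
    """
    pairs = []
    for i, j in combinations(range(len(chars)), 2):
        a, b = chars[i], chars[j]
        if a.x > b.x:
            a, b = b, a
        height = max(a.h, b.h)
        same_row = abs(a.y - b.y) < 0.5 * height
        gap = b.x - (a.x + a.w)
        if same_row and 0 <= gap <= max_gap_ratio * height:
            pairs.append((i, j))
    return pairs


def item_likelihood(a, b):
    """Step 2 score (assumed): tighter spacing gives a higher likelihood."""
    left, right = (a, b) if a.x <= b.x else (b, a)
    gap = max(0.0, right.x - (left.x + left.w))
    return 1.0 / (1.0 + gap / max(a.h, b.h))


def region_candidates(chars, pairs, threshold=0.5):
    """Steps 2 and 3: keep likely junctions, then merge them into string regions."""
    parent = list(range(len(chars)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in pairs:
        if item_likelihood(chars[i], chars[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, ch in enumerate(chars):
        groups.setdefault(find(i), []).append(ch)
    # Read each region left to right.
    return ["".join(c.text for c in sorted(g, key=lambda c: c.x))
            for g in groups.values()]


def is_item_name(text, vocabulary=frozenset({"No", "Name", "Date"})):
    """Determination step (assumed rule): trailing colon or a known item word."""
    return text.endswith(":") or text in vocabulary
```

For example, given the characters `N`, `o`, `:` set close together and a `7` further to the right on the same row, the wide candidate stage links `:` to `7`, the likelihood filter drops that link, and the remaining regions are `No:` (judged an item name) and `7` (not an item name). Separating a permissive candidate stage from a stricter likelihood stage mirrors the two-stage wording of the abstract, though the actual scoring used by the invention is not stated in this record.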