Abstract |
A document analysis device according to the present invention addresses the problem of correctly extracting item character rows from forms on which conventional techniques fail: forms in which the frame structure is not clearly shown, such as English forms; forms in which the character interval is so large that item character rows cannot be extracted; and forms in which a character row has a line break partway through and continues at a separated position, so that extraction of the item character row fails and, in turn, extraction of both the item character row and the value character row is affected. The document processing device is provided with an input device, a processor connected to the input device, a storage device connected to the processor, and an output device connected to the processor. The processor comprises: a means for extracting, for each form document input via the input device, (i) junction candidate relationships between characters on the basis of the locations of the characters, (ii) from the extracted character-to-character junction relationships, those junction relationships that have a high likelihood of belonging to an item name, and (iii) item name character string region candidates on the basis of the extracted junction relationships; and a means for determining whether each item name character string region candidate is an item name character string. |
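
The processing flow summarized above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `Char` record, the gap-based likelihood score, the thresholds, and the dictionary lookup used for the final determination are all assumptions introduced for illustration, and the sketch handles only characters on a single row (it does not cover the line-break case).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Char:
    """Hypothetical record for one recognized character and its bounding box."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float
    h: float


def junction_candidates(chars, max_gap_ratio=3.0):
    """Step 1: propose (i, j, gap) for every j to the right of i on the same row."""
    out = []
    for i, a in enumerate(chars):
        for j, b in enumerate(chars):
            gap = b.x - (a.x + a.w)
            same_row = abs(a.y - b.y) < 0.5 * min(a.h, b.h)
            if i != j and same_row and 0 <= gap <= max_gap_ratio * a.h:
                out.append((i, j, gap))
    return out


def likely_junctions(chars, candidates, max_gap_ratio=3.0, threshold=0.4):
    """Step 2: keep candidates whose (assumed) likelihood score is high enough."""
    kept = []
    for i, j, gap in candidates:
        # Stand-in heuristic: the smaller the gap, the higher the score.
        score = 1.0 - gap / (max_gap_ratio * chars[i].h)
        if score >= threshold:
            kept.append((i, j))
    return kept


def region_candidates(chars, junctions):
    """Step 3: merge joined characters into string regions (union-find)."""
    parent = list(range(len(chars)))

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for i, j in junctions:
        parent[find(i)] = find(j)

    groups = {}
    for k in range(len(chars)):
        groups.setdefault(find(k), []).append(chars[k])
    # Read each group left to right; sort the strings for a stable result.
    return sorted("".join(c.text for c in sorted(g, key=lambda c: c.x))
                  for g in groups.values())


def is_item_name(region, vocabulary):
    """Step 4: decide whether a region candidate is an item name (assumed lookup)."""
    return region.lower() in {v.replace(" ", "").lower() for v in vocabulary}


def make_row(text, x0, pitch, y=0.0, size=10.0):
    """Demo helper: lay out characters left to right at a fixed pitch."""
    return [Char(t, x0 + k * pitch, y, size, size) for k, t in enumerate(text)]


# A widely spaced item name "D A T E" followed, far to the right, by a value.
chars = make_row("DATE", 0, 25) + make_row("2024", 200, 10)
regions = region_candidates(chars, likely_junctions(chars, junction_candidates(chars)))
results = {r: is_item_name(r, ["Date", "Invoice No"]) for r in regions}
print(results)
```

In this sketch the widely spaced characters of the item name are still joined, because the junction threshold scales with character height rather than relying on a ruled frame, while the large gap between the item name and the value keeps them in separate region candidates.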