Abstract |
<P>PROBLEM TO BE SOLVED: To provide a document layout analysis program capable of accurately extracting the layout structure of an electronic document, a computer-readable storage medium storing the document layout analysis program, a document layout analysis method, and a document layout analysis device. <P>SOLUTION: Coordinate information for each character in a document image is acquired, a character string in the document image is detected based on the acquired coordinate information, and the characters included in the detected character string are selected one by one. The characters are numbered so that, for each selected character, the rectangular inspection area that has a predetermined corner of the circumscribing rectangle of the character string as one of its corners and that contains the circumscribing rectangle of the selected character includes no character with a larger number than the selected character. A character string is then built by adding the characters one by one in the given number order. If the rectangular inspection area containing the characters already added to the character string and the newly added character also contains any other character, the newly added character is removed and the characters already added are combined and set as one sentence. <P>COPYRIGHT: (C)2005,JPO&NCIPI |
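The grouping step described in the SOLUTION can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the names `Box`, `Char`, and `group_characters` are hypothetical, the order of the input list stands in for the character numbering, and the inspection area is simplified to the bounding rectangle of the characters added so far plus the candidate, rather than a rectangle anchored at a corner of the string's circumscribing rectangle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in image coordinates (hypothetical helper)."""
    left: float
    top: float
    right: float
    bottom: float

    def union(self, other):
        # Smallest rectangle containing both boxes.
        return Box(min(self.left, other.left), min(self.top, other.top),
                   max(self.right, other.right), max(self.bottom, other.bottom))

    def overlaps(self, other):
        # True only when the interiors intersect; touching edges do not count.
        return (self.left < other.right and other.left < self.right
                and self.top < other.bottom and other.top < self.bottom)


@dataclass(frozen=True)
class Char:
    text: str
    box: Box


def group_characters(chars):
    """Group characters in list order (assumed to be the numbering).

    A character is added to the current group only if the rectangle
    enclosing the group plus that character contains no other character.
    Otherwise the character is left out, the current group is closed,
    and the character starts a new group.
    """
    groups = []
    current = []   # indices of characters in the group being built
    area = None    # simplified inspection area for the current group
    for index, ch in enumerate(chars):
        if not current:
            current, area = [index], ch.box
            continue
        candidate = area.union(ch.box)
        members = set(current) | {index}
        blocked = any(candidate.overlaps(other.box)
                      for j, other in enumerate(chars) if j not in members)
        if blocked:
            groups.append([chars[j] for j in current])
            current, area = [index], ch.box
        else:
            current.append(index)
            area = candidate
    if current:
        groups.append([chars[j] for j in current])
    return groups
```

With this simplification, two characters on the same line merge into one group, while a candidate whose enclosing rectangle would swallow an unrelated character (for example one from a neighbouring line or column) closes the current group. The numbering rule in the abstract is not implemented here; the caller is assumed to supply characters already in number order.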