Abstract |
PURPOSE: To easily recognize horizontally written character strings, vertically written character strings, and areas that are neither, in a document that mixes horizontal and vertical writing, by connecting pixels that share a layout attribute so that areas with different layout attributes are separated and each area with a uniform layout attribute can be extracted. CONSTITUTION: For each representative coordinate value, when the distance to another representative coordinate value is less than the value obtained by adding the standard deviation to the mode, an area connecting means connects the pixels of the areas that have those two representative coordinate values. A layout recognizing part then performs boundary tracking on the resulting connected image and recognizes the layout attribute of each rectangular area extracted by that tracking. In this way, even in a document that mixes horizontal and vertical writing, areas with different layout attributes (horizontally written character strings, vertically written character strings, areas that are neither, and so on) can be recognized and distinguished without human intervention. |
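
The connection rule in the CONSTITUTION paragraph can be illustrated with a minimal sketch. The abstract does not say which quantity the mode and standard deviation are taken over, so this sketch assumes they are computed from each representative point's nearest-neighbor distance, and that the mode is found after rounding distances into bins. The function names, the `bin_width` parameter, and the use of points in place of pixel areas are all hypothetical and are not taken from the patent.

```python
from collections import Counter
from math import hypot
from statistics import pstdev


def nearest_distances(points):
    """Distance from each point to its closest other point (needs >= 2 points)."""
    return [
        min(hypot(px - qx, py - qy)
            for j, (qx, qy) in enumerate(points) if j != i)
        for i, (px, py) in enumerate(points)
    ]


def connection_threshold(points, bin_width=1.0):
    """Assumed threshold: mode + standard deviation of nearest-neighbor distances."""
    dists = nearest_distances(points)
    # Assumption: the mode is taken over distances rounded to multiples of bin_width.
    bins = Counter(round(d / bin_width) for d in dists)
    mode = bins.most_common(1)[0][0] * bin_width
    return mode + pstdev(dists)


def connect_regions(points, bin_width=1.0):
    """Group point indices whose pairwise distance is below the threshold."""
    threshold = connection_threshold(points, bin_width)
    parent = list(range(len(points)))  # union-find forest over point indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, (px, py) in enumerate(points):
        for j in range(i + 1, len(points)):
            qx, qy = points[j]
            if hypot(px - qx, py - qy) < threshold:
                parent[find(j)] = find(i)  # connect the two areas

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical data: a horizontal run, a vertical run, and one outlying point.
sample = [(0, 0), (10, 0), (20, 0), (29, 0),
          (100, 100), (100, 110), (100, 120), (100, 132)]
print(connect_regions(sample))  # [[0, 1, 2, 3], [4, 5, 6], [7]]
```

In this sample the nearest-neighbor distances have mode 10 and standard deviation of about 0.87, so points up to roughly 10.87 apart are connected. The last point, 12 units from its neighbor, stays separate. The boundary tracking and attribute recognition steps that the abstract describes next are not covered by this sketch.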