Title of Invention Document image processing apparatus, document image processing method, and computer-readable recording medium having recorded document image processing program
Abstract For each index candidate region, a feature section is set for a style type, among a plurality of style types, in which the candidate region differs in feature from its related large region; the feature section includes the feature of the candidate region but not the feature of the related large region. The large regions, the candidate regions, or both, whose features fall within the set feature section are grouped. An index evaluation degree is calculated for each candidate region based on the grouped result. Whether the logical element of each candidate region is an index is determined based on the calculated index evaluation degree.
Publication No. US8837818(B2) Publication Date 2014.09.16
Application No. US201012722057 Filing Date 2010.03.11
Applicant Konica Minolta Business Technologies, Inc. Inventor Komaki Yoshio
Classification G06F17/21 Main Classification G06F17/21
Agency Cantor Colburn LLP Agent Cantor Colburn LLP
Main Claim 1. A document image processing apparatus for use with a document image comprising a plurality of character string element regions, the document image processing apparatus comprising:
  a memory for storing the document image; and
  a controller for controlling extraction of an index region from said document image, wherein said controller is configured to:
    for each character string element region, identify the character string element region as a large region or a small region based on at least one of a size property and a character property of the character string element region;
    for each small region, identify the small region as an index candidate when a region immediately following the small region is a large region, and identify the large region immediately following the index candidate region as a related text region;
    for each index candidate region, compare formatting of the index candidate region to formatting of the related text region and identify a formatting property that is different between the index candidate region and the related text region as a different formatting property;
    for each index candidate region, set an evaluation criterion such that a value of the index candidate region for the different formatting property satisfies the evaluation criterion and a value of the related text region for the different formatting property does not satisfy the evaluation criterion;
    for each index candidate region:
      set the index candidate region as the focused index candidate region and set the different formatting property of the focused index candidate region as the focused different formatting property;
      calculate at least one of a number of similar index candidate regions and a number of similar large regions, wherein:
        the number of similar index candidate regions is a total number of index candidate regions among the index candidate regions, except for the focused index candidate region, that satisfy the evaluation criterion for the focused different formatting property; and
        the number of similar large regions is a total number of large regions that satisfy the evaluation criterion for the focused different formatting property;
      calculate an index evaluation degree based on at least one of the number of similar index candidate regions and the number of similar large regions;
    identify index regions from among the index candidate regions based on the index evaluation degree of each index candidate region.
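The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions made only for illustration: the `Region` type and its fields, the use of a line count as the size property, an equality test as the evaluation criterion, taking the first differing property as the different formatting property, and the score formula (similar candidates minus similar large regions) with a `min_score` threshold. The claim itself does not specify any of these.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical character string element region."""
    text: str
    lines: int   # assumed size property: number of text lines
    style: dict  # assumed formatting properties, e.g. {"font_size": 14}


def find_index_regions(regions, max_small_lines=1, min_score=1):
    # Step 1: classify each region as large or small (assumed rule: line count).
    large = [r.lines > max_small_lines for r in regions]

    # Step 2: a small region immediately followed by a large region is an
    # index candidate; the following large region is its related text region.
    pairs = [(i, i + 1) for i in range(len(regions) - 1)
             if not large[i] and large[i + 1]]

    # Steps 3-4: pick the first property whose value differs from the related
    # text, and use "equals the candidate's value" as the evaluation criterion.
    # The candidate satisfies it and the related text does not.
    criteria = {}
    for cand, related in pairs:
        for prop, value in regions[cand].style.items():
            if regions[related].style.get(prop) != value:
                criteria[cand] = (prop, value)
                break

    # Steps 5-6: count similar candidates and similar large regions, then
    # score each focused candidate (assumed formula and threshold).
    found = []
    for cand, (prop, value) in criteria.items():
        similar_candidates = sum(
            1 for other, _ in pairs
            if other != cand and regions[other].style.get(prop) == value)
        similar_large = sum(
            1 for i, r in enumerate(regions)
            if large[i] and r.style.get(prop) == value)
        if similar_candidates - similar_large >= min_score:
            found.append(regions[cand])
    return found


body = {"font_size": 10, "bold": False}
head = {"font_size": 14, "bold": True}
sample = [
    Region("1. Introduction", 1, head), Region("Body text A", 5, body),
    Region("2. Method", 1, head), Region("Body text B", 4, body),
    Region("Note", 1, body), Region("Body text C", 3, body),
]
found = find_index_regions(sample)
```

In this sample, the two headings share a font size that no large region uses, so each has one similar candidate and no similar large regions. The "Note" region is formatted like the body text, so it has no different formatting property and is never scored.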
Address JP