Title of Invention Content Delineation in Document Images
Abstract Methods and apparatus delineate grouped-together content in documents. Void and unvoid pixels in document images are clustered together. Executing a histogram and an autocorrelation function, including peak detection, against the unvoid clusters reveals the content. Techniques for clustering include iteratively transforming an original image into secondary images, for example with a Haar wavelet transformation. Clustering begins on the lowest image plane and advances to the next highest plane until all void and unvoid pixels in the images are grouped. Void clusters at lower levels remain void clusters at higher levels, so only unvoid clusters of pixels require processing at higher levels, which reduces the processing required. Imaging devices with scanners define suitable hardware for transforming the document into images, and processors with executable code cluster pixels together to delineate content. Further processing includes executing OCR or other routines after the void/unvoid analysis.
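The coarse-to-fine idea in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes a binary image stored as a list of lists (1 = ink, 0 = background), dimensions divisible by 2 at every level, a 2x2 block mean as the Haar approximation band (the Haar low-pass output up to a scale factor), and "void" meaning a value of exactly zero. The function names `haar_approx`, `build_pyramid`, and `unvoid_pixels` are hypothetical. The sketch only separates void from unvoid pixels and does not group them into connected clusters.

```python
def haar_approx(img):
    """One level of the 2-D Haar approximation band, computed as a 2x2 block mean."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(w)] for r in range(h)]


def build_pyramid(img, levels):
    """Return [original, level 1, ..., coarsest] image planes."""
    planes = [img]
    for _ in range(levels):
        planes.append(haar_approx(planes[-1]))
    return planes


def unvoid_pixels(planes):
    """Walk from the coarsest plane to the original, descending only into unvoid pixels.

    Returns the set of unvoid (row, col) positions in the original image and
    the number of pixels examined across all planes.
    """
    top = planes[-1]
    candidates = [(r, c) for r in range(len(top)) for c in range(len(top[0]))]
    examined = 0
    for level in range(len(planes) - 1, -1, -1):
        plane = planes[level]
        unvoid = []
        for r, c in candidates:
            examined += 1
            if plane[r][c] > 0:  # assumption: void means no ink at all
                unvoid.append((r, c))
        if level == 0:
            return set(unvoid), examined
        # A void pixel covers only void pixels on the finer plane, so skip its children.
        candidates = [(2 * r + dr, 2 * c + dc)
                      for r, c in unvoid for dr in (0, 1) for dc in (0, 1)]


# Usage: an 8x8 page with three ink pixels and a two-level pyramid.
page = [[0] * 8 for _ in range(8)]
for r, c in [(1, 1), (1, 2), (6, 5)]:
    page[r][c] = 1
found, examined = unvoid_pixels(build_pyramid(page, 2))
```

In this example the walk examines 24 pixels instead of all 64, because void regions found on a coarse plane are never revisited on finer planes.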
Publication No. US2017053163(A1) Publication Date 2017.02.23
Application No. US201514827725 Filing Date 2015.08.17
Applicant Lexmark International, Inc. Inventors Meier Ralph;Hausmann Johannes;Urbschat Harry;Wanschura Thorsten
Classification G06K9/00;G06K9/62;G06K9/52;G06K9/46 Main Classification G06K9/00
Agency Agent
Main Claim 1. A method of identifying content in a document, comprising: receiving at a processor of a computing device an image corresponding to the document; transforming the image into one or more secondary images each having pluralities of pixels; determining void and unvoid pixels in the one or more secondary images and grouping together pluralities of the void pixels and unvoid pixels to form void clusters and unvoid clusters; determining a histogram for the unvoid clusters; and executing an autocorrelation function relative to the histogram and detecting relative peaks thereof, thereby delineating grouped together content in the document.
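The histogram and autocorrelation steps of the claim can be illustrated with a short sketch. The claim does not say what the histogram is taken over, so this example assumes a row projection profile (count of unvoid pixels per image row) and a plain mean-centered, unnormalized autocorrelation. The function names `row_histogram`, `autocorrelation`, and `relative_peaks` are hypothetical, and the peak rule (strictly greater than the previous lag, at least as large as the next) is one possible choice.

```python
def row_histogram(img):
    """Count unvoid (non-zero) pixels in each row of a binary image."""
    return [sum(row) for row in img]


def autocorrelation(hist):
    """Mean-centered, unnormalized autocorrelation for lags 0..len(hist)-1."""
    n = len(hist)
    mean = sum(hist) / n
    centered = [v - mean for v in hist]
    return [sum(centered[i] * centered[i + lag] for i in range(n - lag))
            for lag in range(n)]


def relative_peaks(values):
    """Indices of interior local maxima."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] >= values[i + 1]]


# Usage: four "text lines", each two rows of ink followed by two blank rows.
page = ([[1] * 5] * 2 + [[0] * 5] * 2) * 4
peaks = relative_peaks(autocorrelation(row_histogram(page)))
# peaks == [4, 8, 12]: the first peak gives the repeat distance of 4 rows.
```

Under these assumptions the first relative peak at a non-zero lag corresponds to the spacing between repeated content, such as the pitch of text lines, which is one way such peaks could delineate grouped content.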
Address Lexington KY US