Title of invention OBJECT EXTRACTION IN COLOUR COMPOUND DOCUMENTS
Abstract Disclosed is a computer-implemented method of text extraction in colour compound documents. The method connects similarly coloured pixels of an image of a colour compound document into connected components (CCs); classifies each CC as either text or non-text; refines the text CC classification for each text CC using global colour context statistics; groups text CCs into text blocks; recovers misclassified non-text CCs into a nearby text block; and removes extraneous CCs from each text block using local colour context statistics, thereby providing the extracted text in the text blocks. Also disclosed is a computer-implemented method of locating graphics objects in a colour compound document image. The method connects similarly coloured pixels of said image into connected components (CCs) and places the CCs in an enclosure tree; classifies (330, 730) each CC into one of a plurality of classes, wherein at least one class (862) represents salient graphics components; identifies (1140), for each CC of said class representing salient graphics components, a graphics container (441) on which to perform semantic analysis; profiles (1170) descendants of said graphics container in said tree to obtain semantic context statistics; and decides (1710) whether the graphics container contains the whole or a part of a graphics object based on said semantic context statistics.
Publication number US2010157340(A1) Publication date 2010.06.24
Application number US20090637446 Filing date 2009.12.14
Applicant CANON KABUSHIKI KAISHA Inventors CHEN YU-LING;LIU PING;MCDONELL TREVOR LEE
Classification G06F15/00;G06K9/00 Main classification G06F15/00
Agency Agent
Main claim
Address
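
Illustrative note: the first step named in the abstract, connecting similarly coloured pixels into connected components (CCs), can be sketched as a flood fill. This is only a minimal sketch under stated assumptions, not the method claimed in the patent: the abstract does not define the colour-similarity test, so the per-channel distance, the `threshold` value, the 4-connectivity, and the names `colour_distance` and `label_connected_components` are all hypothetical choices made for illustration.

```python
from collections import deque


def colour_distance(a, b):
    """Largest per-channel difference between two RGB tuples (assumed metric)."""
    return max(abs(x - y) for x, y in zip(a, b))


def label_connected_components(image, threshold=16):
    """Label 4-connected regions of similarly coloured pixels.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    A pixel joins a component when its colour is within `threshold`
    of the component's seed pixel (a hypothetical similarity rule).
    Returns (labels, count), where labels[y][x] is a component index.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    labels = [[-1] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if labels[y][x] != -1:
                continue  # already assigned to an earlier component
            seed = image[y][x]
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (
                        0 <= ny < height
                        and 0 <= nx < width
                        and labels[ny][nx] == -1
                        and colour_distance(image[ny][nx], seed) <= threshold
                    ):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
            count += 1
    return labels, count


# Tiny example: a dark two-pixel stroke on a light background.
WHITE, BLACK = (255, 255, 255), (0, 0, 0)
image = [
    [WHITE, WHITE, WHITE],
    [WHITE, BLACK, BLACK],
    [WHITE, WHITE, WHITE],
]
labels, count = label_connected_components(image)
```

Comparing each candidate pixel against the seed colour, rather than against its immediate neighbour, keeps a smooth gradient from being merged into a single component; the later steps of the abstract (classification, grouping into text blocks, enclosure-tree analysis) would operate on the components this stage produces.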