<p>Methods and systems are disclosed for segmenting printed media pages into individual articles quickly and efficiently. An image of a printed media page, which may include a variety of columns, headlines, images, and text, is input into a system comprising a block segmenter and an article segmenter system. The block segmenter identifies and produces blocks of textual content from the printed media image, while the article segmenter system uses a classifier algorithm to determine which blocks of textual content belong to one or more articles in the printed media image. A method for segmenting printed media pages into individual articles is also presented.</p>
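<p>The two-stage flow described above (blocks first, then articles) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the block representation, the classifier, or its features. Here the hypothetical <code>Block</code> type stands in for the block segmenter's output, the hypothetical <code>same_article</code> function is a hand-written geometric rule standing in for the classifier algorithm, and blocks are grouped with a simple union-find.</p>

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Block:
    """A text block as a block segmenter might emit it (hypothetical shape)."""
    x: int      # left edge, pixels
    y: int      # top edge, pixels
    w: int      # width, pixels
    h: int      # height, pixels
    text: str


def same_article(a: Block, b: Block, max_gap: int = 20) -> bool:
    """Stand-in for the classifier: a fixed geometric rule, not a trained model.

    Two blocks are linked when they overlap horizontally and the vertical
    gap between them is at most max_gap pixels (an assumed threshold).
    """
    overlap = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    upper, lower = (a, b) if a.y <= b.y else (b, a)
    gap = lower.y - (upper.y + upper.h)
    return overlap > 0 and gap <= max_gap


def segment_articles(blocks, classifier=same_article):
    """Group blocks into articles by merging every pair the classifier links."""
    parent = list(range(len(blocks)))

    def find(i):
        # Follow parent links to the group representative, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(blocks)), 2):
        if classifier(blocks[i], blocks[j]):
            parent[find(i)] = find(j)

    groups = {}
    for i, block in enumerate(blocks):
        groups.setdefault(find(i), []).append(block)
    return list(groups.values())


# Made-up page: two columns, each holding a headline above a body block.
page = [
    Block(0, 0, 100, 30, "Headline A"),
    Block(0, 40, 100, 200, "Body of A"),
    Block(150, 0, 100, 30, "Headline B"),
    Block(150, 40, 100, 200, "Body of B"),
]
articles = segment_articles(page)
```

<p>In this sketch the classifier is passed in as a parameter, so the geometric rule could be swapped for a trained model that scores block pairs without changing the grouping step.</p>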
Application publication number
WO2010019804(A3)
Application publication date
2010.04.08
Application number
WO2009US53757
Filing date
2009.08.13
Applicant
GOOGLE INC.;JAIN, ANKUR;SAHASRANAMAN, VIVEK;SAXENA, SHOBHIT;CHAUDHURY, KRISHNENDU