Title: Image-domain script and language identification
Abstract: Disclosed herein are a method, a computer system, and a computer program product for identifying a writing system associated with a document image containing one or more words written in the writing system. Initially, a document image fragment is identified based on the document image, wherein the document image fragment contains one or more pixels from one or more of the words in the document image. A set of sequential features associated with the document image fragment is generated, wherein each sequential feature describes one-dimensional graphic information derived from the one or more pixels in the document image fragment. A classification score for the document image fragment is generated responsive at least in part to the set of sequential features, the classification score indicating a likelihood that the document image fragment is written in the writing system. The writing system associated with the document image is identified based at least in part on the classification score for the document image fragment.
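The four steps in the abstract (fragment identification, sequential features, per-fragment classification score, document-level decision) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the binary-image representation, splitting fragments at blank pixel columns, using the per-column ink fraction as the one-dimensional feature, the logistic scoring function, and the hypothetical `models` dictionary of (weight, bias) pairs.

```python
import math
from typing import Dict, List, Tuple

Image = List[List[int]]  # assumed representation: rows of pixels, 1 = ink, 0 = background


def extract_fragments(image: Image) -> List[Image]:
    """Split the image into fragments separated by blank pixel columns (assumed heuristic)."""
    width = len(image[0])
    has_ink = [any(row[x] for row in image) for x in range(width)]
    fragments: List[Image] = []
    start = None
    # The trailing False closes a fragment that touches the right edge.
    for x, ink in enumerate(has_ink + [False]):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            fragments.append([row[start:x] for row in image])
            start = None
    return fragments


def sequential_features(fragment: Image) -> List[float]:
    """One feature per pixel column, left to right: the fraction of ink pixels in that column."""
    height = len(fragment)
    width = len(fragment[0])
    return [sum(row[x] for row in fragment) / height for x in range(width)]


def classification_score(features: List[float], weight: float, bias: float) -> float:
    """Hypothetical scorer: logistic function of the mean feature value, in (0, 1)."""
    mean = sum(features) / len(features)
    return 1.0 / (1.0 + math.exp(-(weight * mean + bias)))


def identify_writing_system(image: Image, models: Dict[str, Tuple[float, float]]) -> str:
    """Sum log-scores over all fragments and return the best-scoring writing system."""
    fragments = extract_fragments(image)
    if not fragments:
        raise ValueError("image contains no ink pixels")
    totals = {name: 0.0 for name in models}
    for fragment in fragments:
        features = sequential_features(fragment)
        for name, (weight, bias) in models.items():
            totals[name] += math.log(classification_score(features, weight, bias))
    return max(totals, key=lambda name: totals[name])
```

A real system would replace the logistic scorer with a trained sequence classifier; the sketch only shows how fragment-level scores could feed a document-level decision.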
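Publication number: US8233726(B1) Publication date: 2012.07.31
Application number: US20070945978 Filing date: 2007.11.27
Applicant: POPAT ASHOK;BREVDO EUGENE;GOOGLE INC. Inventor: POPAT ASHOK;BREVDO EUGENE
Classification: G06K9/72;G06K9/38 Main classification: G06K9/72
Agency: Agent:
Main claim:
Address: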