Title of invention DOCUMENT IMAGE PROCESSING APPARATUS, DOCUMENT IMAGE PROCESSING METHOD, DOCUMENT IMAGE PROCESSING PROGRAM, AND RECORDING MEDIUM ON WHICH DOCUMENT IMAGE PROCESSING PROGRAM IS RECORDED
Abstract An image of a character string composed of M characters is clipped from a document image and divided into separate character images. Image features are extracted from each character image. Based on these features, N (N>1, an integer) character images are selected as candidate characters, in descending order of degree of similarity, from a character image feature dictionary that stores image features character by character, and a first index matrix of M×N cells is prepared. The candidate character string formed by the candidate characters in the first column of the first index matrix is subjected to lexical analysis according to a language model, whereby a second index matrix having a character string that makes sense is prepared. For the language model, statistics are taken first, and the lexical analysis is then performed.
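The two-stage flow in the abstract can be illustrated with a minimal Python sketch. It is not the applicants' implementation: the abstract names neither the similarity measure nor the form of the language model, so the negative Euclidean distance, the bigram counts, and the Viterbi-style search below are all assumptions, as are the function names `top_n_candidates`, `first_index_matrix`, and `second_index_matrix`.

```python
import math


def top_n_candidates(feature, dictionary, n):
    """Return the n dictionary characters most similar to `feature`, best first.

    `dictionary` maps a character to its stored feature vector. Similarity is
    taken as negative Euclidean distance, which is an assumption of this sketch.
    """
    ranked = sorted(dictionary, key=lambda ch: math.dist(feature, dictionary[ch]))
    return ranked[:n]


def first_index_matrix(features, dictionary, n):
    """Build the M x N matrix: one row per character image, N candidates per row."""
    return [top_n_candidates(f, dictionary, n) for f in features]


def second_index_matrix(matrix, bigram_counts):
    """Reorder each row so that the first column forms the most plausible string.

    Plausibility is scored with smoothed bigram counts (assumed language model)
    and the best path is found with a Viterbi-style search over the candidates.
    """
    if not matrix:
        return []
    # best[c] = (score, path) of the best candidate string ending in character c
    best = {c: (0.0, [c]) for c in matrix[0]}
    for row in matrix[1:]:
        nxt = {}
        for c in row:
            score, path = max(
                (
                    (s + math.log(1 + bigram_counts.get((p, c), 0)), pth)
                    for p, (s, pth) in best.items()
                ),
                key=lambda t: t[0],
            )
            nxt[c] = (score, path + [c])
        best = nxt
    _, chosen = max(best.values(), key=lambda t: t[0])
    # Move the chosen character to the front of its row; keep the rest in rank order.
    return [[c] + [x for x in row if x != c] for row, c in zip(matrix, chosen)]
```

In this sketch the bigram counts stand in for the statistics gathered for the language model, and ties are resolved in favour of the candidate that ranked higher by image similarity.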
Publication number US2009028446(A1) Publication date 2009.01.29
Application number US20080972446 Filing date 2008.01.10
Applicant WU BO;DOU JIANJUN;LE NING;WU YADONG;JIA JING Inventor WU BO;DOU JIANJUN;LE NING;WU YADONG;JIA JING
Classification G06K9/72 Main classification G06K9/72
Agency Agent
Principal claim
Address