Title: Search and retrieval of documents indexed by optical character recognition
Abstract: An image of a character string composed of M characters is clipped from a document image and divided into separate character images. Image features are extracted from each character image. Based on these features, the N (N>1, integer) most similar characters are selected as candidate characters, in descending order of similarity, from a character image feature dictionary that stores image features character by character, and a first index matrix of M×N cells is prepared. The candidate character string formed by the candidate characters in the first column of the first index matrix is subjected to lexical analysis according to a language model, whereby a second index matrix containing a meaningful character string is prepared. Statistics are first compiled for the language model, and the lexical analysis is then performed.
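The two matrices described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the two-dimensional feature vectors, Euclidean distance as the similarity measure, the toy bigram counts standing in for the language model, the `rank_penalty` parameter, and all function names. Segmentation and feature extraction are skipped, so each character image is represented directly by a feature vector.

```python
import math
from collections import Counter


def first_index_matrix(char_features, dictionary, n):
    """Build an M x N matrix: one row per character image, candidates by rank."""
    matrix = []
    for feature in char_features:
        # Assumption: smaller Euclidean distance means higher similarity.
        ranked = sorted(dictionary, key=lambda ch: math.dist(feature, dictionary[ch]))
        matrix.append(ranked[:n])
    return matrix


def bigram_counts(corpus_words):
    """Toy stand-in for the language model statistics: adjacent-character counts."""
    counts = Counter()
    for word in corpus_words:
        counts.update(zip(word, word[1:]))
    return counts


def second_index_matrix(matrix, bigrams, rank_penalty=0.1):
    """Pick one candidate per row with a Viterbi search, then move it to column 0."""
    # best[j] = (score, column path) of the best path ending at candidate j.
    best = [(-rank_penalty * j, [j]) for j in range(len(matrix[0]))]
    for i in range(1, len(matrix)):
        new_best = []
        for j, ch in enumerate(matrix[i]):
            options = [
                (score + math.log1p(bigrams[(matrix[i - 1][path[-1]], ch)])
                 - rank_penalty * j, path)
                for score, path in best
            ]
            score, path = max(options, key=lambda t: t[0])
            new_best.append((score, path + [j]))
        best = new_best
    _, path = max(best, key=lambda t: t[0])
    return [[row[j]] + row[:j] + row[j + 1:] for row, j in zip(matrix, path)]


# Hypothetical feature dictionary: look-alike characters sit close together.
DICTIONARY = {
    "c": (0.0, 0.0), "e": (0.1, 0.0),
    "a": (1.0, 0.0), "o": (1.1, 0.0),
    "t": (2.0, 0.0), "l": (2.1, 0.0),
}
# Features of M = 3 character images; the first one is slightly closer to "e".
FEATURES = [(0.06, 0.0), (1.0, 0.0), (2.0, 0.0)]

first = first_index_matrix(FEATURES, DICTIONARY, n=2)
second = second_index_matrix(first, bigram_counts(["cat", "cot", "act"]))

print("".join(row[0] for row in first))   # first-column string before analysis
print("".join(row[0] for row in second))  # first-column string after analysis
```

In this toy run the first column of the first matrix reads "eat" because the first image is marginally closer to "e", and the bigram statistics then promote "c" so that the first column of the second matrix reads "cat". The lower-ranked candidates stay in each row, so a search over the index can still match them.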
Publication number: US8208765(B2) Publication date: 2012.06.26
Application number: US20080972446 Filing date: 2008.01.10
Applicant: WU BO; DOU JIANJUN; LE NING; WU YADONG; JIA JING; SHARP KABUSHIKI KAISHA Inventor: WU BO; DOU JIANJUN; LE NING; WU YADONG; JIA JING
Classification: G06K9/00 Main classification: G06K9/00
Agency: Agent:
Principal claim:
Address: