Title of invention: Hierarchical alignment of character sequences representing text of same source
Abstract: Systems and methods are disclosed for character-by-character alignment of two character sequences (such as OCR output from a scanned document and an electronic version of the same document) using a Hidden Markov Model (HMM) in a hierarchical fashion. The method may include aligning the two character sequences across multiple hierarchical levels. For each hierarchical level above the final level, the aligning may include parsing character subsequences from the two character sequences, performing an alignment of the character subsequences, and designating the aligned character subsequences as anchors; below the first hierarchical level, the parsing and alignment are performed between the anchors generated at the immediately preceding level. For the final hierarchical level, the aligning includes performing a character-by-character alignment of the characters between the anchors generated at the immediately preceding level. At each hierarchical level, an HMM may be constructed and the Viterbi algorithm may be employed to solve for the alignment.
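The two-level flow described in the abstract can be sketched roughly as follows. This is an illustrative simplification, not the patented method: the record says an HMM solved with the Viterbi algorithm may be used at each level, whereas this sketch swaps in `difflib.SequenceMatcher` for the word-level anchor step and a plain edit-distance dynamic program for the final character-level step. The function names `align_chars`, `offset`, and `hierarchical_align` are hypothetical, and only two levels (words, then characters) are assumed.

```python
import re
from difflib import SequenceMatcher


def align_chars(a, b):
    """Align two strings character by character (edit-distance DP stand-in).

    Returns (i, j) index pairs; i or j is None for an unmatched character.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimum edit cost of aligning a[:i] with b[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (a[i - 1] != b[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Trace back from the bottom-right corner to recover the pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (a[i - 1] != b[j - 1])):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


def offset(pairs, di, dj):
    """Shift local indices back into whole-sequence coordinates."""
    return [(None if i is None else i + di, None if j is None else j + dj)
            for i, j in pairs]


def hierarchical_align(ocr, truth):
    """Level 1: matching words become anchors. Final level: align the
    characters that lie between consecutive anchors."""
    ocr_words = [(w.group(), w.start(), w.end()) for w in re.finditer(r"\S+", ocr)]
    truth_words = [(w.group(), w.start(), w.end()) for w in re.finditer(r"\S+", truth)]
    matcher = SequenceMatcher(None,
                              [w for w, _, _ in ocr_words],
                              [w for w, _, _ in truth_words],
                              autojunk=False)
    pairs = []
    prev_o = prev_t = 0  # end positions of the previous anchor
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            _, o_start, o_end = ocr_words[block.a + k]
            _, t_start, t_end = truth_words[block.b + k]
            # Final level: characters between the previous anchor and this one.
            gap = align_chars(ocr[prev_o:o_start], truth[prev_t:t_start])
            pairs += offset(gap, prev_o, prev_t)
            # Anchor words are identical, so pair their characters one to one.
            pairs += [(o_start + d, t_start + d) for d in range(o_end - o_start)]
            prev_o, prev_t = o_end, t_end
    # Whatever remains after the last anchor.
    pairs += offset(align_chars(ocr[prev_o:], truth[prev_t:]), prev_o, prev_t)
    return pairs
```

Calling `hierarchical_align("the qu1ck brown fox", "the quick brown fox")` treats `the`, `brown`, and `fox` as anchors and runs the character-level step only on the short spans between them, which is the point of the hierarchy: the expensive character alignment never has to consider the whole document at once.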
Publication number: US8170289(B1)  Publication date: 2012.05.01
Application number: US20050232476  Filing date: 2005.09.21
Applicant: FENG SHAOLEI; MANMATHA RAGHAVAN; GOOGLE INC.  Inventor: FENG SHAOLEI; MANMATHA RAGHAVAN
Classification: G06K9/00  Main classification: G06K9/00
Agency:  Agent:
Main claim:
Address: