Abstract |
A method for establishing a relationship between a text image and a transcription associated with the text image uses conventional image processing techniques to identify one or more geometric attributes, or image parameters, of each of a sequence of regions of the text image. The transcription labels in the transcription are analyzed to determine a comparable set of parameters for the transcription label sequence. A matching operation then matches the respective parameters of the two sequences to identify image regions that correspond to transcription regions. The result is an output data structure that, at a minimum, identifies image locations of interest to a subsequent operation that processes the text image. The output data structure may also pair each image location of interest with a transcription location, in effect producing a set of labeled image locations. In one embodiment, the sequence of word locations in the text image and the observed length of each word are determined. The transcription is analyzed to identify words, and transcription word lengths are computed using an estimated character width of the glyphs in the text image. The sequence of observed image word lengths is then matched to the sequence of computed transcription word lengths using a dynamic programming algorithm that finds a best path through a two-dimensional lattice of nodes and transitions between nodes, where each transition represents a pair of sequences of zero or more word lengths. The output data structure contains entries, each of which pairs a transcription word with a matching image word location.
|
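The word-length matching embodiment can be illustrated with a minimal Python sketch. It is not the claimed implementation: the transition set, the absolute-difference cost, the fixed `penalty`, and the names `transcription_lengths` and `align` are all assumptions made for illustration. Each lattice node `(i, j)` means that `i` image words and `j` transcription words have been consumed, and each transition consumes a short run of zero or more word lengths from each sequence.

```python
from typing import List, Tuple

# Assumed transition set: (image words consumed, transcription words consumed).
# (1, 1) is a plain match; the others model a spurious or missed word and a
# word split or merged by segmentation. These choices are illustrative only.
TRANSITIONS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]

# One path step: (indices of image words, indices of transcription words).
Step = Tuple[Tuple[int, ...], Tuple[int, ...]]


def transcription_lengths(words: List[str], char_width: float) -> List[float]:
    """Estimate each transcription word's image length from its character count."""
    return [len(w) * char_width for w in words]


def align(image_lengths: List[float], trans_lengths: List[float],
          penalty: float = 5.0) -> Tuple[float, List[Step]]:
    """Return the cost and steps of the cheapest path through the lattice."""
    n, m = len(image_lengths), len(trans_lengths)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order visits every predecessor of a node before the node itself.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in TRANSITIONS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                # Assumed cost: length mismatch between the two consumed runs,
                # plus a fixed penalty for anything other than a 1-to-1 match.
                step = abs(sum(image_lengths[i:ni]) - sum(trans_lengths[j:nj]))
                if (di, dj) != (1, 1):
                    step += penalty
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    # Trace back from the final node to recover the best path.
    path: List[Step] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        path.append((tuple(range(pi, i)), tuple(range(pj, j))))
        i, j = pi, pj
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    words = ["hello", "to", "everyone"]
    observed = [50, 20, 70]  # hypothetical image word lengths in pixels
    total, steps = align(observed, transcription_lengths(words, 10.0))
    for image_idx, trans_idx in steps:
        print([words[k] for k in trans_idx], "-> image words", list(image_idx))
```

Each returned step pairs a run of image word indices with a run of transcription word indices, so 1-to-1 steps give the labeled image word locations directly. A practical system would likely derive the penalty and the character width from the image rather than fixing them.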