Title of invention System and method for transcribing handwritten records using word grouping with assigned centroids
Abstract A handwriting recognition system converts word images on documents, such as document images of historical records, into computer searchable text. Word images (snippets) on the document are located, and have multiple word features identified. For each word image, a word feature vector is created representing multiple word features. Based on the similarity of word features (e.g., the distance between feature vectors), similar words are grouped together in clusters, and a centroid that has features most representative of words in the cluster is selected. A digitized text word is selected for each cluster based on review of a centroid in the cluster, and is assigned to all words in that cluster and is used as computer searchable text for those word images where they appear in documents. An analyst may review clusters to permit refinement of the parameters used for grouping words in clusters, including the adjustment of weights and other factors used for determining the distance between feature vectors.
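The grouping and centroid selection described in the abstract can be illustrated with a minimal sketch. The record does not specify the distance formula, the clustering procedure, or the feature set, so everything here is an assumption made for illustration: a weighted Euclidean distance, a greedy threshold-based grouping, and a centroid chosen as the member with the smallest total distance to the other members. The names `weighted_distance`, `group_words`, `select_centroid`, and the `threshold` parameter are hypothetical.

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def group_words(vectors, weights, threshold):
    """Greedy grouping (assumed procedure): a vector joins the first cluster
    whose first member lies within `threshold`, otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of indices into `vectors`
    for i, v in enumerate(vectors):
        for members in clusters:
            if weighted_distance(v, vectors[members[0]], weights) <= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def select_centroid(members, vectors, weights):
    """Return the member index with the smallest total distance to all members."""
    return min(
        members,
        key=lambda i: sum(weighted_distance(vectors[i], vectors[j], weights) for j in members),
    )

# Toy feature vectors standing in for word-image features (made-up values).
vectors = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.0), (1.1, 1.0), (5.1, 4.9)]
weights = (1.0, 1.0)

clusters = group_words(vectors, weights, threshold=1.0)
centroids = [select_centroid(members, vectors, weights) for members in clusters]
```

Changing `weights` or `threshold` changes which word images end up grouped together, which corresponds loosely to the parameter refinement by an analyst that the abstract mentions.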
Publication number US9619702(B2) Publication date 2017.04.11
Application number US201514841542 Filing date 2015.08.31
Applicant Ancestry.com Operations Inc. Inventors Reese Jack;Murdock Michael;Reid Shawn;Brown Laryn
Classification G06K9/00;G06K9/18;G06K9/52;G06K9/62 Main classification G06K9/00
Agency Kilpatrick Townsend & Stockton LLP Agent Kilpatrick Townsend & Stockton LLP
Principal claim 1. A method for creating digitized text for a record from an image of the record, comprising: obtaining a digital image of a record; evaluating the record image in order to locate each of multiple word images; for each located word image, identifying multiple word features of that word image; assigning each of the multiple word images that have similar word features to one of a plurality of word clusters; selecting a representative word image in each of the word clusters as a centroid; reviewing, by an analyst, the centroid in each of the word clusters, and entering digitized text for the centroid; and assigning the digitized text for the centroid to all other word images in the same word cluster as the centroid.
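The final step of the claim, assigning the analyst's text for a centroid to every other word image in the same cluster, might look like the following sketch. The data layout (dictionaries keyed by a centroid identifier), the function name `assign_text`, and the sample identifiers and words are all hypothetical and are not taken from the patent.

```python
def assign_text(clusters, centroid_text):
    """Copy the text entered for each centroid to every word image in its cluster.

    clusters: maps a centroid's word-image id to the ids of all cluster members.
    centroid_text: maps a centroid's word-image id to the analyst-entered text.
    """
    labels = {}
    for centroid_id, member_ids in clusters.items():
        text = centroid_text[centroid_id]
        for word_id in member_ids:
            labels[word_id] = text
    return labels

# Hypothetical clusters: "w3" and "w5" are the centroids reviewed by the analyst.
clusters = {"w3": ["w1", "w2", "w3"], "w5": ["w4", "w5"]}
centroid_text = {"w3": "Smith", "w5": "Lehi"}

labels = assign_text(clusters, centroid_text)
```

Under this layout the analyst transcribes one image per cluster, and the resulting `labels` mapping supplies searchable text for every located word image.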
Address Lehi UT US