Title AUTOMATED DOCUMENT RECOGNITION, IDENTIFICATION, AND DATA EXTRACTION
Abstract A method for automated document recognition, identification, and data extraction is described herein. The method comprises receiving, by a processor, an image of a document associated with a user. The image is analyzed using optical character recognition to obtain image data, wherein the image data includes text zones. Based on the image data, the image is compared to one or more document templates. Based on the comparison, the document template having the highest degree of coincidence with the image is determined. The text zones of the image are associated with text zones of the document template to determine the type of data in each text zone. The data is then structured into a standard format to obtain structured data.
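The template comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not define how the "degree of coincidence" is computed, so this example assumes it is the mean bounding-box overlap (intersection over union) between template zones and OCR zones. The names `Zone`, `coincidence`, `best_template`, and `structure`, the normalized coordinates, and the `min_overlap` threshold are all hypothetical and do not come from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """Axis-aligned text zone; coordinates are fractions of the document size (assumption)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def iou(a: Zone, b: Zone) -> float:
    """Intersection over union of two zones, in [0, 1]."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def coincidence(zones: list[Zone], template: dict[str, Zone]) -> float:
    """Assumed score: mean best overlap of each template zone with any OCR zone."""
    if not template:
        return 0.0
    total = sum(max((iou(t, z) for z in zones), default=0.0) for t in template.values())
    return total / len(template)


def best_template(zones: list[Zone], templates: dict[str, dict[str, Zone]]) -> str:
    """Return the name of the template with the highest coincidence score."""
    return max(templates, key=lambda name: coincidence(zones, templates[name]))


def structure(ocr: list[tuple[Zone, str]], template: dict[str, Zone],
              min_overlap: float = 0.5) -> dict[str, str]:
    """Label each OCR text with the template field whose zone it overlaps best."""
    out: dict[str, str] = {}
    if not ocr:
        return out
    for field, tz in template.items():
        zone, text = max(ocr, key=lambda zt: iou(tz, zt[0]))
        if iou(tz, zone) >= min_overlap:  # skip fields with no convincing match
            out[field] = text
    return out
```

With a set of OCR zones and two hypothetical templates, `best_template` selects the layout whose zones line up with the image, and `structure` returns a field-to-text dictionary as one possible "standard format".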
Publication number US2015078671(A1) Publication date 2015.03.19
Application number US201414468173 Filing date 2014.08.25
Applicant IDChecker, Inc. Inventors van Deventer Jorgen; Hagen Michael; Mandak Istvan
Classification G06K9/00; G06F17/21 Main classification G06K9/00
Agency Agent
Main claim 1. A processor-implemented method for automated document recognition, identification and data extraction, the method comprising: receiving a video stream associated with the document, the document being associated with a user; detecting an image of the document in the video stream, the detecting including recognizing a shape corresponding to the document overall; improving the detected image of the document in the video stream by adjusting colors, adjusting brightness, and removing blurring; extracting the detected image of the document from the video stream, the image being a still image; analyzing the extracted image using optical character recognition to produce image data, the image data including text zones, each of the text zones being associated with one or more distances to other text zones and one or more borders of the document, the one or more distances being determined using coordinates; comparing the extracted image to one or more document templates using the image data; determining a document template having a highest degree of coincidence with the extracted image using the comparison; matching the text zones of the extracted image with text zones of the document template to determine a type of data in each text zone; and structuring the data into a standard format to obtain structured data.
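The claim associates each text zone with distances to other text zones and to the document borders, determined from coordinates. The following is a minimal sketch of one way such distances could be computed. The claim does not specify a metric, so centre-to-centre Euclidean distance between zones and edge-to-edge distance to the borders are assumptions. `TextZone`, `border_distances`, and `zone_distances` are hypothetical names, and coordinates are assumed to be pixels with the origin at the top-left corner of the document.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class TextZone:
    """Bounding box of an OCR text zone in pixel coordinates (assumed convention)."""
    left: float
    top: float
    right: float
    bottom: float

    @property
    def center(self) -> tuple[float, float]:
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)


def border_distances(zone: TextZone, width: float, height: float) -> dict[str, float]:
    """Distance from each edge of the zone to the matching document border."""
    return {
        "left": zone.left,
        "top": zone.top,
        "right": width - zone.right,
        "bottom": height - zone.bottom,
    }


def zone_distances(zones: list[TextZone]) -> dict[tuple[int, int], float]:
    """Centre-to-centre Euclidean distance for every unordered pair of zones."""
    result: dict[tuple[int, int], float] = {}
    for i, a in enumerate(zones):
        for j in range(i + 1, len(zones)):
            (ax, ay), (bx, by) = a.center, zones[j].center
            result[(i, j)] = hypot(bx - ax, by - ay)
    return result
```

Distances of this kind describe the layout relative to the document itself, so dividing them by the document width and height would make them comparable across images captured at different resolutions.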
Address San Francisco CA US