Title: System and method for detecting equations
Abstract: A system and method of extracting formulas in an electronic image of a document using optical character recognition (OCR) is disclosed. In one example, the method comprises analyzing the electronic image, including a plurality of text lines, to generate a plurality of bounding blocks, each bounding block associated with a text line detected within the electronic image, searching the plurality of text lines to detect at least one character matching one of a plurality of character groups, calculating a symbol density of each of the plurality of character groups for each of the plurality of text lines, and classifying each of the plurality of text lines as at least one of an equation block type, an inline equation block type, and a descriptive block type, based on the symbol density, wherein each of the plurality of text lines classified as the equation block type is extracted.
Publication number: US8818033(B1) Publication date: 2014.08.26
Application number: US201213458654 Filing date: 2012.04.27
Applicant: Google Inc. Inventors: Liu Zongyi; Smith Raymond Wensley
Classification: G06K9/00 Main classification: G06K9/00
Agency: Dority & Manning, P.A. Agent: Dority & Manning, P.A.
Main claim: 1. A computer implemented method of extracting formulas in an electronic image of a document using optical character recognition (OCR), the method comprising: analyzing, by one or more computing devices, the electronic image, including a plurality of text lines, to generate a plurality of bounding blocks, each bounding block associated with a text line detected within the electronic image, wherein each of the one or more computing devices comprises one or more processors; searching, by the one or more computing devices, the plurality of text lines to detect at least one character matching one of a plurality of character groups; calculating, by the one or more computing devices, a symbol density of each of the plurality of character groups for each of the plurality of text lines; classifying, by the one or more computing devices, each of the plurality of text lines as at least one of an equation block type, an inline equation block type, and a descriptive block type, based on the symbol density; extracting, by the one or more computing devices, at least one bounding block associated with each of the plurality of text lines classified as the equation block type; and processing, by the one or more computing devices, a set of the plurality of text lines remaining using OCR to produce an output document.
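The claimed steps (match characters against character groups, compute a per-group symbol density for each text line, classify the line, extract the equation blocks) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the patent: the contents of `CHARACTER_GROUPS`, the two threshold values, the rule of summing the per-group densities, and the `TextLine` type standing in for a detected text line and its bounding block.

```python
from dataclasses import dataclass

# Hypothetical character groups; this record does not enumerate them.
CHARACTER_GROUPS = {
    "operator": set("+-=<>*/^|"),
    "digit": set("0123456789"),
    "bracket": set("()[]{}"),
}

# Assumed thresholds on the combined density, chosen only for this example.
EQUATION_THRESHOLD = 0.5
INLINE_THRESHOLD = 0.15


@dataclass
class TextLine:
    text: str    # characters recognized on the line
    bbox: tuple  # bounding block as (x, y, width, height)


def symbol_density(text):
    """Return, per character group, the fraction of non-space characters in it."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return {name: 0.0 for name in CHARACTER_GROUPS}
    return {
        name: sum(c in group for c in chars) / len(chars)
        for name, group in CHARACTER_GROUPS.items()
    }


def classify(line):
    """Label a line by its combined symbol density (groups here are disjoint)."""
    combined = sum(symbol_density(line.text).values())
    if combined >= EQUATION_THRESHOLD:
        return "equation"
    if combined >= INLINE_THRESHOLD:
        return "inline_equation"
    return "descriptive"


def split_lines(lines):
    """Return bounding blocks of equation lines and the lines left for OCR."""
    extracted = [ln.bbox for ln in lines if classify(ln) == "equation"]
    remaining = [ln for ln in lines if classify(ln) != "equation"]
    return extracted, remaining
```

In this sketch, `split_lines` returns the bounding blocks of lines labeled as equations together with the lines that would be passed on to ordinary OCR. A real system would likely tune the groups and thresholds on labeled pages rather than fix them by hand.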
Address: Mountain View CA US