Title of invention |
Content-based document image classification |
Abstract |
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for classifying one or more document images based on their content by determining a block layout of each document image; recognizing the document image to obtain digital content data representing text content or potential graphical content of the image; calculating feature values of the document image for features based on the digital content data and the block layout; and classifying the document image as belonging to one of a set of document classes based on the calculated feature values. |
Publication number |
US9626555(B2) |
Publication date |
2017.04.18 |
Application number |
US201414571766 |
Filing date |
2014.12.16 |
Applicant |
ABBYY DEVELOPMENT LLC |
Inventors |
Smirnov Anatoly; Panferov Vasily; Isaev Andrey |
Classification number |
G06K9/00 |
Main classification number |
G06K9/00 |
Agency |
Lowenstein Sandler LLP |
Agent |
Lowenstein Sandler LLP |
Principal claim |
1. A method for classifying a document image based on its content using a processor device, comprising:
accessing a set of features stored in memory; analyzing the document image to determine blocks layout; recognizing the document image to obtain digital content data representing text content or potential graphical content; calculating, based on one or more features from the set of features accessed in the memory, feature values of the document image for the one or more features from the set of features, wherein the feature values are based on the digital content data and the blocks layout; and classifying the document image as belonging to a document class from a set of document classes based on the calculated feature values. |
Address |
Moscow RU |
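
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that layout analysis and recognition have already produced a list of blocks, and every name in it (`Block`, `RecognizedDocument`, `FEATURES`, `word_count`, `picture_block_ratio`, `classify`) is hypothetical. The nearest-centroid rule is a stand-in classifier chosen for brevity; the record above does not say which classifier is used.

```python
from dataclasses import dataclass
from math import dist
from typing import Callable, Dict, List


@dataclass
class Block:
    """One layout block (hypothetical structure)."""
    kind: str          # assumed labels: "text" or "picture"
    text: str = ""     # recognized text; empty for picture blocks


@dataclass
class RecognizedDocument:
    """Assumed output of the layout-analysis and recognition steps."""
    blocks: List[Block]


# A feature maps a recognized document to a single numeric value.
Feature = Callable[[RecognizedDocument], float]


def word_count(doc: RecognizedDocument) -> float:
    """Content-based feature: total number of recognized words."""
    return float(sum(len(b.text.split()) for b in doc.blocks))


def picture_block_ratio(doc: RecognizedDocument) -> float:
    """Layout-based feature: share of blocks that are pictures."""
    if not doc.blocks:
        return 0.0
    return sum(b.kind == "picture" for b in doc.blocks) / len(doc.blocks)


# Stands in for the "set of features stored in memory" (illustrative choice).
FEATURES: Dict[str, Feature] = {
    "word_count": word_count,
    "picture_block_ratio": picture_block_ratio,
}


def feature_values(doc: RecognizedDocument,
                   features: Dict[str, Feature] = FEATURES) -> List[float]:
    """Calculate one value per feature, in the dictionary's insertion order."""
    return [f(doc) for f in features.values()]


def classify(values: List[float], centroids: Dict[str, List[float]]) -> str:
    """Return the class whose centroid is closest (Euclidean) to the values."""
    return min(centroids, key=lambda name: dist(values, centroids[name]))


if __name__ == "__main__":
    doc = RecognizedDocument([Block("text", "Dear Sir or Madam"),
                              Block("text", "Yours faithfully")])
    # Centroids are made-up numbers for illustration only.
    centroids = {"letter": [6.0, 0.0], "photo_page": [1.0, 0.8]}
    print(classify(feature_values(doc), centroids))
```

In practice the features would need to be scaled to comparable ranges before a distance-based rule is applied; the sketch skips this to keep the mapping to the claimed steps easy to follow.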