Title of invention |
Identifying similarly formed paragraphs in scanned images |
Abstract |
A system and method for identifying and/or categorizing similarly formed paragraphs in a digital image is set forth. An exemplary system includes a processor and a memory. The memory stores executable components which, when executed, direct the system to perform the following: obtain at least one page image of reflowable textual content and identify at least one paragraph of textual content. Thereafter, for each identified paragraph, a plurality of paragraph metrics regarding the identified paragraph is determined. Based on the paragraph metrics, similarly formed paragraphs are clustered.
|
Publication number |
US7715635(B1) |
Publication date |
2010.05.11 |
Application number |
US20060540852 |
Filing date |
2006.09.28 |
Applicant |
AMAZON TECHNOLOGIES, INC. |
Inventors |
SHAGAM JOSHUA; GOODWIN ROBERT L; BURNS JOHN C |
Classification |
G06K9/62 |
Main classification |
G06K9/62 |
Agency |
|
Agent |
|
Principal claim |
|
Address |
|
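The last two steps in the abstract (determine a set of metrics for each paragraph, then cluster similarly formed paragraphs) can be illustrated with a minimal sketch. The record does not say which metrics are measured or how the clustering is done, so everything here is an assumption made for illustration: the four metric fields, the `ParagraphMetrics` and `cluster_paragraphs` names, the largest-difference distance, and the `tolerance` value are all hypothetical. Paragraph detection on the page image is assumed to have happened already.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ParagraphMetrics:
    """Hypothetical per-paragraph measurements, in pixels.

    The abstract only says "a plurality of paragraph metrics";
    these four fields are illustrative assumptions.
    """
    left_indent: float
    first_line_indent: float
    line_spacing: float
    text_height: float

    def as_vector(self) -> Tuple[float, ...]:
        return (self.left_indent, self.first_line_indent,
                self.line_spacing, self.text_height)


def distance(a: ParagraphMetrics, b: ParagraphMetrics) -> float:
    # Largest absolute difference over all metrics (Chebyshev distance).
    return max(abs(x - y) for x, y in zip(a.as_vector(), b.as_vector()))


def cluster_paragraphs(metrics: List[ParagraphMetrics],
                       tolerance: float = 2.0) -> List[List[int]]:
    """Greedy clustering: a paragraph joins the first cluster whose
    first member is within `tolerance` on every metric; otherwise it
    starts a new cluster. Returns lists of paragraph indices.
    """
    clusters: List[List[int]] = []
    for index, current in enumerate(metrics):
        for cluster in clusters:
            representative = metrics[cluster[0]]
            if distance(representative, current) <= tolerance:
                cluster.append(index)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([index])
    return clusters
```

A greedy single pass keeps the sketch short; a real system might instead normalize each metric and use a standard clustering algorithm, since raw pixel differences weight every metric equally.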