Title of Invention DOCUMENT STRUCTURE IDENTIFIER
Abstract A method of automated document structure identification based on visual cues is disclosed herein. The two-dimensional layout of the document is analyzed to discern visual cues related to the structure of the document, and the text of the document is tokenized so that similarly structured elements are treated similarly. The method can be applied in the generation of Extensible Markup Language (XML) files, natural language parsing, and search engine ranking mechanisms.
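The idea in the abstract, that elements sharing the same visual cues are treated the same way, can be illustrated with a minimal sketch. This is not the method claimed in the application. The `Line` record, the choice of cues (indent, font size, bold), and the helper names `visual_signature` and `label_lines` are all hypothetical, chosen only to show one way a layout signature could map to a shared structural token.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Line:
    """One laid-out line of text (hypothetical record, not from the source)."""
    text: str
    indent: int        # left offset in arbitrary layout units
    font_size: float   # point size
    bold: bool


Signature = Tuple[int, float, bool]


def visual_signature(line: Line) -> Signature:
    # Assumption: these three cues are enough to tell structures apart.
    return (line.indent, line.font_size, line.bold)


def label_lines(lines: List[Line]) -> List[Tuple[str, str]]:
    """Assign the same token (S0, S1, ...) to lines with identical cues."""
    tokens: Dict[Signature, str] = {}
    labelled: List[Tuple[str, str]] = []
    for line in lines:
        sig = visual_signature(line)
        if sig not in tokens:
            # First time this layout is seen: allocate a new token.
            tokens[sig] = f"S{len(tokens)}"
        labelled.append((tokens[sig], line.text))
    return labelled


if __name__ == "__main__":
    page = [
        Line("1. Scope", indent=0, font_size=14.0, bold=True),
        Line("This section covers the scope.", indent=2, font_size=10.0, bold=False),
        Line("2. Terms", indent=0, font_size=14.0, bold=True),
    ]
    for token, text in label_lines(page):
        print(token, text)
```

In this sketch the two headings receive the same token because their layout attributes match, while the body line receives a different one. A later step could map each token to a markup element.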
Publication No. WO03098370(A2) Publication Date 2003.11.27
Application No. WO2003CA00729 Filing Date 2003.05.20
Applicant TATA INFOTECH LTD.; SLOCOMBE, DAVID Inventor SLOCOMBE, DAVID
Classification G06F17/21; G06F17/22; G06F17/27; G06K9/20 Main Classification G06F17/21
Agency Agent
Principal Claim
Address