Title of invention: Method for extracting, interpreting and standardizing tabular data from unstructured documents
Abstract: A system, method, and computer program for automatically identifying, parsing, and interpreting tabular data from unstructured documents stored in various formats such as ASCII text, Unicode text, HTML, PDF text, and PDF image format is provided. A set of table identification, parsing/tokenizing, and interpreting/mapping rules are developed with grammar descriptors. These rules are then applied to a set of documents to identify a table, parse the content of the table, and interpret the parsed content, if required, thereby standardizing the tabular data.
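The three stages named in the abstract (identify a table, parse its content, interpret the parsed content) can be illustrated with a minimal Python sketch for plain ASCII text. This is not the patented method or its grammar descriptors: the regular expression `ROW_RULE`, the `LABEL_MAP` dictionary, the sample text, and every function name below are hypothetical stand-ins chosen only to show how separate identification, tokenizing, and mapping rules might be applied in sequence.

```python
import re

# Hypothetical identification rule (an assumption, not from the patent):
# a table row is a text label, a gap of 2+ spaces, then numeric cells.
ROW_RULE = re.compile(
    r"^\s*(?P<label>[A-Za-z][A-Za-z .&/-]*?)\s{2,}(?P<cells>[-\d.,()\s]+)$"
)

# Hypothetical mapping rules: raw row labels -> standardized field names.
LABEL_MAP = {
    "net sales": "revenue",
    "total revenue": "revenue",
    "net income": "net_income",
}


def identify_table(lines):
    """Return the longest run of consecutive lines that match ROW_RULE."""
    best, current = [], []
    for line in lines:
        if ROW_RULE.match(line):
            current.append(line)
        else:
            if len(current) > len(best):
                best = current
            current = []
    if len(current) > len(best):
        best = current
    return best


def parse_number(token):
    """Convert a cell such as '1,200' or '(700)' to a float.

    Parentheses are treated as a negative sign (accounting style).
    """
    negative = token.startswith("(") and token.endswith(")")
    value = float(token.strip("()").replace(",", ""))
    return -value if negative else value


def parse_row(line):
    """Tokenize one identified row into (label, [numbers])."""
    match = ROW_RULE.match(line)
    label = match.group("label").strip()
    cells = [parse_number(tok) for tok in match.group("cells").split()]
    return label, cells


def interpret(label):
    """Map a raw label to a standard name, falling back to snake_case."""
    key = label.lower()
    return LABEL_MAP.get(key, key.replace(" ", "_"))


def extract_table(text):
    """Identify, parse, and standardize the main table in a text document."""
    rows = identify_table(text.splitlines())
    return {interpret(label): cells for label, cells in map(parse_row, rows)}


SAMPLE = """Annual report excerpt

Net Sales      1,200   1,050
Cost of Goods  (700)   (640)
Net Income     500     410

See notes.
"""

if __name__ == "__main__":
    print(extract_table(SAMPLE))
```

Keeping the identification pattern and the label mapping as data, separate from the functions that apply them, loosely mirrors the abstract's description of rules that are developed first and then applied to a set of documents. Handling HTML, PDF text, or PDF image input would need format-specific extraction before a step like this, which the sketch does not attempt.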
Publication number: US7590647(B2)  Publication date: 2009.09.15
Application number: US20050140340  Filing date: 2005.05.27
Applicant: RAGE FRAMEWORKS, INC  Inventors: SRINIVASAN VENKATESAN;KOTHIWALE MAHANTESH;ALAM RUMMANA;BHARADWAJ SRINIVASAN
Classification: G06F7/00;G06F3/00;G06F9/44  Main classification: G06F7/00
Agency:  Agent:
Main claim:
Address: