Title of Invention: System and Method for Extracting Table Data from Text Documents Using Machine Learning
Abstract: Systems and methods for extracting table data from text documents using machine learning are provided. The systems and methods comprise electronically receiving at a computer system a document having one or more tables, each table having one or more whitespace features; processing the document using a first computer model executed by the computer system to classify each row of the one or more tables as a header row or a data row; processing the document using a second computer model executed by the computer system to classify each whitespace feature in each row conditional on the classification of each row by the first computer model, the second computer model identifying whether a whitespace feature corresponds to information missing from the one or more tables; and generating an output of the classified whitespace features and storing the output in a digital file.
Publication No.: US2016104077(A1)
Publication Date: 2016.04.14
Application No.: US201514879349
Filing Date: 2015.10.09
Applicant: The Trustees of Columbia University in the City of New York
Inventors: Jackson, JR. Robert J.; Mitts Joshua R.; Zhang Jing
Classification: G06N99/00; G06F17/30
Main Classification: G06N99/00
Agency:
Agent:
Principal Claim: 1. A method for electronically extracting table data from text documents using machine learning, comprising: electronically receiving at a computer system a document having one or more tables, each table having one or more whitespace features; processing the document using a first computer model executed by the computer system to classify each row of the one or more tables as a header row or a data row; processing the document using a second computer model executed by the computer system to classify each whitespace feature in each row conditional on classification of each row by the first computer model, the second computer model identifying whether a whitespace feature corresponds to information missing from the one or more tables; and generating an output of the classified whitespace features and storing the output in a digital file.
Address: New York NY US
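
The two-stage flow recited in the principal claim can be illustrated with a minimal Python sketch. This record does not disclose the trained models, so both classifiers here are hand-written stand-ins chosen purely for illustration: `classify_row` treats any row without digits as a header, and `classify_whitespace` marks a gap in a data row as "missing" when it covers an entire header column. The function names, the `SAMPLE` table, the two-space threshold for a whitespace feature, and the JSON output layout are all hypothetical and are not taken from the patent.

```python
import json
import re

# Hypothetical three-row table; ljust keeps the columns aligned.
SAMPLE = "\n".join([
    "Name".ljust(12) + "Salary".ljust(10) + "Bonus",
    "Alice".ljust(12) + "100".ljust(10) + "20",
    "Bob".ljust(12) + "".ljust(10) + "15",  # Salary cell is blank
])


def classify_row(line):
    # Stand-in for the first model (assumption): no digits means header.
    return "data" if any(ch.isdigit() for ch in line) else "header"


def column_spans(header_line):
    # Header cells: runs of text whose words are separated by single spaces.
    return [m.span() for m in re.finditer(r"\S+(?: \S+)*", header_line)]


def whitespace_features(line):
    # Assumed definition: a run of two or more spaces, trailing spaces ignored.
    return [m.span() for m in re.finditer(r" {2,}", line.rstrip())]


def classify_whitespace(span, row_label, columns):
    # Stand-in for the second model (assumption), conditional on the row label:
    # in a data row, a gap that covers a whole header column is "missing".
    if row_label == "header":
        return "separator"
    start, end = span
    covers = any(start <= c0 and c1 <= end for c0, c1 in columns)
    return "missing" if covers else "separator"


def extract(document):
    # Stage 1 labels each row; stage 2 labels each gap given the row label.
    columns, features = [], []
    for i, line in enumerate(document.splitlines()):
        row_label = classify_row(line)
        if row_label == "header":
            columns = column_spans(line)
        for span in whitespace_features(line):
            features.append({
                "row": i,
                "start": span[0],
                "end": span[1],
                "row_label": row_label,
                "label": classify_whitespace(span, row_label, columns),
            })
    return features


def save(features, path):
    # Store the classified whitespace features in a digital file (JSON here).
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(features, fh, indent=2)
```

Calling `extract(SAMPLE)` and passing the result to `save` with a file path of your choice writes one record per whitespace feature. In this sample, the single wide gap in the "Bob" row is the only feature labeled "missing", because it spans the whole Salary column. A learned classifier would replace each stand-in function without changing the overall flow.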