Title of Invention |
EXTRACTING INFORMATION FROM STRUCTURED DOCUMENTS COMPRISING NATURAL LANGUAGE TEXT |
Abstract |
Systems and methods for extracting information from structured documents comprising natural language text. An example method comprises: receiving a table comprising a natural language text; identifying, within the table, a header and a plurality of cells organized into rows and columns; performing semantico-syntactic analysis of the natural language text to produce a plurality of semantic structures; interpreting the plurality of semantic structures using a first set of production rules to produce a data object representing the table; analyzing the header to identify a plurality of ontology classes associated with respective table columns; and modifying the data object representing the table using a second set of production rules associated with the ontology classes associated with the table columns. |
Publication Number |
US2017052950(A1) |
Publication Date |
2017.02.23 |
Application Number |
US201514868715 |
Filing Date |
2015.09.29 |
Applicant |
ABBYY InfoPoisk LLC |
Inventors |
Danielyan Tatiana;Bulgakov Ilya |
Classification Codes |
G06F17/27;G06F17/24 |
Main Classification Code |
G06F17/27 |
Patent Agency |
|
Patent Agent |
|
Principal Claim |
1. A method, comprising:
receiving, by a processing device, a table comprising a natural language text; identifying, within the table, a header and a plurality of cells organized into rows and columns; performing semantico-syntactic analysis of the natural language text to produce a plurality of semantic structures; interpreting the plurality of semantic structures using a first set of production rules to produce a data object representing the table; analyzing the header to identify a plurality of ontology classes associated with respective table columns; and modifying the data object representing the table using a second set of production rules associated with the ontology classes associated with the table columns. |
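
The sequence of steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The semantico-syntactic analysis is replaced by a trivial lowercase tokenizer, and every name here (`HEADER_TO_CLASS`, `CellObject`, `analyze`, `first_pass`, `classify_header`, `money_rule`, `CLASS_RULES`, `extract`) is a hypothetical placeholder, as are the ontology classes and the single example rule.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: a toy lookup from header text to ontology class names.
HEADER_TO_CLASS: Dict[str, str] = {
    "name": "Person",
    "employer": "Organization",
    "salary": "Money",
}


@dataclass
class CellObject:
    """Data object for one table cell (hypothetical structure)."""
    text: str
    tokens: List[str]
    ontology_class: str = "Unknown"
    attributes: Dict[str, object] = field(default_factory=dict)


def analyze(text: str) -> List[str]:
    # Stand-in for semantico-syntactic analysis: lowercase whitespace tokens.
    return text.lower().split()


def first_pass(rows: List[List[str]]) -> List[List[CellObject]]:
    # Stand-in for the first rule set: build one generic object per cell.
    return [[CellObject(text=c, tokens=analyze(c)) for c in row] for row in rows]


def classify_header(header: List[str]) -> List[str]:
    # Map each column header to an ontology class, or "Unknown" if unmapped.
    return [HEADER_TO_CLASS.get(h.strip().lower(), "Unknown") for h in header]


def money_rule(cell: CellObject) -> None:
    # Example class-specific rule: keep the digits of the cell as an integer amount.
    digits = "".join(ch for ch in cell.text if ch.isdigit())
    if digits:
        cell.attributes["amount"] = int(digits)


# Stand-in for the second rule set, keyed by ontology class.
CLASS_RULES: Dict[str, List[Callable[[CellObject], None]]] = {"Money": [money_rule]}


def extract(table: List[List[str]]) -> List[List[CellObject]]:
    # Assumption: the first row of the table is the header.
    header, rows = table[0], table[1:]
    objects = first_pass(rows)
    classes = classify_header(header)
    for row in objects:
        for cell, cls in zip(row, classes):
            cell.ontology_class = cls
            for rule in CLASS_RULES.get(cls, []):
                rule(cell)  # modify the data object in place
    return objects


table = [["Name", "Employer", "Salary"], ["John Smith", "Acme Corp", "$50,000"]]
result = extract(table)
```

In this sketch the column class is determined once from the header and then selects which rules refine each cell object, mirroring the two-stage rule application recited in the claim.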
Address |
Moscow RU |