Title: EXTRACTING INFORMATION FROM STRUCTURED DOCUMENTS COMPRISING NATURAL LANGUAGE TEXT
Abstract: Systems and methods for extracting information from structured documents comprising natural language text. An example method comprises: receiving a table comprising a natural language text; identifying, within the table, a header and a plurality of cells organized into rows and columns; performing semantico-syntactic analysis of the natural language text to produce a plurality of semantic structures; interpreting the plurality of semantic structures using a first set of production rules to produce a data object representing the table; analyzing the header to identify a plurality of ontology classes associated with respective table columns; and modifying the data object representing the table using a second set of production rules associated with the ontology classes associated with the table columns.
Publication number: US2017052950(A1)  Publication date: 2017.02.23
Application number: US201514868715  Filing date: 2015.09.29
Applicant: ABBYY InfoPoisk LLC  Inventors: Danielyan Tatiana; Bulgakov Ilya
Classification: G06F17/27; G06F17/24  Main classification: G06F17/27
Agency:  Agent:
Main claim: 1. A method, comprising: receiving, by a processing device, a table comprising a natural language text; identifying, within the table, a header and a plurality of cells organized into rows and columns; performing semantico-syntactic analysis of the natural language text to produce a plurality of semantic structures; interpreting the plurality of semantic structures using a first set of production rules to produce a data object representing the table; analyzing the header to identify a plurality of ontology classes associated with respective table columns; and modifying the data object representing the table using a second set of production rules associated with the ontology classes associated with the table columns.
Address: Moscow, RU
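
The order of steps recited in the main claim can be illustrated with a small Python sketch. This is not the applicant's implementation: every function name, the `HEADER_TO_CLASS` mapping, and the sample table are hypothetical, and the semantico-syntactic analysis is replaced by a trivial lowercase tokenizer purely as a stand-in. The sketch only shows how a generic data object might be built first and then refined using ontology classes derived from the header.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical mapping from header labels to ontology classes (an assumption,
# not taken from the patent record).
HEADER_TO_CLASS = {"name": "Person", "employer": "Organization", "joined": "Date"}


@dataclass
class Cell:
    row: int
    column: int
    text: str
    semantic: dict = field(default_factory=dict)
    ontology_class: Optional[str] = None


def identify_structure(table):
    """Split a table (list of rows) into its header row and body cells."""
    header, *body = table
    cells = [
        Cell(r, c, text)
        for r, row in enumerate(body)
        for c, text in enumerate(row)
    ]
    return header, cells


def analyze_text(text):
    """Stand-in for semantico-syntactic analysis: lowercase tokens only."""
    return {"tokens": text.lower().split()}


def apply_first_rules(header, cells):
    """First rule set (toy): build a generic data object for the table."""
    for cell in cells:
        cell.semantic = analyze_text(cell.text)
    return {"columns": list(header), "cells": cells}


def classify_columns(header):
    """Map each header label to an ontology class, or 'Unknown'."""
    return [HEADER_TO_CLASS.get(h.strip().lower(), "Unknown") for h in header]


def apply_second_rules(data_object, classes):
    """Second rule set (toy): tag each cell with its column's ontology class."""
    for cell in data_object["cells"]:
        cell.ontology_class = classes[cell.column]
    data_object["column_classes"] = classes
    return data_object


def extract(table):
    """Run the steps in the order the claim lists them."""
    header, cells = identify_structure(table)
    data_object = apply_first_rules(header, cells)
    classes = classify_columns(header)
    return apply_second_rules(data_object, classes)


# Hypothetical sample table: first row is the header.
sample_table = [["Name", "Employer", "Joined"], ["Ann Lee", "Acme Corp", "2014"]]
result = extract(sample_table)
```

In this sketch the second pass only attaches a class label to each cell; rules tied to ontology classes in a real system would presumably do more, which the record does not detail.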