Title of invention SYSTEM AND METHOD FOR STORING AND SEARCHING DATA EXTRACTED FROM TEXT DOCUMENTS
Abstract Disclosed are a system and method for storing, searching and updating extracted data for natural language processing of text. An example method comprises extracting at least one first information object from a text document; generating one or more subject-predicate-object triplets for the first information object; accessing a storage of extracted data that contains an RDF graph comprising a plurality of subject-predicate-object triplets for a plurality of different information objects; searching the storage of extracted data for a second information object related to the first information object, wherein searching includes selecting and searching at least one of three types of N-gram identifier tables containing one of double, triple and quad search indices associated with at least two of a subject, a predicate, an object and a document; and, when at least one second information object related to the first information object is found, updating the storage of extracted data by adding the at least one subject-predicate-object triplet of the first information object to the RDF graph and associating the first and second information objects with each other. Two objects are related when they have at least one of a subject, a predicate and an object in common.
Publication number US2016275180(A1) Publication date 2016.09.22
Application number US201514717647 Filing date 2015.05.20
Applicant ABBYY InfoPoisk LLC Inventor Matskevich Stepan
Classification G06F17/30;G06F17/24 Main classification G06F17/30
Agency Agent
Principal claim 1. A computer-implemented method for storing in a computer system, searching and updating data extracted from text documents, the method comprising: extracting at least one first information object from a text document; generating one or more subject-predicate-object triplets for the first information object; accessing a storage of extracted data that contains an RDF graph comprising a plurality of subject-predicate-object triplets for a plurality of different information objects extracted from different text documents; searching the storage of extracted data for a second information object related to the same real-world object as the first information object, wherein two information objects are related when said two information objects have at least the subject parameter in common, and wherein searching includes selecting and searching at least one of three types of identifier tables containing one of double, triple and quad search indices, wherein each search index is based on at least two parameters selected from a subject, a predicate, an object and a document; and, when at least one second information object related to the same real-world object as the first information object is found, updating the storage of extracted data by adding the at least one subject-predicate-object triplet of the first information object to the RDF graph and updating at least one of the three types of index tables.
Address Moscow RU
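Illustrative sketch The steps recited in the principal claim (generate triplets, search the identifier tables, then update the graph and the tables) can be illustrated with a minimal in-memory sketch. This is not the patented implementation. The class and method names (`Triple`, `ExtractedDataStore`, `find_related`, `store`), the use of Python sets and dictionaries in place of an RDF store, and the sample data are all assumptions made for illustration. The sketch searches only the double table keyed on (subject, predicate), so it treats two information objects as related only when they share both a subject and a predicate, which is narrower than the claim's subject-only condition. It also stores a new object even when no related object is found, a case the record does not describe.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object triplet (hypothetical representation)."""
    subject: str
    predicate: str
    obj: str


class ExtractedDataStore:
    """Toy stand-in for the storage of extracted data (assumption, not the patent's design)."""

    def __init__(self) -> None:
        self.graph: set[Triple] = set()  # stands in for the RDF graph
        # Three identifier tables keyed on 2, 3 and 4 parameters.
        # Each key maps to the ids of the information objects that produced it.
        self.double_index = defaultdict(set)  # (subject, predicate)
        self.triple_index = defaultdict(set)  # (subject, predicate, object)
        self.quad_index = defaultdict(set)    # (subject, predicate, object, document)
        self.links: set[frozenset] = set()    # associations between related objects

    def find_related(self, triples: list[Triple]) -> set[str]:
        """Look up stored objects through the double table (simplified relatedness test)."""
        related: set[str] = set()
        for t in triples:
            related |= self.double_index.get((t.subject, t.predicate), set())
        return related

    def store(self, info_id: str, triples: list[Triple], document: str) -> set[str]:
        """Search for related objects, then add the triplets and update all three tables."""
        related = self.find_related(triples)
        related.discard(info_id)
        for t in triples:
            self.graph.add(t)
            self.double_index[(t.subject, t.predicate)].add(info_id)
            self.triple_index[tuple(t)].add(info_id)
            self.quad_index[(*t, document)].add(info_id)
        for other in related:
            self.links.add(frozenset((info_id, other)))
        return related


# Usage with made-up data: two documents mention the same subject.
store = ExtractedDataStore()
store.store("obj1", [Triple("John Smith", "worksFor", "Acme Corp")], "doc1.txt")
related = store.store("obj2", [Triple("John Smith", "worksFor", "Example Inc")], "doc2.txt")
print(related)  # {'obj1'}
```

Keeping separate tables for two, three and four parameters lets a lookup use the narrowest key available for the query at hand. Matching on subject alone, as the claim defines relatedness, would need a different lookup than the (subject, predicate) key used here.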