Title AUTOMATED EXTRACTION OF BIO-ENTITY RELATIONSHIPS FROM LITERATURE
Abstract Automated, standardized and accurate extraction of relationships within text. Automatic extraction of such relationships/information allows the information to be stored in structured form so that it can be easily and accurately retrieved when needed. Such information can be used to build online search engines for highly specific and accurate information retrieval. The current invention discloses a novel approach to extract such information from raw text based on natural language processing (NLP) and a graph-theoretic algorithm. The novel method can be applied, for example, to extract protein-protein relationships in biomedical literature. The method can be easily extended to extract other biological relationships between biological terms such as proteins, genes, pathways, diseases and drugs. The method can also be applied to other information domains to extract other relationships.
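As a rough illustration of what a protein-protein "relationship" candidate might look like before any pattern matching is applied, the sketch below enumerates (entity, interaction word, entity) candidates in a single sentence. The dictionaries PROTEINS and INTERACTION_WORDS, the function name candidate_triplets, and the whitespace tokenization are all assumptions made for this example; the record does not specify how candidates are located.

```python
from itertools import combinations

# Hypothetical dictionaries; a real system would load curated term lists.
PROTEINS = {"RAD51", "BRCA2", "TP53"}
INTERACTION_WORDS = {"binds", "interacts", "activates", "inhibits"}


def candidate_triplets(sentence):
    """Return (entity1, interaction_word, entity2) candidates found in one sentence.

    This only enumerates candidates; deciding whether a candidate expresses
    a true relationship is a separate step.
    """
    # Naive tokenization (an assumption): split on whitespace, trim punctuation.
    tokens = [t.strip(".,;()") for t in sentence.split()]
    entities = [t for t in tokens if t in PROTEINS]
    words = [t for t in tokens if t.lower() in INTERACTION_WORDS]
    # Pair every two entities (in order of appearance) with every interaction word.
    return [(a, w, b) for a, b in combinations(entities, 2) for w in words]


print(candidate_triplets("BRCA2 binds RAD51 in vivo."))
```

Each candidate would then be passed to a classifier that decides whether the sentence actually asserts the relationship.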
Publication number US2015066483(A1) Publication date 2015.03.05
Application number US201414534777 Filing date 2014.11.06
Applicant The Florida State University Research Foundation, Inc. Inventor Zhang Jinfeng
Classification G06F19/24;G06F17/27 Main classification G06F19/24
Agency Agent
Main claim 1. One or more non-transitory, tangible computer-readable media having computer-executable instructions for performing a method by running a software program on a computer, the computer operating under an operating system, the method including issuing instructions from the software program to extract semantic textual relationships or patterns from non-annotated data by natural language processing and graph theoretic algorithm, the instructions comprising: receiving a plurality of known textual strings and a plurality of interaction word strings; receiving annotated text as training data that contains true and false patterns; automatically building a decision support tool based on said true and false patterns to which said non-annotated data can be parsed, said decision support tool including at least a first level and a second level, said first level having a first decision node, said second level having a second decision node, said first and second decision nodes each associated with at least a portion of said true and false patterns; receiving said non-annotated data; extracting a textual clause of said non-annotated data that contains non-triplet word strings and at least one triplet, said at least one triplet including a first textual entity, a second textual entity, and an interaction word, wherein said interaction word indicates a possible relationship between said first textual entity and said second textual entity; automatically parsing said extracted textual clause through said decision support tool to obtain a plurality of components based on dependencies among said plurality of components; extracting said at least one triplet from said plurality of components by attempting to match said plurality of components of said parsed, extracted textual clause to said first level of said decision support tool; identifying extraction of said at least one triplet as true if said plurality of components matches said first level of said decision support tool; 
identifying extraction of said at least one triplet as false if said plurality of components fails to match said first level of said decision support tool; as a result of said plurality of components failing to match said first level of said decision support tool, extracting said at least one triplet from said plurality of components by attempting to match said plurality of components to said second level of said decision support tool; identifying extraction of said at least one triplet as true if said plurality of components matches said second level of said decision support tool, said second level of said decision support tool being a simplified pattern of said first level of said decision support tool to capture textual clauses that are not identical to said extracted textual clause; and identifying extraction of said at least one triplet as false if said plurality of components fails to match said second level of said decision support tool.
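The two-level matching recited in the claim can be pictured with a minimal sketch, under several assumptions that are not taken from the record: a parsed clause is reduced to a tuple of dependency labels, the first level holds patterns that were labeled true more often than false in the training data, and the second level holds simplified versions of those patterns obtained by dropping modifier labels. The names MODIFIER_LABELS, simplify, build_levels and classify are hypothetical, and the simplification rule is only one possible choice.

```python
from collections import Counter

# Assumption: dependency labels treated as optional modifiers when simplifying.
MODIFIER_LABELS = {"amod", "advmod", "det"}


def simplify(pattern):
    """Drop modifier labels so near-identical clauses map to the same pattern."""
    return tuple(label for label in pattern if label not in MODIFIER_LABELS)


def build_levels(training):
    """Build both levels from (pattern, is_true) pairs taken from annotated text."""
    votes = Counter()
    for pattern, is_true in training:
        votes[pattern] += 1 if is_true else -1
    # Level 1: exact patterns seen as true more often than false.
    level1 = {p for p, v in votes.items() if v > 0}
    # Level 2: simplified forms of the level-1 patterns.
    level2 = {simplify(p) for p in level1}
    return level1, level2


def classify(pattern, level1, level2):
    """Return (is_true, matched_level); try level 1 first, then fall back to level 2."""
    if pattern in level1:
        return True, 1
    if simplify(pattern) in level2:
        return True, 2
    return False, None


training = [
    (("nsubj", "dobj"), True),
    (("nsubj", "advmod", "prep", "pobj"), True),
    (("conj", "appos"), False),
]
level1, level2 = build_levels(training)
print(classify(("nsubj", "dobj"), level1, level2))          # exact match at level 1
print(classify(("nsubj", "amod", "dobj"), level1, level2))  # matches only after simplification
print(classify(("conj", "appos"), level1, level2))          # matches neither level
```

The fallback order mirrors the claim: a clause that fails the first level is not rejected outright but is retried against the simplified second level, which is what lets the tool accept clauses that differ slightly from those seen in training.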
Address Tallahassee FL US