Title of invention: Natural language processing system and method
Abstract: A natural language processing (NLP) system is disclosed herein. Embodiments of the NLP system perform hand-written, rule-based operations that do not rely on a trained corpus. Rules can be added or modified at any time to improve the accuracy of the system and to allow the same system to operate on unstructured plain text from many disparate contexts (e.g., articles, Twitter content, and medical articles) without harming accuracy for any one context. Embodiments also include a language decoder (LD) that generates information stored in a three-level framework (word, phrase, clause). The LD output is easily leveraged by various software applications to analyze large quantities of text from any source in a more sophisticated and flexible manner than previously possible. A query language (LDQL) for information extraction from the output of NLP parsers is disclosed, with emphasis on its embodiment implemented for the LD. The use of LDQL for knowledge extraction is also presented, using an application named Knowledge Browser as an example.
Publication number: US9152623(B2); Publication date: 2015.10.06
Application number: US201314071631; Filing date: 2013.11.04
Applicant: Fido Labs, Inc.; Inventors: Wroczyński Michal; Krupa Tomasz; Leliwa Gniewosz; Wiacek Piotr; Stańczyk Michal
Classification: G06F17/28; Main classification: G06F17/28
Agency: Courtney IP Law; Agent: Courtney IP Law; Courtney Barbara B.
Main claim: 1. A system for natural language processing comprising: a processor configured to execute a natural language processing method, the method comprising,
the processor receiving input data from one or more data sources, wherein the input data comprises one or more of plain text, and tokenized text;
the processor tokenizing the text;
the processor aggregating tokens into a three-level structure comprising a word level, a phrase level, and a clause level wherein,
tokens are aggregated into words, words are aggregated into phrases, and phrases are aggregated into clauses;
each element on a higher level comprises one or more elements of a lower level;
tokens that are one or more of coordinating and subordinating words within a phrase are aggregated into one or more words that are separate from respective, coordinated or subordinated words, and the one or more of coordinating and subordinating words and their respective, coordinated or subordinated words are aggregated into one phrase;
words that are one or more of coordinating and subordinating phrases within a clause are aggregated into one or more phrases that are separate from respective, coordinated or subordinated phrases, and the one or more of coordinating and subordinating phrases and their respective, coordinated or subordinated phrases are aggregated into one clause; and
phrases that are one or more of coordinating and subordinating clauses are aggregated into one or more clauses that are separate from their respective coordinated or subordinated clauses;
the processor determining syntactic connections between at least,
every clause and its syntactically superior clause in the same sentence;
every clause and its syntactically superior phrase, if they exist within the same sentence;
every phrase and its syntactically superior phrase within the same clause; and
every word and its syntactically superior word within the same phrase; and wherein,
an element without its syntactically superior element becomes a root element;
if one or more coordinated elements have a common syntactically superior element, the coordinated elements are connected to their respective coordinating element, and the coordinating element is connected to the element which is syntactically superior to the respective coordinated elements; and
if one or more coordinated elements are all syntactically superior to an element, the syntactically subordinated element is connected to the element coordinating its respective syntactically superior elements; and
the processor classifying each word, phrase and clause, wherein
classification reflects a syntactic function of the word, the phrase and the clause; and
phrases and clauses share a partially common set of possible syntactic functions comprising at least subject, object, complement and attribute function.
Address: Palo Alto CA US
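
Illustrative sketch (not part of the patent record): the connection rules for coordinated elements recited in the main claim can be pictured with a small Python example. This is a minimal sketch under stated assumptions, not the patented implementation. The names `Element`, `attach_coordinated`, and `attach_to_coordinated_superiors` are hypothetical, as are the function labels "coordinator" and "predicate"; the claim itself names only subject, object, complement and attribute.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Element:
    """One node on the word, phrase, or clause level (hypothetical type)."""
    level: str                       # "word", "phrase", or "clause"
    text: str
    function: str = ""               # syntactic function, e.g. "subject"
    # Lower-level elements this element is aggregated from.
    children: List["Element"] = field(default_factory=list)
    # Syntactically superior element; None marks a root element.
    superior: Optional["Element"] = field(default=None, repr=False)

    @property
    def is_root(self) -> bool:
        # An element without a syntactically superior element is a root.
        return self.superior is None


def attach_coordinated(coordinator: Element,
                       coordinated: List[Element],
                       common_superior: Optional[Element]) -> None:
    """Coordinated elements sharing a superior: connect each of them to
    the coordinator, and the coordinator to that common superior."""
    for element in coordinated:
        element.superior = coordinator
    coordinator.superior = common_superior


def attach_to_coordinated_superiors(subordinate: Element,
                                    coordinator: Element) -> None:
    """Coordinated elements that are all superior to one element: connect
    the subordinated element to the coordinator of those superiors."""
    subordinate.superior = coordinator


# Phrase level of the clause "John and Mary sleep" (assumed example).
john = Element("phrase", "John", "subject")
mary = Element("phrase", "Mary", "subject")
conj = Element("phrase", "and", "coordinator")   # label is an assumption
verb = Element("phrase", "sleep", "predicate")   # label is an assumption
clause = Element("clause", "John and Mary sleep",
                 children=[john, conj, mary, verb])

# "John" and "Mary" share the superior phrase "sleep", so both hang off
# "and", and "and" hangs off "sleep".
attach_coordinated(conj, [john, mary], verb)

roots = [p.text for p in clause.children if p.is_root]
print(roots)  # ['sleep']
```

In this sketch the coordinating phrase "and" stays a separate phrase inside the clause, and the only phrase left without a superior is "sleep", which therefore acts as the root at the phrase level.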