Title: Authorship enhanced corpus ingestion for natural language processing
Abstract: Mechanisms for processing a corpus of information in a natural language processing (NLP) system are provided. A corpus of information to process is identified, and a set of author profiles associated with the corpus of information is retrieved. A content profile is generated for a portion of content of the corpus of information, and the content profile is compared to the set of author profiles to generate an association of the content profile with at least one author profile in the set of author profiles. In addition, a processing operation of the NLP system is controlled based on the association of the content profile with the at least one author profile.
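The abstract does not say how a content profile or an author profile is represented, or how the two are compared. As a minimal sketch, assuming both are bag-of-words term-frequency vectors compared by cosine similarity against a hypothetical threshold, the comparison and association step could look like this. The names `AuthorProfile`, `build_profile`, `associate`, and the `threshold` value are illustrative only and are not taken from the patent.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class AuthorProfile:
    """Hypothetical author profile: an author name plus a term-frequency vector."""
    author: str
    terms: Counter


def build_profile(text: str) -> Counter:
    """Build a bag-of-words profile (an assumed representation, not the patent's)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def associate(content: str, authors: list, threshold: float = 0.2) -> list:
    """Return (author, score) pairs at or above the threshold, best match first."""
    profile = build_profile(content)
    scored = [(p.author, cosine(profile, p.terms)) for p in authors]
    matches = [(name, score) for name, score in scored if score >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Usage: associate one portion of content with the closest author profile.
authors = [
    AuthorProfile("alice", build_profile("neural networks training gradient descent")),
    AuthorProfile("bob", build_profile("medieval history castles kings")),
]
matches = associate("gradient descent for training deep neural networks", authors)
```

Here only "alice" clears the threshold. A real system would likely use richer stylistic or metadata features than raw word counts; the vector-and-similarity shape is kept only because it makes the "compare profiles, then associate" step concrete.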
Publication number: US9483519 (B2)   Publication date: 2016-11-01
Application number: US201314012337   Filing date: 2013-08-28
Applicant: International Business Machines Corporation   Inventors: Bastide Paul R.; Broomhall Matthew E.; Loredo Robert E.; Lu Fang
Classification: G06F17/00; G06F17/30   Main classification: G06F17/00
Agency: (not listed)   Agents: Walder, Jr. Stephen J.; Woycechowsky David B.
Main claim: 1. A method, in a data processing system comprising a processor and a memory, for processing a corpus of information in a natural language processing system, the method comprising: identifying, by the data processing system, a corpus of information to process; retrieving, by the data processing system, a set of author profiles associated with the corpus of information; presenting, by the data processing system, a user interface that comprises the set of author profiles, through which a user input is received specifying a user selection of at least one user selected author profile in the set of author profiles; generating, by the data processing system, a content profile for a portion of content of the corpus of information; comparing, by the data processing system, the content profile to the set of author profiles to generate an association of the content profile with at least one author profile in the set of author profiles; and controlling a processing operation of the natural language processing (NLP) system based on the association of the content profile with the at least one author profile and a determined level of correspondence of the at least one author profile with the at least one user selected author profile, wherein the processing operation is an ingestion operation that ingests portions of content from the corpus of information, wherein controlling the processing operation of the NLP system based on the association of the content profile with the at least one author profile and a determined level of correspondence of the at least one author profile with the at least one user selected author profile comprises: ingesting first content, associated with a first user selected author profile having an associated first user specified priority value associated with the first user selected author profile, from the corpus into the NLP system and performing an NLP operation on the first content; and ingesting second content, associated with a second user selected author profile having an associated second user specified priority value associated with the second user selected author profile, after ingesting the first content and performing the NLP operation on the first content, and performing the NLP operation on the second content, wherein the first user specified priority value indicates a higher priority associated with the first user selected author profile than the second user specified priority value.
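The claim orders ingestion by user-specified author priority: content tied to the higher-priority author profile is ingested and processed by the NLP operation before content tied to the lower-priority one. A minimal sketch of that ordering follows. It assumes that each content portion has already been associated with a single author, that a larger integer means higher priority, and that content from authors the user did not select is skipped. The claim fixes none of these details, and `ContentItem`, `ingest_by_priority`, and the stand-in NLP operation are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ContentItem:
    """A portion of corpus content already associated with an author profile."""
    text: str
    author: str


def ingest_by_priority(
    items: list,
    priorities: dict,
    nlp_operation: Callable[[str], str],
) -> list:
    """Ingest content in descending order of user-specified author priority.

    Content whose author the user did not select is skipped (an assumption).
    Each item is ingested and processed before the next, lower-priority item.
    """
    selected = [item for item in items if item.author in priorities]
    # sorted() is stable even with reverse=True, so equal priorities keep corpus order.
    ordered = sorted(selected, key=lambda item: priorities[item.author], reverse=True)
    results = []
    for item in ordered:
        # "Ingest" the item, then run the NLP operation on it before moving on.
        results.append((item.author, nlp_operation(item.text)))
    return results


# Usage: alice's content is processed first because her priority value is higher.
corpus = [
    ContentItem("draft notes", "bob"),
    ContentItem("peer-reviewed paper", "alice"),
    ContentItem("anonymous forum post", "unknown"),
]
log = ingest_by_priority(corpus, {"alice": 2, "bob": 1}, str.upper)
```

Passing the NLP operation as a callable keeps the ordering logic independent of whatever analysis the system actually performs. Here `str.upper` stands in for that analysis.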
Address: Armonk, NY, US