Title of invention INGESTING DOCUMENTS USING MULTIPLE INGESTION PIPELINES
Abstract A primary ingestion pipeline configured for use in natural language processing includes annotators configured for annotating documents. The annotators and documents to be annotated are evaluated. Based on the evaluations, an ingestion risk score is generated for each document. Each ingestion risk score represents a likelihood that an associated document will not successfully be annotated by the annotators. Each ingestion risk score is compared to a set of risk criteria. Based on the comparisons, a determination is made that each document of a first set of documents satisfies the set of risk criteria. A further determination is made, based on the comparisons, that each document of a second set of documents does not satisfy the set of risk criteria. In response to these determinations, the first set of documents is entered into the primary ingestion pipeline and the second set of documents is provided special handling.
Publication number US2016359894(A1) Publication date 2016.12.08
Application number US201514728050 Filing date 2015.06.02
Applicant International Business Machines Corporation Inventors Andrejko Pamela D.; Freed Andrew R.; Murch Cynthia M.; Nordland Jan M.; Rivero Humberto R.
Classification H04L29/06; G06F17/30; G06F17/24 Main classification H04L29/06
Agency Agent
Independent claim 1. A method for analyzing a primary ingestion pipeline configured for use in natural language processing (NLP), the primary ingestion pipeline including a plurality of annotators configured for annotating documents passing through the primary ingestion pipeline, the method comprising: evaluating the plurality of annotators; evaluating a plurality of documents to be annotated by the plurality of annotators; generating, based on the evaluating the plurality of annotators and further based on the evaluating the plurality of documents, an ingestion risk score for each document of the plurality of documents, wherein each ingestion risk score represents a likelihood that an associated document will not successfully be annotated by the plurality of annotators while passing through the primary ingestion pipeline; comparing each ingestion risk score to a set of risk criteria; determining, based on the comparing, that each document of a first set of documents of the plurality of documents satisfies the set of risk criteria and that each document of a second set of documents of the plurality of documents does not satisfy the set of risk criteria; entering, in response to the determining, the first set of documents into the primary ingestion pipeline; and providing, in response to the determining, special handling to the second set of documents.
Address Armonk NY US
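
The steps recited in the independent claim (evaluate annotators, evaluate documents, score, compare, route) can be illustrated with a minimal Python sketch. Nothing here comes from the patent itself: the `Annotator` and `Document` classes, the `failure_rate` and `max_chars` fields, the length penalty of 0.5, and the single `max_risk` threshold standing in for the "set of risk criteria" are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Annotator:
    # All fields are illustrative assumptions, not taken from the patent.
    name: str
    failure_rate: float  # assumed historical share of failed annotations, 0..1
    max_chars: int       # assumed longest document handled reliably


@dataclass
class Document:
    doc_id: str
    text: str


def ingestion_risk_score(doc: Document, annotators: List[Annotator]) -> float:
    """Estimate the likelihood that at least one annotator fails on doc.

    Assumes annotator failures are independent, which is a simplification.
    """
    p_all_succeed = 1.0
    for annotator in annotators:
        p_fail = annotator.failure_rate
        if len(doc.text) > annotator.max_chars:
            # Arbitrary penalty for oversized documents (assumption).
            p_fail = min(1.0, p_fail + 0.5)
        p_all_succeed *= 1.0 - p_fail
    return 1.0 - p_all_succeed


def route(
    docs: List[Document], annotators: List[Annotator], max_risk: float = 0.3
) -> Tuple[List[Document], List[Document]]:
    """Split docs into (primary pipeline, special handling) by risk score.

    A document satisfies the risk criteria here when its score is at most
    max_risk; a real set of criteria could be richer than one threshold.
    """
    primary: List[Document] = []
    special: List[Document] = []
    for doc in docs:
        score = ingestion_risk_score(doc, annotators)
        (primary if score <= max_risk else special).append(doc)
    return primary, special


annotators = [
    Annotator("tokenizer", failure_rate=0.0, max_chars=100),
    Annotator("entity_tagger", failure_rate=0.1, max_chars=20),
]
docs = [Document("short", "hello"), Document("long", "x" * 50)]
primary, special = route(docs, annotators)
print([d.doc_id for d in primary], [d.doc_id for d in special])
```

In this sketch the short document stays under the threshold and enters the primary pipeline, while the long one exceeds the tagger's assumed size limit and is set aside for special handling. What that special handling consists of (a secondary pipeline, manual review, or something else) is left open here.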