Title: Predicting and enhancing document ingestion time
Abstract: A mechanism is provided in a data processing system for predicting and enhancing the ingestion time for a set of input documents. The mechanism receives a set of documents to be added to a corpus of the data processing system. It records the document characteristics of each document in the set using an annotation engine within the data processing system. It then predicts an ingestion time for each document in the set based on those document characteristics and a machine learning model. Finally, it assigns the documents to data processing system resources for processing based on the predicted ingestion time of each document.
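The abstract does not specify the model or the assignment strategy. As a minimal sketch, assume a linear model with hypothetical, already-trained weights over three hypothetical characteristics (`page_count`, `table_count`, `size_kb`). Assume also that "assigning to resources" is done with a greedy longest-predicted-time-first heuristic, which always hands the next document to the least-loaded worker. Both choices are illustrative stand-ins, not the patented method.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    # Hypothetical characteristics a recording annotator might capture.
    page_count: int
    table_count: int
    size_kb: float


# Hypothetical coefficients of an already-trained linear model (seconds).
WEIGHTS = {"page_count": 0.8, "table_count": 2.5, "size_kb": 0.01}
BIAS = 1.0


def predict_ingestion_time(doc: Document) -> float:
    """Predict ingestion time in seconds as a weighted sum of characteristics."""
    return (BIAS
            + WEIGHTS["page_count"] * doc.page_count
            + WEIGHTS["table_count"] * doc.table_count
            + WEIGHTS["size_kb"] * doc.size_kb)


def assign_documents(docs, num_workers):
    """Greedy longest-predicted-first assignment to the least-loaded worker."""
    # Sort documents by predicted ingestion time, longest first.
    predicted = sorted(((predict_ingestion_time(d), d.doc_id) for d in docs),
                       reverse=True)
    # Min-heap of (current predicted load, worker index).
    heap = [(0.0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    assignment = {w: [] for w in range(num_workers)}
    for t, doc_id in predicted:
        load, w = heapq.heappop(heap)        # least-loaded worker so far
        assignment[w].append(doc_id)
        heapq.heappush(heap, (load + t, w))  # add this document's predicted time
    return assignment


docs = [Document("a", 10, 0, 100), Document("b", 2, 4, 50),
        Document("c", 1, 0, 10), Document("d", 5, 1, 20)]
print(assign_documents(docs, 2))  # {0: ['b', 'c'], 1: ['a', 'd']}
```

Placing long documents first keeps a single large document from landing on an already-busy worker late in the batch. Any trained regressor could replace the fixed weights without changing the assignment step.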
Publication number: US9563846 (B2); Publication date: 2017.02.07
Application number: US201414266959; Filing date: 2014.05.01
Applicant: International Business Machines Corporation; Inventors: Allen Corville O.; Freed Andrew R.
Classification: G06N5/04; G06F17/30; G06N99/00; G06N5/02; G06N7/00; G06K9/62; Main classification: G06N5/04
Agents: Tkacs Stephen R.; Walder, Jr. Stephen J.; Gerhardt Diana R.
Main claim: 1. A computer program product comprising a non-transitory computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a computing device, causes the computing device to: receive a set of documents to be added to a corpus of documents; record document characteristics of each document within the set of documents using a characteristic recording annotator executing within the computing device; predict an ingestion time for each document within the set of documents based on the document characteristics and a machine learning model, wherein the ingestion time is a predicted time to ingest each given document by a plurality of annotators executing within the computing device; determine, for a given document, a document characteristic, wherein the document characteristic corresponds to a corresponding annotator used to process the document characteristic during ingestion; and assign the set of documents to question answering system resources to be processed based on the predicted ingestion time for each document, wherein assigning the set of documents to the question answering system resources comprises disabling or delaying execution of the corresponding annotator during ingestion of the given document.
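The claim ties each document characteristic to the annotator that processes it, and it allows that annotator to be disabled or delayed for a given document. The sketch below is one hypothetical way to make that decision. It assumes a per-document time budget and predicted per-annotator costs, and the characteristic names, annotator names, and the cheapest-first admission rule are illustrative assumptions rather than anything the claim specifies.

```python
# Hypothetical mapping from a document characteristic to the annotator that processes it.
CHARACTERISTIC_TO_ANNOTATOR = {
    "tables": "TableAnnotator",
    "images": "ImageAnnotator",
    "footnotes": "FootnoteAnnotator",
}


def plan_annotators(characteristic_costs, budget_s):
    """Split annotators into run-now and deferred lists for one document.

    characteristic_costs maps each detected characteristic to the predicted
    seconds its annotator would add to this document's ingestion time.
    """
    run_now, deferred = [], []
    total = 0.0
    # Admit the cheapest annotators first so the most annotators fit the budget.
    for characteristic, cost in sorted(characteristic_costs.items(),
                                       key=lambda kv: kv[1]):
        annotator = CHARACTERISTIC_TO_ANNOTATOR[characteristic]
        if total + cost <= budget_s:
            run_now.append(annotator)
            total += cost
        else:
            # Over budget: skip this annotator now (disable)
            # or queue it for a later pass (delay).
            deferred.append(annotator)
    return run_now, deferred


print(plan_annotators({"tables": 12.0, "images": 30.0, "footnotes": 2.0}, 20.0))
# (['FootnoteAnnotator', 'TableAnnotator'], ['ImageAnnotator'])
```

Whether a deferred annotator is dropped entirely or rescheduled onto an idle resource is left to the caller, which matches the claim's "disabling or delaying" alternatives.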
Address: Armonk, NY, US