Title: System and method for optimized source selection in an information retrieval system
Abstract: In an information retrieval system, an automated system optimizes the selection of sources in a distributed information system for query searching. A training set of documents is created for each source by randomly selecting significant portions of its documents. A test set of documents is created for each source from the documents not included in the training set. Each document in the training and test sets is defined in terms of features/attributes and a name, as samples representing individual sources. Pattern recognizing means process the samples to recognize patterns in the documents that distinguish one source from another. Rule generating means provide a set of DNF (disjunctive normal form) rules from the patterns as a model representing each source. The test set of documents is also expressed in terms of DNF rules. Evaluating means create a final classification model after minimizing any error between the DNF rules for the training and test sets. Query means enable a user to express a query in terms of features/attributes and DNF rules which, when applied to the final model, automatically select the optimal sources for query searching. The sources may also be expressed in taxonomic groupings, which reduces the number of data sources and speeds query searching by a user on a distributed information network.
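The flow described in the abstract (random train/test split, per-source DNF rules, query-time source selection) can be illustrated with a minimal Python sketch. This is not the patented implementation: the rule-learning step is omitted, the rules are written by hand, and every name here (`split_documents`, `matches`, `error_rate`, `select_sources`, `MODEL`, and the source and feature names) is hypothetical. The error measure is also an assumption chosen for simplicity.

```python
import random

# Hypothetical hand-written model: each source maps to a DNF rule, i.e. a
# list (OR) of conjunctions, each conjunction being a frozenset (AND) of features.
MODEL = {
    "medical_db": [frozenset({"drug", "trial"}), frozenset({"patient"})],
    "legal_db": [frozenset({"court"}), frozenset({"statute", "appeal"})],
}

def split_documents(docs, train_fraction=0.7, seed=0):
    """Randomly pick a training set; the documents left over form the test set."""
    rng = random.Random(seed)
    shuffled = list(docs)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def matches(dnf, features):
    """True if at least one conjunction of the DNF rule is a subset of the features."""
    return any(conjunction <= features for conjunction in dnf)

def error_rate(model, labeled_docs):
    """Assumed error measure: fraction of (features, source) samples whose
    own source's rule fails to match them."""
    if not labeled_docs:
        return 0.0
    misses = sum(
        1 for features, source in labeled_docs
        if not matches(model[source], features)
    )
    return misses / len(labeled_docs)

def select_sources(model, query_features):
    """Return, in sorted order, the sources whose DNF rule matches the query."""
    return sorted(
        source for source, dnf in model.items()
        if matches(dnf, query_features)
    )

# A query mentioning a drug trial and a court matches both hypothetical sources.
selected = select_sources(MODEL, {"drug", "trial", "court"})
print(selected)
```

In this sketch a query is just a set of features, so a source is searched only if one of its conjunctions is fully covered by the query. A query that matches no rule selects no source.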
Publication number: US5960422(A) Publication date: 1999.09.28
Application number: US19970979109 Filing date: 1997.11.26
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventor: PRASAD, SEEMA
Classification: G06F17/30;(IPC1-7):G06F17/30 Main classification: G06F17/30
Agency: Agent:
Main claim:
Address: