Title: Document retrieval using internal dictionary-hierarchies to adjust per-subject match results
Abstract: Techniques for managing big data include retrieval using per-subject dictionaries, each having a multi-level sub-classification hierarchy within its subject. Entries may include subject-determining-power (SDP) scores, which indicate how well the entry's term describes the subject of the dictionary that contains it. The same term may have entries in multiple dictionaries, with a different SDP score in each. A retrieval request containing search terms that describe the desired documents can be processed in two steps. First, a set of candidate documents is identified: these are documents tagged with subjects, i.e., with identifiers of the per-subject dictionaries that have entries corresponding to a search term. Second, affinity values are used to adjust the aggregate score of the terms in those dictionaries. The documents that best match the subject are then selected according to the adjusted scores. Alternatively, the adjustment may be applied after the documents are selected, by re-ordering them according to the adjusted scores.
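The retrieval flow summarized in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The abstract does not say how affinity values are derived or how they are combined with SDP scores, so several choices here are hypothetical:

- the class `SubjectDictionary` and the functions `adjusted_score` and `retrieve`;
- an affinity equal to the share of hierarchy path two entries have in common;
- an additive pairwise bonus;
- scoring each document by its best relevant subject.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class SubjectDictionary:
    """Hypothetical per-subject dictionary: SDP scores plus each entry's hierarchy path."""
    subject: str
    sdp: dict[str, float]             # term -> subject-determining-power (SDP) score
    path: dict[str, tuple[str, ...]]  # term -> non-empty path in the sub-classification hierarchy

    def affinity(self, a: str, b: str) -> float:
        """Assumed affinity: fraction of the longer path that both entries share as a prefix."""
        pa, pb = self.path[a], self.path[b]
        shared = 0
        for x, y in zip(pa, pb):
            if x != y:
                break
            shared += 1
        return shared / max(len(pa), len(pb))


def adjusted_score(doc_terms: set[str], d: SubjectDictionary) -> float:
    """Aggregate SDP of the document's terms found in d, plus an assumed affinity bonus."""
    matched = sorted(t for t in doc_terms if t in d.sdp)
    base = sum(d.sdp[t] for t in matched)
    # Assumption: each pair of matched entries adds affinity * (smaller SDP of the pair).
    bonus = sum(d.affinity(a, b) * min(d.sdp[a], d.sdp[b])
                for a, b in combinations(matched, 2))
    return base + bonus


def retrieve(query: set[str], dictionaries: list[SubjectDictionary],
             docs: dict[str, tuple[set[str], set[str]]], top_k: int) -> list[str]:
    """docs maps doc id -> (subject tags, terms); returns the top_k doc ids by adjusted score."""
    # Subjects whose dictionaries have an entry for at least one search term.
    relevant = {d.subject: d for d in dictionaries if not query.isdisjoint(d.sdp)}
    scores = {}
    for doc_id, (tags, terms) in docs.items():
        hits = [relevant[s] for s in tags if s in relevant]
        if hits:  # candidate document: tagged with a relevant subject
            scores[doc_id] = max(adjusted_score(terms, d) for d in hits)
    return sorted(scores, key=lambda k: (-scores[k], k))[:top_k]


# Hypothetical data: "jaguar" has entries in two dictionaries with different SDP scores.
cars = SubjectDictionary(
    "cars",
    sdp={"jaguar": 0.4, "engine": 0.7, "torque": 0.9},
    path={"jaguar": ("make",), "engine": ("powertrain", "engine"),
          "torque": ("powertrain", "torque")})
animals = SubjectDictionary(
    "animals",
    sdp={"jaguar": 0.8, "habitat": 0.9, "prey": 0.7},
    path={"jaguar": ("mammals", "felines"), "prey": ("mammals", "diet"),
          "habitat": ("ecology",)})
corpus = {
    "d1": ({"cars"}, {"jaguar", "engine", "torque"}),
    "d2": ({"animals"}, {"jaguar", "prey"}),
    "d3": ({"animals"}, {"jaguar", "habitat"}),
    "d4": ({"recipes"}, {"jaguar"}),  # not a candidate: no relevant subject tag
}
print(retrieve({"jaguar"}, [cars, animals], corpus, top_k=3))  # → ['d1', 'd2', 'd3']
```

In this toy data, d3 has the higher raw SDP sum (about 1.7 against 1.5 for d2). However, d2's terms sit under the same "mammals" branch, so the affinity bonus lifts d2 (about 1.85) above d3. That is the kind of hierarchy-based adjustment the abstract describes.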
Publication No.: US9430559 (B2)
Publication date: 2016.08.30
Application No.: US201514854767
Filing date: 2015.09.15
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Gattiker Anne Elizabeth; Gebara Fadi H.; Hylick Anthony N.; Kanj Rouwaida N.
Classification: G06F17/30
Main classification: G06F17/30
Agency: Mitch Harris, Atty at Law, LLC
Agents: Mitch Harris, Atty at Law, LLC; Harris Andrew M.; Stock William J.
Principal claim: 1. A computer-performed method of retrieving documents pertaining to one or more subjects from a collection of documents, the method comprising:
specifying the one or more subjects;
selecting one or more dictionaries from among multiple dictionaries according to the specified one or more subjects, wherein each of the multiple dictionaries has an associated unique subject, wherein entries in the multiple dictionaries contain descriptive terms, and wherein at least some of the descriptive terms are present in two or more of the multiple dictionaries;
matching, by at least one processor of a computer system, the one or more subjects to documents in the collection of documents to obtain a subset of the collection of documents that are relevant to the one or more subjects, wherein the matching generates document scores indicating the relative strength of a relationship between the specified subjects and the documents;
within the computer system, maintaining records of a hierarchy of classification for the entries within each of the multiple dictionaries, wherein the hierarchy records encode or store affinity values showing a strength of relationship between the entries within a corresponding dictionary;
adjusting, by the at least one processor, the document scores using the records of the hierarchy of classification for the specified one or more subjects; and
returning, by the at least one processor, at least a portion of the subset of the collection of documents obtained by the matching.
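The claimed steps (select dictionaries by subject, match to get document scores, adjust those scores with stored affinity records, and return part of the matched subset) can be illustrated with the following minimal Python sketch. It follows the abstract's alternative ordering: the documents are shortlisted by raw score first and then re-ordered by adjusted score. The claim does not specify any of the following, so they are all assumptions for illustration:

- the names `DICTIONARIES`, `AFFINITY`, `match`, `adjust` and `retrieve`;
- storing affinities as pairwise records;
- scoring a document as the sum of its terms' SDP scores;
- adding each recorded pair affinity to that sum.

```python
from itertools import combinations

# Hypothetical dictionaries: subject -> {term: SDP score}. "cell" and "tumor" appear in both.
DICTIONARIES = {
    "oncology": {"tumor": 0.9, "biopsy": 0.5, "staging": 0.4, "cell": 0.3},
    "biology": {"cell": 0.8, "membrane": 0.7, "tumor": 0.2},
}
# Hypothetical hierarchy records: subject -> {frozenset({term_a, term_b}): affinity value}.
AFFINITY = {
    "oncology": {frozenset({"biopsy", "staging"}): 0.6, frozenset({"tumor", "cell"}): 0.1},
    "biology": {frozenset({"cell", "membrane"}): 0.5},
}


def match(subjects, docs):
    """Matching step: sum SDP scores over the selected dictionaries; keep non-zero documents."""
    selected = [DICTIONARIES[s] for s in subjects]  # dictionary selection by subject
    scores = {}
    for doc_id, terms in docs.items():
        score = sum(d.get(t, 0.0) for d in selected for t in terms)
        if score > 0:
            scores[doc_id] = score
    return scores


def adjust(scores, subjects, docs):
    """Adjusting step (assumed rule): add the recorded affinity of each pair of document terms."""
    adjusted = {}
    for doc_id, score in scores.items():
        pairs = combinations(sorted(docs[doc_id]), 2)
        bonus = sum(AFFINITY.get(subj, {}).get(frozenset(p), 0.0)
                    for p in pairs for subj in subjects)
        adjusted[doc_id] = score + bonus
    return adjusted


def retrieve(subjects, docs, limit):
    """Shortlist the best `limit` documents by raw score, then re-order by adjusted score."""
    raw = match(subjects, docs)
    shortlist = sorted(raw, key=lambda k: (-raw[k], k))[:limit]
    adj = adjust({k: raw[k] for k in shortlist}, subjects, docs)
    return sorted(adj, key=lambda k: (-adj[k], k))


corpus = {
    "p": {"tumor", "cell"},       # raw oncology score about 1.2
    "q": {"biopsy", "staging"},   # raw oncology score about 0.9
    "r": {"cell", "membrane"},    # raw oncology score 0.3, cut by the shortlist
    "s": {"membrane"},            # no oncology entries, so not in the matched subset
}
print(retrieve(["oncology"], corpus, limit=2))  # → ['q', 'p']
```

On raw scores, p ranks first. The stored affinity between "biopsy" and "staging" in the oncology hierarchy lifts q to about 1.5 against about 1.3 for p, so the returned portion of the subset is re-ordered.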
Address: Armonk, NY, US