Title of invention DOCUMENT TAGGING AND RETRIEVAL USING PER-SUBJECT DICTIONARIES INCLUDING SUBJECT-DETERMINING-POWER SCORES FOR ENTRIES
Abstract Techniques for managing big data include tagging of documents and subsequent retrieval using per-subject dictionaries having entries with subject-determining-power scores. A subject-determining-power score indicates the descriptive power of a term with respect to the subject of the dictionary containing the term. The same term may have entries in multiple dictionaries, with a different subject-determining-power score in each. A retrieval request for one or more documents, containing search terms descriptive of those documents, can be processed by identifying a set of candidate documents tagged with subjects and optional terms, and then applying the subject-determining-power scores from the multiple dictionaries for the search term to determine a subject for the search term. The method then selects the one or more documents from the candidate documents according to that subject.
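The retrieval flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the dictionary contents, the scores, the document tags, the function names `subject_for_terms` and `retrieve`, and the choice to sum scores per subject are all assumptions made for illustration. The abstract says only that scores from multiple dictionaries are applied to determine a subject.

```python
# Hypothetical per-subject dictionaries: subject -> {term: score}.
# Terms and scores are invented for illustration only.
DICTIONARIES = {
    "finance": {"bank": 0.9, "interest": 0.7, "stream": 0.1},
    "geography": {"bank": 0.3, "river": 0.9, "stream": 0.8},
}

# Hypothetical candidate documents, already tagged with subjects.
DOCUMENT_TAGS = {
    "doc1": {"finance"},
    "doc2": {"geography"},
    "doc3": {"finance", "geography"},
}


def subject_for_terms(terms, dictionaries):
    """Pick the subject whose dictionary gives the search terms the highest
    total score (summation is an assumed combination rule)."""
    totals = {
        subject: sum(entries.get(term, 0.0) for term in terms)
        for subject, entries in dictionaries.items()
    }
    best = max(totals, key=totals.get)
    # If no dictionary contains any of the terms, no subject is determined.
    return best if totals[best] > 0 else None


def retrieve(terms, dictionaries, document_tags):
    """Select the candidate documents tagged with the determined subject."""
    subject = subject_for_terms(terms, dictionaries)
    if subject is None:
        return []
    return sorted(doc for doc, tags in document_tags.items() if subject in tags)
```

With these made-up scores, the term "bank" appears in both dictionaries, and the accompanying term ("interest" or "river") tips the determined subject toward finance or geography respectively.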
Publication number US2014337357(A1) Publication date 2014.11.13
Application number US201313891610 Filing date 2013.05.10
Applicant INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors Gattiker Anne Elizabeth;Gebara Fadi H.;Hylick Anthony N.;Kanj Rouwaida N.;Li Jian
Classification G06F17/30 Main classification G06F17/30
Agency Agent
Principal claim 1. A method of organizing a collection of documents, the method comprising: storing entries in multiple dictionaries, wherein the multiple dictionaries are each associated with a different subject, wherein the entries contain descriptive terms and corresponding subject-determining-power scores, wherein an individual subject-determining-power score indicates the relative strength or weakness of the corresponding descriptive term with respect to the subject associated with a particular dictionary containing the entry, and wherein at least some of the descriptive terms are present in two or more of the multiple dictionaries; and accessing the collection of documents by associating particular ones of the descriptive terms contained in the collection of documents with one or more associated subjects of one or more of the multiple dictionaries containing the particular descriptive terms.
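The storing and associating steps of claim 1 might be sketched as follows. This is an illustrative assumption, not the claimed implementation: the class name `SubjectDictionaries`, its methods, the example terms and scores, whitespace tokenization, and the `threshold` cutoff used for tagging are all hypothetical and do not appear in the record.

```python
from collections import defaultdict


class SubjectDictionaries:
    """Hypothetical store of per-subject dictionaries: subject -> {term: score}."""

    def __init__(self):
        self._entries = defaultdict(dict)

    def add_entry(self, subject, term, score):
        """Store a descriptive term and its score in one subject's dictionary."""
        self._entries[subject][term.lower()] = score

    def subjects_for(self, term):
        """Return {subject: score} for every dictionary containing the term."""
        term = term.lower()
        return {s: e[term] for s, e in self._entries.items() if term in e}

    def tag_document(self, text, threshold=0.5):
        """Associate a document with each subject for which some term in the
        text scores at or above the (assumed) threshold."""
        tags = set()
        for word in text.lower().split():
            for subject, score in self.subjects_for(word).items():
                if score >= threshold:
                    tags.add(subject)
        return tags


# Example data, invented for illustration: "virus" is present in two
# dictionaries with a different score in each.
dicts = SubjectDictionaries()
dicts.add_entry("medicine", "virus", 0.8)
dicts.add_entry("medicine", "fever", 0.9)
dicts.add_entry("computing", "virus", 0.4)
dicts.add_entry("computing", "firewall", 0.9)
```

In this sketch a weak entry, such as "virus" in the computing dictionary, does not on its own tag a document with that subject, while a strong entry such as "firewall" does.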
Address US