Title: Method, computer system, and computer program for searching document data using search keyword
Abstract: Techniques provide for searching pieces of document data using a search keyword. The technique includes: calculating, as a first vector, respective first scores at which or respective probabilities that each of the pieces of document data belongs to clusters or classes; calculating, as a second vector, respective second scores at which or respective probabilities that the search keyword or a relevant keyword associated with the search keyword belongs to the clusters or the classes; calculating an inner product of each of the first vectors and the second vector, the calculated inner product being a third score of the corresponding piece of document data regarding the search keyword; and acquiring a correlation value from document data containing each keyword in a classification keyword set and document data with the third score that is equal to or more than a predetermined threshold or is included in a predetermined high-ranking proportion.
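The scoring step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the dictionary layout, and the sample cluster probabilities are assumptions made only for illustration. Each document and the search keyword are represented as vectors of cluster-membership probabilities, the inner product of the two gives the document's score, and documents are then kept either by a threshold or by a top-ranking proportion.

```python
def inner_product(u, v):
    """Return the scalar inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def score_documents(doc_vectors, keyword_vector):
    """Score each document by the inner product of its cluster vector
    with the keyword's cluster vector (hypothetical data layout:
    a dict mapping document id to a list of cluster probabilities)."""
    return {
        doc_id: inner_product(vec, keyword_vector)
        for doc_id, vec in doc_vectors.items()
    }


def select_by_threshold(scores, threshold):
    """Keep documents whose score is equal to or more than the threshold,
    ordered from highest to lowest score."""
    kept = [doc_id for doc_id, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda doc_id: scores[doc_id], reverse=True)


def select_top_proportion(scores, proportion):
    """Keep the highest-ranking fraction of documents (at least one)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = max(1, int(len(ranked) * proportion))
    return ranked[:keep]


# Illustrative values only: two clusters, three documents.
doc_vectors = {
    "d1": [0.9, 0.1],
    "d2": [0.2, 0.8],
    "d3": [0.5, 0.5],
}
keyword_vector = [1.0, 0.0]  # the keyword belongs entirely to cluster 0

scores = score_documents(doc_vectors, keyword_vector)
selected = select_by_threshold(scores, threshold=0.5)
```

Because both vectors live in the same cluster space, a document can score highly even when it does not literally contain the search keyword, which is the point of scoring by cluster membership rather than by term matching.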
Publication number: US9122747(B2)  Publication date: 2015.09.01
Application number: US201213605860  Filing date: 2012.09.06
Applicant: International Business Machines Corporation  Inventor: Inagaki Takeshi
Classification: G06F7/00; G06F17/30  Main classification: G06F7/00
Agency: Konrad, Raynes, Davda and Victor LLP  Agent: Davda Janaki K.; Konrad, Raynes, Davda and Victor LLP
Principal claim: 1. A method for searching pieces of document data using a search keyword, the pieces of document data having a correlation with the search keyword, the method comprising: receiving the search keyword from a user terminal to search an index database stored in an index creating computer; calculating as first vectors respective probabilities that each of the pieces of document data belongs to clusters, wherein each of the first vectors corresponds to one of the pieces of document data; calculating as a second vector, respective probabilities that the search keyword belongs to the clusters; calculating an inner product of each of the first vectors and the second vector, the calculated inner product being a first score of the corresponding piece of document data regarding the search keyword, wherein the inner product represents a scalar value; and acquiring a correlation value from a classification keyword set containing facet keywords and pieces of document data with the first score that is equal to or more than a predetermined threshold by multiplying a probability that a first word conceptually matching a facet keyword from the facet keywords occurs in the pieces of document data and a probability that a second word conceptually matching the search keyword occurs in the pieces of document data to generate a second score, and dividing a probability that both the first word and the second word occur in the pieces of document data by the second score, wherein the facet keywords represent a viewpoint of information using a plurality of attribute values as metadata and that are automatically selected by the index creating computer; and displaying a search result on the user terminal in descending order based on the correlation.
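The correlation value recited in the claim, the probability that both words occur divided by the product of their individual probabilities, can be sketched as follows. This is an illustrative reading and not the patented implementation: the function names and sample documents are hypothetical, each document is modeled as a plain set of words, and "conceptually matching" is simplified to exact word membership, which is an assumption. The documents passed in stand for the pieces of document data whose first score already met the threshold.

```python
def correlation_value(documents, facet_keyword, search_keyword):
    """Return P(both) / (P(facet) * P(search)) over the given documents.

    documents: list of sets of words (assumed representation).
    Returns 0.0 when the ratio is undefined (no documents, or one of
    the words never occurs); that fallback is an assumption.
    """
    n = len(documents)
    if n == 0:
        return 0.0
    p_facet = sum(1 for d in documents if facet_keyword in d) / n
    p_search = sum(1 for d in documents if search_keyword in d) / n
    p_both = sum(
        1 for d in documents if facet_keyword in d and search_keyword in d
    ) / n
    second_score = p_facet * p_search  # the "second score" in the claim
    if second_score == 0:
        return 0.0
    return p_both / second_score


def rank_facets(documents, facet_keywords, search_keyword):
    """Order facet keywords by correlation value, highest first."""
    values = {
        f: correlation_value(documents, f, search_keyword)
        for f in facet_keywords
    }
    return sorted(values.items(), key=lambda item: item[1], reverse=True)


# Illustrative documents only.
documents = [
    {"engine", "noise"},
    {"engine", "noise"},
    {"brake"},
    {"engine"},
]
ranking = rank_facets(documents, ["noise", "brake"], "engine")
```

A value above 1 indicates that the facet keyword and the search keyword co-occur more often than independence would predict; a value below 1 indicates the opposite. In the sample data, "noise" co-occurs with "engine" in two of four documents and ranks above "brake", which never co-occurs with it.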
Address: Armonk, NY, US