Title of invention: Consistent phrase relevance measures
Abstract: Two methods for measuring keyword-document relevance are described. Each method receives a keyword and a document as input and outputs the probability that the keyword is relevant to the document. The first method is a similarity-based approach, which applies techniques for measuring similarity between two short text segments to measure relevance between the keyword and the document. The second method is a regression-based approach, which rests on the assumption that if an out-of-document phrase (the keyword) is semantically similar to an in-document phrase, then the relevance scores of the in-document and out-of-document phrases should be close to each other.
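Publication number: US8996515(B2) Publication date: 2015.03.31
Application number: US201213609257 Filing date: 2012.09.11
Applicant: Microsoft Corporation Inventors: Yih Wen-tau; Meek Christopher A.
Classification: G06F7/00; G06F17/30; G06Q30/02 Main classification: G06F7/00
Agency: (not listed) Agents: Swain Sandy; Yee Judy; Minhas Micky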
Main claim: 1. One or more computer readable media, not comprising a signal, storing information to enable a computing device to perform a process of predicting a probability that an input out-of-document phrase that is not in a document is relevant to the document, the process comprising: applying an in-document phrase relevance measure to the target document to get a list of in-document keywords in the document and respective associated probabilities of relevance, to the document, of the in-document keywords; representing each in-document keyword as a respective term vector, each term vector computed by expansion of its corresponding in-document keyword, wherein each in-document keyword has a respective term vector and probability; computing a term vector for the out-of-document phrase by performing term expansion on the out-of-document phrase, terms of the term vector for the out-of-document phrase having respective weights; and using a regression model to predict the probability of relevance, to the document, of the out-of-document phrase, wherein the regression model uses the term vectors and probabilities of the in-document keywords, respectively, and uses the term vector of the out-of-document phrase to predict the probability of relevance of the out-of-document phrase, wherein the probability of relevance of the out-of-document phrase is consistent with the probabilities of the in-document keywords.
Address: Redmond WA US
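
Illustrative sketch (not part of the patent record): the regression-based process in the main claim can be pictured roughly as follows. Everything named here is an assumption made for illustration. The record does not say which regression model or term-expansion source is used, so this sketch swaps in a simple similarity-weighted average (kernel-style regression) over cosine similarities, and a made-up expansion table `TOY_EXPANSIONS`. The in-document keyword probabilities are taken as given, standing in for the output of the in-document phrase relevance measure.

```python
import math

# Hypothetical expansion table. In practice term expansion might draw on
# search results or another corpus; these entries are invented for the example.
TOY_EXPANSIONS = {
    "digital camera": {"camera": 1.0, "photo": 0.8, "lens": 0.6},
    "memory card": {"memory": 1.0, "storage": 0.7, "photo": 0.3},
    "dslr": {"camera": 0.9, "lens": 0.8, "photo": 0.5},
}


def expand(phrase):
    """Return a weighted term vector (term -> weight) for a phrase.

    Phrases missing from the toy table fall back to a one-term vector.
    """
    return dict(TOY_EXPANSIONS.get(phrase, {phrase: 1.0}))


def cosine(u, v):
    """Cosine similarity between two sparse term vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_relevance(phrase, in_doc_keywords):
    """Predict relevance of an out-of-document phrase.

    in_doc_keywords maps each in-document keyword to its probability of
    relevance. The prediction is a similarity-weighted average of those
    probabilities (an assumed stand-in for the claimed regression model),
    so a phrase close to a high-probability keyword scores close to it.
    """
    phrase_vec = expand(phrase)
    weighted_sum = 0.0
    total_weight = 0.0
    for keyword, prob in in_doc_keywords.items():
        sim = cosine(phrase_vec, expand(keyword))
        weighted_sum += sim * prob
        total_weight += sim
    # No overlap with any in-document keyword: treat as not relevant.
    return weighted_sum / total_weight if total_weight > 0 else 0.0


if __name__ == "__main__":
    in_doc = {"digital camera": 0.9, "memory card": 0.4}
    print(predict_relevance("dslr", in_doc))
    print(predict_relevance("tax form", in_doc))
```

Because the prediction is a weighted average with non-negative weights, it always lies between the lowest and highest in-document probabilities, which is one simple way to keep the out-of-document score consistent with the in-document scores. A trained regression model could replace the averaging step without changing the surrounding structure.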