Main claim |
1. One or more computer readable media, not comprising a signal, storing information to enable a computing device to perform a process of predicting a probability that an input out-of-document phrase that is not in a document is relevant to the document, the process comprising:
applying an in-document phrase relevance measure to the target document to get a list of in-document keywords in the document and respective associated probabilities of relevance, to the document, of the in-document keywords; representing each in-document keyword as a respective term vector, each term vector computed by expansion of its corresponding in-document keyword, wherein each in-document keyword has a respective term vector and probability; computing a term vector for the out-of-document phrase by performing term expansion on the out-of-document phrase, terms of the term vector for the out-of-document phrase having respective weights; and using a regression model to predict the probability of relevance, to the document, of the out-of-document phrase, wherein the regression model uses the term vectors and probabilities of the in-document keywords, respectively, and uses the term vector of the out-of-document phrase to predict the probability of relevance of the out-of-document phrase, wherein the probability of relevance of the out-of-document phrase is consistent with the probabilities of the in-document keywords. |
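
The steps recited in the claim can be illustrated with a minimal sketch. The claim does not fix a particular regression model or term-expansion source, so the following rests on assumptions made purely for illustration: the `EXPANSIONS` table is a hypothetical stand-in for term expansion, the in-document probabilities are made-up values standing in for the output of an in-document phrase relevance measure, and the regression is a cosine-similarity-weighted average (a simple kernel regression) chosen because its prediction always lies within the range of the in-document probabilities.

```python
import math

# Hypothetical term-expansion table: phrase -> weighted term vector.
# A real system might build these from external resources; this is an assumption.
EXPANSIONS = {
    "digital camera": {"camera": 0.9, "lens": 0.6, "photo": 0.5},
    "stock market": {"stock": 0.9, "trading": 0.7, "price": 0.3},
    "dslr": {"camera": 0.8, "lens": 0.7, "price": 0.2},
}


def expand(phrase):
    """Return the weighted term vector for a phrase (empty if unknown)."""
    return EXPANSIONS.get(phrase, {})


def cosine(u, v):
    """Cosine similarity between two sparse term vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_relevance(in_doc, query_vec):
    """Similarity-weighted average of the in-document keyword probabilities.

    in_doc: list of (term_vector, probability) pairs, one per in-document keyword.
    query_vec: term vector of the out-of-document phrase.
    """
    weights = [cosine(vec, query_vec) for vec, _ in in_doc]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # no overlap with any in-document keyword
    return sum(w * p for w, (_, p) in zip(weights, in_doc)) / total


# Made-up probabilities standing in for an in-document relevance measure.
in_doc = [
    (expand("digital camera"), 0.8),
    (expand("stock market"), 0.1),
]

# "dslr" is treated as the out-of-document phrase.
p = predict_relevance(in_doc, expand("dslr"))
print(f"predicted relevance of 'dslr': {p:.3f}")
```

Because "dslr" shares most of its expanded terms with "digital camera", the prediction lands close to that keyword's probability, which is one way to read the requirement that the out-of-document probability be consistent with the in-document ones. Any regression model trained on the (term vector, probability) pairs could take the place of the weighted average.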