Abstract
A system and method for selecting a proxy keyword for an unknown document. A receiver receives the unknown document. A plurality of candidate documents, together with their corresponding keywords, are determined for the unknown document. From the keywords of the candidate documents, proxy keywords are selected for the unknown document based on a plurality of factors, including the length of each keyword, the distance of each candidate document from the unknown document, the similarity between the text of the unknown document and that of the respective candidate document, the rank of each keyword within its candidate document, and the frequency of each keyword within its respective candidate document.
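The factor-based selection described above can be illustrated with a minimal Python sketch. Everything in it is a hypothetical illustration: the `Candidate` structure, the `select_proxy_keywords` function, and especially the way the five factors are weighted and combined (a per-candidate product, summed across candidates). The abstract names the factors but does not specify a scoring formula.

```python
import math
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A candidate document near the unknown document (hypothetical structure)."""
    distance: float      # distance from the unknown document (lower = closer)
    similarity: float    # text similarity to the unknown document, assumed in [0, 1]
    keywords: list       # keywords ordered by rank (index 0 = rank 1)
    frequency: dict = field(default_factory=dict)  # keyword -> occurrence count


def select_proxy_keywords(candidates, top_n=3):
    """Score every keyword from the candidates and return the top_n as proxies.

    Assumed scoring (illustrative only, not the patented method):
      doc weight    = similarity / (1 + distance)   closer, more similar docs count more
      rank weight   = 1 / rank                      higher-ranked keywords count more
      freq weight   = 1 + ln(1 + frequency)         diminishing returns on repetition
      length weight = number of words, capped at 3  favor more specific phrases
    Scores for the same keyword are summed across candidate documents.
    """
    scores = defaultdict(float)
    for cand in candidates:
        doc_weight = cand.similarity / (1.0 + cand.distance)
        for rank, kw in enumerate(cand.keywords, start=1):
            rank_weight = 1.0 / rank
            freq_weight = 1.0 + math.log1p(cand.frequency.get(kw, 0))
            length_weight = min(len(kw.split()), 3)
            scores[kw] += doc_weight * rank_weight * freq_weight * length_weight
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda k: (-scores[k], k))
    return ranked[:top_n]


if __name__ == "__main__":
    a = Candidate(0.0, 0.9, ["neural network", "training"],
                  {"neural network": 5, "training": 2})
    b = Candidate(1.0, 0.5, ["training", "gpu"], {"training": 4, "gpu": 1})
    print(select_proxy_keywords([a, b], top_n=2))
```

Summing per-candidate contributions lets a keyword that appears in several nearby candidates outrank one that appears in only a single document. A real system might instead learn the factor weights or normalize each factor before combining them.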