Abstract |
The present invention provides a corpus-independent method for determining the relevancy of terms to the text of a document by analyzing the document itself. Conventional information extraction or other methods may be applied to the document to generate a list of terms. The invention then analyzes the document using relevancy scoring algorithms to determine, for each term, a relevancy score representing that term's relevance to the text contained in the document. The scores, including an aggregate score, may be normalized in the process. Based on the relevancy scores, the terms are then ranked and further processed. In this manner, relevancy is determined from the subject document itself, by analyzing the occurrences and locations of the terms within the document. Additional techniques may be applied to relate the relevancy scores generated by the present invention to a corpus or collection of documents.
|
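
The flow described in the abstract (score each term from its occurrences and locations, normalize, then rank) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the function names `score_terms` and `rank_terms`, the position weighting that favors earlier occurrences, and normalization by the highest score in the document. The abstract does not disclose the actual scoring algorithms.

```python
import re


def score_terms(text, terms):
    """Score each term from its occurrences and their locations in `text`.

    Assumed weighting (hypothetical, not from the source): an occurrence at
    character offset `pos` contributes 1 + (1 - pos / len(text)), so earlier
    mentions count for more than later ones.
    """
    length = max(len(text), 1)
    lowered = text.lower()
    raw = {}
    for term in terms:
        # Whole-word, case-insensitive match of the term.
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        raw[term] = sum(
            1.0 + (1.0 - m.start() / length)
            for m in re.finditer(pattern, lowered)
        )
    # Normalize against the highest raw score found in this document only,
    # so no external corpus is consulted.
    top = max(raw.values(), default=0.0)
    if top == 0.0:
        return {term: 0.0 for term in terms}
    return {term: score / top for term, score in raw.items()}


def rank_terms(text, terms):
    """Return (term, normalized score) pairs, highest score first."""
    scores = score_terms(text, terms)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For example, calling `rank_terms("Solar panels convert sunlight. Solar output depends on weather.", ["solar", "weather", "wind"])` places "solar" first with a normalized score of 1.0, "weather" second, and "wind" last with a score of 0.0, since it never occurs. The term list would in practice come from an upstream extraction step, which this sketch takes as given.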