Title of Invention |
Systems and methods for authoritativeness grading, estimation and sorting of documents in large heterogeneous document collections |
Abstract |
Systems and methods for determining the authoritativeness of a document based on textual, non-topical cues. The authoritativeness of a document is determined by evaluating a set of document content features contained within each document to determine a set of document content feature values, processing the set of document content feature values through a trained document textual authority model, and determining a textual authoritativeness value and/or textual authority class for each document evaluated using the predictive models included in the trained document textual authority model. Estimates of a document's textual authoritativeness value and/or textual authority class can be used to re-rank documents previously retrieved by a search, to expand and improve document query searches, to provide a more complete and robust determination of a document's authoritativeness, and to improve the aggregation of rank-ordered lists with numerically-ordered lists. |
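The pipeline described in the abstract (extract content feature values, pass them through a trained model, then re-rank retrieved documents by the resulting authoritativeness value) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the four cues, the linear scoring form, the `WEIGHTS` values, and the names `Document`, `extract_features`, `authority_score`, and `rerank` are hypothetical and are not taken from the patent, which does not specify them in this record.

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str
    retrieval_rank: int  # 1 = top result of the original search


def extract_features(text: str) -> dict:
    """Compute a few textual, non-topical cues (illustrative set only)."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)  # guard against empty documents
    first_person = sum(w.lower() in {"i", "me", "my"} for w in words)
    return {
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "exclamation_rate": text.count("!") / n_words,
        "first_person_rate": first_person / n_words,
    }


# Hypothetical weights standing in for a trained textual authority model.
# A real model would be fit on documents labeled with authority classes.
WEIGHTS = {
    "avg_sentence_len": 0.05,
    "avg_word_len": 0.5,
    "exclamation_rate": -5.0,
    "first_person_rate": -3.0,
}
BIAS = 0.0


def authority_score(features: dict) -> float:
    """Linear score over the feature values (assumed model form)."""
    return BIAS + sum(WEIGHTS[name] * value for name, value in features.items())


def rerank(docs: list) -> list:
    """Sort by descending authority score; ties keep the original search order."""
    return sorted(
        docs,
        key=lambda d: (-authority_score(extract_features(d.text)), d.retrieval_rank),
    )


docs = [
    Document("a", "I love this!!! My cat is so cool!", retrieval_rank=1),
    Document(
        "b",
        "Clinical guidelines recommend annual screening for adults over fifty.",
        retrieval_rank=2,
    ),
]
reranked_ids = [d.doc_id for d in rerank(docs)]
```

In this toy input, the second document is moved ahead of the first because its cues score higher under the assumed weights, which mirrors the abstract's use of authoritativeness estimates to re-rank previously retrieved documents.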
Application Publication Number |
EP1363207(A2) |
Application Publication Date |
2003.11.19 |
Application Number |
EP20030011240 |
Application Date |
2003.05.16 |
Applicant |
XEROX CORPORATION |
Inventors |
FARAHAT, AYMAN O.;CHEN, FRANCINE R.;MATHIS, CHARLES R.;NUNBERG, GEOFFREY D. |
Classification Numbers |
G06F17/27;G06F17/30 |
Main Classification Number |
G06F17/27 |
Patent Agency |
|
Agent |
|
Main Claim |
|
Address |
|