Title of Invention |
System and method for ranking search results within citation intensive document collections |
Abstract |
Systems and methods facilitate a search and identify documents and associated metadata reflecting content of the documents. In one implementation, a method receives a query comprising a set of search terms, identifies a stored document in response to the query, and determines a score value for the retrieved document based on a similarity between one or more of the query search terms and metadata associated with the identified document. The method locates the identified document in a citation network of baseline query results, the citation network comprising a first set of documents that cite to the identified document and a second set of documents cited to by the identified document. The method further determines a new score value of the identified document as a function of the score value and a quantity and a quality of documents within the first and second set of documents. |
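The re-scoring step in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the combining function, so everything here is an assumption made for illustration: the name `rescore`, the `weight` parameter, and the additive rule that sums the baseline scores of citing and cited documents. Summing those scores is just one simple way to reflect both the quantity of neighbors (the number of terms) and their quality (each term's score).

```python
def rescore(doc_id, base_scores, cites, weight=0.1):
    """Hypothetical re-scoring of one document within a citation network.

    base_scores: dict mapping doc id -> baseline similarity score for
                 documents in the baseline query results.
    cites:       dict mapping doc id -> set of doc ids that it cites.
    weight:      assumed tuning constant, not taken from the source.
    """
    # Second set: documents cited by doc_id that are in the baseline results.
    cited = {d for d in cites.get(doc_id, ()) if d in base_scores}
    # First set: baseline-result documents that cite doc_id.
    citing = {d for d, targets in cites.items()
              if doc_id in targets and d in base_scores}
    neighbors = (citing | cited) - {doc_id}
    # The sum grows with the number of neighbors and with their scores.
    boost = sum(base_scores[d] for d in neighbors)
    return base_scores[doc_id] + weight * boost


base_scores = {"A": 1.0, "B": 0.5, "C": 0.2}
cites = {"B": {"A"}, "A": {"C", "Z"}}  # "Z" is outside the baseline results
new_score_a = rescore("A", base_scores, cites)
```

In this example, document "A" is boosted by "B" (which cites it) and "C" (which it cites), while "Z" is ignored because it is not among the baseline query results.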
Publication Number |
US8886638(B2) |
Publication Date |
2014.11.11 |
Application Number |
US201213403253 |
Filing Date |
2012.02.23 |
Applicant |
LexisNexis |
Inventors |
Zhang Ling Qin;Silver Harry R. |
Classification Numbers |
G06F7/00;G06F17/30;G06F17/00 |
Main Classification Number |
G06F7/00 |
Agency |
Dinsmore & Shohl LLP |
Agent |
Dinsmore & Shohl LLP |
Main Claim |
1. A computerized method for calculating a normalized activity score value to rank an identified document, the method comprising:
identifying a stored document;
determining a number of times the identified document was cited in a subject matter community of the identified document;
determining a probability distribution that individual documents within the subject matter community are cited a variable number of times by other individual documents in the subject matter community;
calculating a probability function by performing a regression on the probability distribution;
calculating the activity score value according to an activity score function formulated as an inverse of the probability function such that the activity score function is defined by:
Score(x) = k(a·x^α + 1)^p,
wherein:
Score(x) is the activity score value, k and p are constants, x is the number of documents citing the identified document, and a and α are learned from the regression on the probability distribution; and the activity score function is such that the activity score value is calculated according to a probability that the individual document in the subject matter community is cited a number of times greater than or equal to the number of times the identified document was cited in the subject matter community;
weighting the activity score value by an age of the identified document; and
storing in computer memory a ranking of the identified document based on the activity score value. |
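The steps of the claim can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the patented implementation. It reads the score formula as Score(x) = k(a·x^α + 1)^p and assumes the fitted probability function has the form P(X ≥ x) ≈ 1 / (a·x^α + 1), so that with p = 1 the score is k divided by that probability. The helper names, the log-log least-squares fit, and the `1 / (1 + age_years)` age weighting are all hypothetical choices, since the claim does not specify them.

```python
import math
from collections import Counter


def survival_distribution(citation_counts):
    """Empirical P(X >= x) for each distinct citation count in a community."""
    n = len(citation_counts)
    freq = Counter(citation_counts)
    survival = {}
    remaining = n
    for x in sorted(freq):
        survival[x] = remaining / n  # share of documents cited at least x times
        remaining -= freq[x]
    return survival


def fit_probability_function(survival):
    """Fit P(x) ~ 1 / (a * x**alpha + 1) by least squares in log-log space.

    Assumed regression form: log(1/P - 1) = log(a) + alpha * log(x).
    """
    points = [(math.log(x), math.log(1.0 / p - 1.0))
              for x, p in survival.items() if x > 0 and p < 1.0]
    if len(points) < 2:
        raise ValueError("need at least two points with x > 0 and P < 1")
    n = len(points)
    mean_u = sum(u for u, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    sxx = sum((u - mean_u) ** 2 for u, _ in points)
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in points)
    alpha = sxy / sxx
    a = math.exp(mean_v - alpha * mean_u)
    return a, alpha


def activity_score(x, a, alpha, k=1.0, p=1.0):
    """Score(x) = k * (a * x**alpha + 1)**p, under the reading assumed above."""
    return k * (a * x ** alpha + 1.0) ** p


def age_weighted_score(score, age_years):
    """Hypothetical age weighting: older documents are discounted."""
    return score / (1.0 + age_years)
```

Under these assumptions, a document with zero citations receives the baseline score k, and the score rises as fewer documents in the community match or exceed its citation count. A ranking could then be stored by sorting documents on their age-weighted scores.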
Address |
Miamisburg OH US |