Title: System and method for ranking search results within citation intensive document collections
Abstract: Systems and methods facilitate a search and identify documents and associated metadata reflecting content of the documents. In one implementation, a method receives a query comprising a set of search terms, identifies a stored document in response to the query, and determines a score value for the identified document based on a similarity between one or more of the query search terms and metadata associated with the identified document. The method locates the identified document in a citation network of baseline query results, the citation network comprising a first set of documents that cite the identified document and a second set of documents cited by the identified document. The method further determines a new score value of the identified document as a function of the score value and a quantity and a quality of documents within the first and second sets of documents.
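The abstract does not say how the baseline score and the citation network are combined, so the following is only a minimal sketch under stated assumptions. The hypothetical `similarity` function scores a document as the fraction of query terms found in its metadata, and the hypothetical `network_score` function adds weighted sums of the baseline scores of linked documents that are themselves baseline results. Summing neighbor scores is one simple way to reflect both quantity (how many linked results) and quality (how well each matched). The weights `w_in` and `w_out` are illustrative assumptions, not values from the patent.

```python
from typing import Iterable, Mapping, Set


def similarity(query_terms: Iterable[str], metadata_terms: Iterable[str]) -> float:
    """Hypothetical baseline score: fraction of query terms found in the metadata."""
    q = {t.lower() for t in query_terms}
    if not q:
        return 0.0
    m = {t.lower() for t in metadata_terms}
    return len(q & m) / len(q)


def network_score(doc_id: str,
                  base: Mapping[str, float],
                  citing: Mapping[str, Set[str]],
                  cited: Mapping[str, Set[str]],
                  w_in: float = 0.5,
                  w_out: float = 0.25) -> float:
    """Hypothetical new score: baseline score plus weighted sums over linked results.

    base:   doc_id -> baseline score; its keys are the baseline query results
    citing: doc_id -> ids of documents citing doc_id (the first set)
    cited:  doc_id -> ids of documents doc_id cites (the second set)
    Neighbors outside the baseline results are ignored.
    """
    first = [d for d in citing.get(doc_id, set()) if d in base]
    second = [d for d in cited.get(doc_id, set()) if d in base]
    return (base[doc_id]
            + w_in * sum(base[d] for d in first)
            + w_out * sum(base[d] for d in second))


if __name__ == "__main__":
    query = ["citation", "ranking"]
    metadata = {
        "A": ["citation", "ranking", "law"],
        "B": ["citation", "network"],
        "C": ["ranking"],
    }
    base = {d: similarity(query, terms) for d, terms in metadata.items()}
    citing = {"A": {"B", "C", "Z"}}  # Z is not a baseline result, so it is ignored
    cited = {"A": {"C"}}
    print(network_score("A", base, citing, cited))  # → 1.625
```

Here A starts at 1.0, gains 0.5 × (0.5 + 0.5) from its citing results B and C, and 0.25 × 0.5 from the cited result C. Restricting neighbors to the baseline results keeps the re-scoring tied to the query.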
Publication number: US8886638(B2)
Publication date: 2014.11.11
Application number: US201213403253
Filing date: 2012.02.23
Applicant: LexisNexis
Inventors: Zhang Ling Qin; Silver Harry R.
Classification: G06F7/00; G06F17/30; G06F17/00
Main classification: G06F7/00
Agency: Dinsmore & Shohl LLP
Agent: Dinsmore & Shohl LLP
Principal claim: 1. A computerized method for calculating a normalized activity score value to rank an identified document, the method comprising: identifying a stored document; determining a number of times the identified document was cited in a subject matter community of the identified document; determining a probability distribution that individual documents within the subject matter community are cited a variable number of times by other individual documents in the subject matter community; calculating a probability function by performing a regression on the probability distribution; calculating the activity score value according to an activity score function formulated as an inverse of the probability function such that the activity score function is defined by: $\mathrm{Score}(x) = k\,(a \cdot x^{\alpha} + 1)^{p}$, wherein: Score(x) is the activity score value, k and p are constants, x is the number of documents citing the identified document, and a and α are learned from the regression on the probability distribution; and the activity score function is such that the activity score value is calculated according to a probability that an individual document in the subject matter community is cited a number of times greater than or equal to the number of times the identified document was cited in the subject matter community; weighting the activity score value by an age of the identified document; and storing in computer memory a ranking of the identified document based on the activity score value.
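The claimed steps can be sketched as follows, under assumptions that the claim itself does not fix. The sketch assumes the probability function has the form P(X ≥ x) ≈ 1 / (a·x^α + 1), because that form is linear after rearranging (ln(1/P − 1) = ln a + α·ln x), so a and α fall out of an ordinary least-squares regression. Raising its inverse to the power p and multiplying by k gives exactly Score(x) = k(a·x^α + 1)^p. The function names, the default k = p = 1, and the age weight 1 / (1 + decay·age) are hypothetical choices; the claim says only that the score is weighted by age.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def ccdf(counts: Sequence[int]) -> Dict[int, float]:
    """Empirical P(X >= x) for each distinct citation count x in the community."""
    n = len(counts)
    freq = Counter(counts)
    tail = 0
    result: Dict[int, float] = {}
    for x in sorted(freq, reverse=True):
        tail += freq[x]
        result[x] = tail / n
    return result


def fit_a_alpha(counts: Sequence[int]) -> Tuple[float, float]:
    """Least-squares fit of the assumed form P(X >= x) ~ 1 / (a * x**alpha + 1).

    Rearranged as ln(1/P - 1) = ln(a) + alpha * ln(x), this is a straight line
    in log space; points with x = 0 or P = 1 cannot be linearized and are skipped.
    """
    pts = [(math.log(x), math.log(1.0 / prob - 1.0))
           for x, prob in ccdf(counts).items() if x >= 1 and prob < 1.0]
    if len(pts) < 2:
        raise ValueError("need at least two usable (x, P) points")
    mx = sum(u for u, _ in pts) / len(pts)
    my = sum(v for _, v in pts) / len(pts)
    sxx = sum((u - mx) ** 2 for u, _ in pts)
    sxy = sum((u - mx) * (v - my) for u, v in pts)
    alpha = sxy / sxx
    return math.exp(my - alpha * mx), alpha


def activity_score(x: int, a: float, alpha: float, k: float = 1.0, p: float = 1.0) -> float:
    """Score(x) = k * (a * x**alpha + 1)**p, i.e. k times (1 / P(X >= x))**p."""
    return k * (a * x ** alpha + 1.0) ** p


def age_weight(age_years: float, decay: float = 0.1) -> float:
    """Hypothetical age weighting; the claim does not state its form."""
    return 1.0 / (1.0 + decay * age_years)


def rank(docs: List[Tuple[str, int, float]], counts: Sequence[int],
         k: float = 1.0, p: float = 1.0, decay: float = 0.1) -> List[str]:
    """docs holds (doc_id, times_cited, age_years); returns ids, highest score first."""
    a, alpha = fit_a_alpha(counts)
    scored = [(activity_score(x, a, alpha, k, p) * age_weight(age, decay), doc_id)
              for doc_id, x, age in docs]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [doc_id for _, doc_id in scored]


if __name__ == "__main__":
    community = [0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 5, 8]  # citation counts in one community
    docs = [("old", 8, 20.0), ("new", 8, 1.0), ("low", 1, 1.0)]
    print(rank(docs, community))  # → ['new', 'old', 'low']
```

Because P(X ≥ x) shrinks as x grows, its inverse grows, so a document cited more often than most of its community receives a higher activity score. In the usage example the two documents cited 8 times outrank the one cited once, and the assumed age weight places the newer of the two first.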
Address: Miamisburg, OH, US