Title Method and system for ranking intellectual property documents using claim analysis
Abstract The present invention provides a method and system for re-ranking search results in a patent retrieval system where the query text is derived in whole or in part from a patent claim, which may be from an existing patent or a prospective claim. The re-ranking is based on several features of the candidate patent, such as the text similarity to the claim, international patent code or other classification or subject matter relatedness or overlap, and internal citation structure of the candidates. One alternative aspect provides a re-ranker that is trained on automatically generated training data, thus obviating the expensive and time-intensive step of expert annotation.
Publication No. US9110971(B2) Publication date 2015.08.18
Application No. US201012658165 Filing date 2010.02.03
Applicant Thomson Reuters Global Resources Inventors Liao Wenhui;Veeramachaneni Sriharsha;Quick Gary;Vachher Arun
Classification G06F17/30;G06Q50/18 Main classification G06F17/30
Agency Valenti, Hanley & Robinson, PLLC Agent Valenti, Hanley & Robinson, PLLC;Duncan Kevin T.
Main claim 1. A computer-based system for processing a user query related to patent claim terms to generate a set of patent documents responsive to the query, the computer-based system comprising:
a search engine executed by a computer and being adapted to receive a query and, based on the query, to search claims of patent documents contained in at least one database and adapted to yield a first set of candidate patent documents, wherein the query comprises a plurality of query permutations derived from the original query language and comprising either or both of a claim text-based query permutation and a key concept-based query permutation, and wherein the search engine is adapted to execute a plurality of query search permutations in arriving at the first set of candidate patent documents; and
a re-ranking module comprising code executable by the computer and adapted to re-rank the entire first set of candidate patent documents based at least in part on a set of patent features without reducing the number of candidate patent documents in the set and generate a second set of ranked patent documents, the re-ranking module being adapted to weight the set of patent features based on a previously executed learning process;
wherein the set of patent features comprises one or more from a group consisting of:
fields of a patent;
patent title;
patent abstract;
patent IPC code;
patent references;
patent claims;
rank-c, representing a lowest rank of any claim of a patent in the first set of candidate patent documents;
sim(c,c), representing a highest similarity score between the query and claims in a patent in the first set of candidate patent documents;
sim(c,cs), representing a similarity score between the query and all the claims of a patent in the first set of candidate patent documents;
sim(c,title), representing a similarity score between the query and the title of a patent in the first set of candidate patent documents;
sim(c,abstract), representing a similarity score between the query and the abstract of a patent in the first set of candidate patent documents;
sim(key,key), representing a similarity score between key concepts of the query and a patent in the first set of candidate patent documents;
sim(key,title), representing a similarity score between the key concept of the query and the title of a patent in the first set of candidate patent documents;
sim(key,abstract), representing a similarity score between the key concept of the query and the abstract of a patent in the first set of candidate patent documents;
IPC-overlap, representing a number of overlapping IPC codes between IPC codes of a patent in the first set of candidate patent documents and the IPC codes of an initial high-ranking set of patents in the first set of candidate patent documents; and
direct-Cite, representing the number of patents in the initial high-ranking set of patent documents that cite or are cited by a patent in the first set of candidate patent documents.
Address CH
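
Illustrative sketch (not part of the patent record): the Python below shows one way the IPC-overlap and direct-Cite features named in claim 1 could be computed and combined with a similarity score to re-rank a candidate list without dropping any candidate. The `Candidate` class, the `rerank` function, the `top_k` cutoff for the "initial high-ranking set", the linear weighted sum, and the weight values are all assumptions made for illustration. The claim only says the weights come from a previously executed learning process and does not specify a scoring function.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """Hypothetical container for one patent in the first candidate set."""
    patent_id: str
    sim_c_c: float                                 # sim(c,c), assumed precomputed
    ipc_codes: set = field(default_factory=set)    # IPC codes of this patent
    cites: set = field(default_factory=set)        # ids of patents this one cites


def ipc_overlap(cand, top_set):
    """Count IPC codes shared between cand and the initial high-ranking set."""
    top_codes = set().union(*(p.ipc_codes for p in top_set))
    return len(cand.ipc_codes & top_codes)


def direct_cite(cand, top_set):
    """Count high-ranking patents that cite, or are cited by, cand."""
    return sum(
        1
        for p in top_set
        if cand.patent_id in p.cites or p.patent_id in cand.cites
    )


def rerank(candidates, weights, top_k=2):
    """Re-rank the whole list; no candidate is removed.

    `candidates` is assumed to arrive in the search engine's initial order,
    so its first `top_k` entries stand in for the initial high-ranking set.
    """
    top_set = candidates[:top_k]

    def score(c):
        features = {
            "sim(c,c)": c.sim_c_c,
            "IPC-overlap": ipc_overlap(c, top_set),
            "direct-Cite": direct_cite(c, top_set),
        }
        # Assumed linear combination of features with learned weights.
        return sum(weights[name] * value for name, value in features.items())

    return sorted(candidates, key=score, reverse=True)


# Made-up data: ids, similarity scores, and weights are placeholders.
first_set = [
    Candidate("A", 0.9, {"G06F17/30"}),
    Candidate("B", 0.8, {"G06Q50/18"}, {"A"}),
    Candidate("C", 0.5, {"H04L"}),
    Candidate("D", 0.6, {"G06F17/30", "G06Q50/18"}, {"A", "B"}),
]
weights = {"sim(c,c)": 1.0, "IPC-overlap": 0.2, "direct-Cite": 0.3}
second_set = rerank(first_set, weights)
print([c.patent_id for c in second_set])
```

In this made-up data, candidate D starts last but one on text similarity, yet shares both IPC codes with the top two results and cites both of them, so the classification and citation features lift it in the re-ranked list.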