Title: Finding a top-K diversified ranking list on graphs
Abstract: A method, system and computer program product for finding a diversified ranking list for a given query. In one embodiment, a multitude of data items responsive to the query are identified, a marginal score is established for each data item, and a set, or ranking list, of the data items is formed based on these scores. The ranking list is built by first forming an initial set and then adding one or more data items to it based on their marginal scores. In one embodiment, each data item has a measured relevance and a measured diversity value, and the marginal score of each data item is based on both of these values.
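To make the marginal score concrete, here is a worked example under an assumed objective that the abstract does not specify: relevance scores r_j, a symmetric, non-negative similarity matrix A, and a hypothetical trade-off weight w > 0. If f(S) rewards total relevance and penalizes each unordered pair of similar items in S once, the marginal score of a candidate i not in S reduces to its weighted relevance minus its total similarity to the items already chosen:

```latex
\begin{aligned}
f(S) &= w \sum_{j \in S} r_j \;-\; \sum_{\{j,l\} \subseteq S,\; j \neq l} A_{jl} \\
s(i) &= f(S \cup \{i\}) - f(S) \;=\; w\, r_i \;-\; \sum_{j \in S} A_{ij}
\end{aligned}
```

Adding i contributes w r_i to the relevance term and introduces one new pair {i, j} for every j already in S, which is where the diversity penalty in s(i) comes from.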
Publication number: US9009147 (B2)
Publication date: 2015.04.14
Application number: US201113213856
Filing date: 2011.08.19
Applicant: International Business Machines Corporation
Inventors: He Jingrui; Konuru Ravi B.; Lin Ching-Yung; Tong Hanghang; Wen Zhen
Classification: G06F17/30; G06F7/00
Main classification: G06F17/30
Agency: Scully, Scott, Murphy & Presser, P.C.
Agents: Scully, Scott, Murphy & Presser, P.C.; Dougherty, Esq. Anne V.
Main claim: 1. A method of finding a subset of k data items from a set of data items for a given query based on a specified measure of relevance and diversity, the method comprising: identifying a set of data items responsive to the query; and forming a subset S of the data items, including putting into the subset S an initial number of the data items; and adding one or more of the data items to the subset S, including determining a relevance/diversity score f (S) for the subset S measuring both (a) a relevance of the data items in the subset S to the query, and (b) a diversity among the data items in the subset S; for each of the data items i not in the subset S, determining a marginal contribution score s (i) for the each data item i by determining a relevance/diversity score f (S, i) for a subset of the data items formed by the union of the subset S and the each data item i, and subtracting f (S) from f (S, i) to obtain the marginal contribution score s (i) for the each data item i, adding to the subset S one or more of the data items i based on the marginal contribution scores for the data items i until the subset S has k data items, and wherein said relevance/diversity score f (S) for the subset S includes a defined measure of specified similarities that each of the data items in the subset S has to one or more others of the data items in the subset S; and wherein at least one of said identifying and forming a subset of the data items is carried out by a computer device.
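The claimed procedure can be sketched as a greedy loop. This is a minimal illustration, not the patentee's implementation: the score function f (weighted relevance minus pairwise similarity), the weight `w`, the optional seed set `initial`, and the choice to add exactly one item per iteration are all assumptions. The claim only requires that f(S) include some measure of similarity among the items in S and that s(i) be computed as f(S, i) minus f(S).

```python
from itertools import combinations
from typing import List, Sequence


def score(subset: Sequence[int], relevance: Sequence[float],
          sim: Sequence[Sequence[float]], w: float) -> float:
    """Hypothetical f(S): w * total relevance minus similarity summed over unordered pairs in S."""
    rel = w * sum(relevance[j] for j in subset)
    penalty = sum(sim[j][l] for j, l in combinations(subset, 2))
    return rel - penalty


def greedy_top_k(relevance: Sequence[float], sim: Sequence[Sequence[float]],
                 k: int, w: float = 1.0, initial: Sequence[int] = ()) -> List[int]:
    """Grow S from an (assumed) seed set until it holds k items, adding one item per step."""
    n = len(relevance)
    if not 0 <= len(initial) <= k <= n:
        raise ValueError("need len(initial) <= k <= number of items")
    S = list(initial)
    while len(S) < k:
        f_S = score(S, relevance, sim, w)
        best_i, best_gain = -1, float("-inf")
        for i in range(n):
            if i in S:
                continue
            # Marginal contribution s(i) = f(S, i) - f(S), where f(S, i) scores S ∪ {i}.
            gain = score(S + [i], relevance, sim, w) - f_S
            if gain > best_gain:  # strict '>' breaks ties toward the lower index
                best_i, best_gain = i, gain
        S.append(best_i)
    return S


if __name__ == "__main__":
    relevance = [0.9, 0.85, 0.3]
    # Items 0 and 1 are near duplicates (hypothetical similarity 0.8).
    sim = [[0.0, 0.8, 0.0],
           [0.8, 0.0, 0.0],
           [0.0, 0.0, 0.0]]
    print(greedy_top_k(relevance, sim, k=2))
```

In this toy run the less relevant item 2 beats the near-duplicate item 1, so the call returns [0, 2]; with an all-zero similarity matrix the same call returns [0, 1]. Recomputing f for every candidate keeps the sketch close to the claim's wording but is slow; for this particular pairwise objective the gain equals w * r_i minus i's summed similarity to S, so a faster version could maintain those sums incrementally.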
Address: Armonk, NY, US