Title CLUSTER-BASED IDENTIFICATION OF NEWS STORIES
Abstract Methods, systems, and techniques for cluster-based content recommendation are described. Some embodiments provide a content recommendation system (“CRS”) configured to recommend news stories about events or occurrences. In some embodiments, a news story about an event includes multiple related content items that each include an account of the event and that each reference one or more entities or categories that are represented by the CRS. In one embodiment, the CRS identifies news stories by generating clusters of related content items. Then, in response to a received query that indicates a keyterm, entity, or category, the CRS determines and provides indications of one or more news stories that are relevant to the received query. In some embodiments, at least some of these techniques are employed to implement a news story recommendation facility in an online news service.
Publication No. US2015324449(A1) Publication Date 2015.11.12
Application No. US201514801739 Filing Date 2015.07.16
Applicant VCVC III LLC Inventors Koperski Krzysztof; Bhatti Satish; Liang Jisheng; Klein Adrian
Classification G06F17/30 Main Classification G06F17/30
Agency Agent
Principal Claim 1. A method in a content recommendation computing system, the method comprising: using a processor of the computing system, automatically identifying a news story about an event, the news story including multiple related content items that each give an account of the event and that each reference multiple entities or categories that are each electronically represented by the content recommendation system, comprising: processing content items to determine semantic information that includes identified entities and relations between the identified entities; storing the identified entities and relations in a repository of the content recommendation system; generating a cluster that includes the multiple related content items, based at least in part on how many entities each of the multiple related content items has in common with one or more other of the multiple related content items, wherein generating the cluster comprises: finding a candidate cluster of a plurality of clusters that is nearest to one of the multiple related content items by computing a cosine distance between a term vector that represents the one content item and a term vector that represents a content item of the candidate cluster; and determining whether the candidate cluster is a suitable cluster for the one content item, based at least on: cosine distances between the one content item and content items of the candidate cluster and a quantity of content items of the candidate cluster that have a cosine distance to the content item that is below a predetermined threshold; and storing an indication of the identified news story and the generated cluster.
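The cluster-generation step recited in claim 1 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. It assumes bag-of-words term vectors built by whitespace splitting, takes the nearest cluster to be the one holding the single closest member, and treats a candidate cluster as suitable when at least `min_near` of its members lie within `threshold` of the item. The names `term_vector`, `nearest_cluster`, `is_suitable`, `assign`, and both parameter defaults are hypothetical. The entity-overlap criterion and the entity repository from the claim are omitted.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term vector (assumption: whitespace tokens, lowercased)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    if norm == 0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / norm


def nearest_cluster(item, clusters):
    """Index of the cluster containing the member closest to item, or None."""
    best_idx, best_dist = None, float("inf")
    for idx, members in enumerate(clusters):
        for member in members:
            dist = cosine_distance(item, member)
            if dist < best_dist:
                best_idx, best_dist = idx, dist
    return best_idx


def is_suitable(item, members, threshold=0.5, min_near=2):
    """Hypothetical suitability rule: enough members lie below the threshold."""
    near = sum(1 for m in members if cosine_distance(item, m) < threshold)
    # Small clusters cannot supply min_near members, so cap the requirement.
    return near >= min(min_near, len(members))


def assign(item, clusters, threshold=0.5, min_near=2):
    """Add item to its nearest suitable cluster, else start a new cluster."""
    idx = nearest_cluster(item, clusters)
    if idx is not None and is_suitable(item, clusters[idx], threshold, min_near):
        clusters[idx].append(item)
        return idx
    clusters.append([item])
    return len(clusters) - 1
```

Calling `assign` once per incoming content item grows the list of clusters incrementally. Each resulting cluster would correspond to one candidate news story in the sense of the claim.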
Address Seattle WA US