Title: AUTOMATIC ARTIST AND CONTENT BREAKOUT PREDICTION
Abstract: Methods, systems and computer program products are provided for clustering pages into headline clusters. Web data is collected, pages are identified from the web data, unique words in each page are tokenized, unique entities in each page are recognized, media links in each page are detected, and a plurality of vector representations of each page is constructed. A first dimension of each vector representation includes the unique words tokenized in each page, a second dimension includes the unique entities recognized in each page, and a third dimension includes the media links detected in each page. The vector representations are then clustered.
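As a rough illustration of the per-page steps in the abstract, the Python sketch below builds the three-part representation for a single page. It assumes each dimension is simply a set of features. The regular-expression tokenizer, the `KNOWN_ENTITIES` gazetteer standing in for real entity recognition, and the media hosts and file extensions in `MEDIA_LINK_RE` are all hypothetical choices for illustration, not details taken from the patent.

```python
import re
from typing import FrozenSet, NamedTuple

# Hypothetical gazetteer standing in for a real named-entity recognizer.
KNOWN_ENTITIES = {"radiohead", "thom yorke", "glastonbury"}

# Hypothetical definition of a "media link": URLs on a few streaming hosts,
# or URLs ending in an audio/video file extension.
MEDIA_LINK_RE = re.compile(
    r"https?://(?:open\.spotify\.com|(?:www\.)?youtube\.com|soundcloud\.com)/\S+"
    r"|https?://\S+\.(?:mp3|mp4|ogg)\b",
    re.IGNORECASE,
)
WORD_RE = re.compile(r"[a-z0-9']+")


class PageVector(NamedTuple):
    words: FrozenSet[str]        # first dimension: unique tokenized words
    entities: FrozenSet[str]     # second dimension: unique recognized entities
    media_links: FrozenSet[str]  # third dimension: detected media links


def build_page_vector(text: str) -> PageVector:
    """Build the (words, entities, media_links) representation of one page."""
    links = MEDIA_LINK_RE.findall(text)
    # Strip links before tokenizing so URL fragments are not counted as words.
    body = MEDIA_LINK_RE.sub(" ", text).lower()
    words = frozenset(WORD_RE.findall(body))
    entities = frozenset(
        e for e in KNOWN_ENTITIES
        if re.search(r"\b" + re.escape(e) + r"\b", body)
    )
    return PageVector(words, entities, frozenset(links))
```

For a page such as "Radiohead announced a surprise set at Glastonbury. Listen: https://open.spotify.com/track/abc123 now", the sketch yields the entities `radiohead` and `glastonbury` and the single Spotify link, while `spotify` does not appear among the tokenized words.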
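Application publication number: US2017024486 (A1); Publication date: 2017.01.26
Application number: US201615216392; Filing date: 2016.07.21
Applicant: SPOTIFY AB; Inventors: Jacobson Kurt; Stowell Daniel E.; Whitman Brian; Koumis Athena Y.; Steinbach Jason H.
Classification: G06F17/30; G06F17/27; Main classification: G06F17/30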
Main claim: 1. A method for clustering pages into headline clusters, comprising the steps of: collecting web data; identifying one or more pages from the web data; tokenizing one or more unique words in each page; recognizing one or more unique entities in each page; detecting one or more media links in each page; constructing a plurality of vector representations of each page, wherein a first dimension of each vector representation includes the one or more unique words tokenized in each page, a second dimension of each vector representation includes the one or more unique entities recognized in each page, and a third dimension of each vector representation includes the one or more media links detected in each page; and clustering the plurality of vector representations.
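The claim leaves the clustering algorithm unspecified. The Python sketch below is one possible instance under stated assumptions: each dimension is a set, two pages are compared by a weighted average of per-dimension Jaccard similarities, and every pair whose similarity reaches a threshold is linked, with the connected groups returned as clusters (single-link clustering via union-find). The `WEIGHTS` values, the `threshold` default and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Sequence


class PageVector(NamedTuple):
    words: FrozenSet[str]        # first dimension: unique tokenized words
    entities: FrozenSet[str]     # second dimension: unique recognized entities
    media_links: FrozenSet[str]  # third dimension: detected media links


# Hypothetical per-dimension weights (words, entities, media links).
WEIGHTS = (0.2, 0.4, 0.4)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity; two empty sets are treated as giving no evidence."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(p: PageVector, q: PageVector) -> float:
    """Weighted average of the per-dimension Jaccard similarities."""
    return sum(w * jaccard(x, y) for w, x, y in zip(WEIGHTS, p, q))


def cluster_pages(pages: Sequence[PageVector], threshold: float = 0.3) -> List[List[int]]:
    """Group page indices whose pairwise similarity links them (single-link)."""
    parent = list(range(len(pages)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(pages)):
        for j in range(i + 1, len(pages)):
            if similarity(pages[i], pages[j]) >= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(pages)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

With these weights, two pages that share the same artist entity and the same media link fall into one headline cluster even when their wording differs, because the entity and link dimensions carry most of the weight. The all-pairs loop is quadratic, so a large crawl would need blocking or an approximate nearest-neighbour index instead.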
Address: Stockholm, SE