Title of Invention |
SYSTEM AND METHOD FOR CLASSIFICATION OF MICROBLOG POSTS BASED ON IDENTIFICATION OF TOPICS |
Abstract |
A method for assigning a topic to a collection of microblog posts may include: by an acquisition module, receiving, from at least one messaging service server, a plurality of posts, wherein each of the plurality of posts comprises post content; by a generation module, analyzing the posts and extracting, from at least one of the posts, a link with an address to an external document; and, by the acquisition module, accessing the external document that is associated with the address and fetching external content associated with the document. The method may also include, by the generation module: analyzing the post content to identify at least one label for each post; for each post that includes a link, analyzing the external content to identify a topic; and using a topic modeling technique to generate a trained topic model comprising a plurality of topics and a plurality of associated words. |
Publication Number |
US2017075991(A1) |
Publication Date |
2017.03.16 |
Application Number |
US201615098488 |
Filing Date |
2016.04.14 |
Applicant |
Xerox Corporation |
Inventors |
Kataria Saurabh;Agarwal Arvind |
Classification |
G06F17/30;G06N7/00;G06N99/00 |
Main Classification |
G06F17/30 |
Agency |
|
Agent |
|
Independent Claim |
1. A method for assigning a topic to a collection of microblog posts, the method comprising:
receiving from at least one messaging service server, a plurality of posts, wherein each of the plurality of posts comprise post content; analyzing the posts and extracting, from at least one of the posts, a link with an address to an external document; causing a communications hardware component to initiate a communication via a communications network that accesses the external document at an external server that is associated with the address and fetches external content associated with the document; and by a generation module:
analyzing the post content to identify at least one label for each post, for each post that includes a link, analyzing the external content to identify a topic, and using a topic modeling technique to generate a trained topic model, wherein the trained topic model comprises, for each identified label, a plurality of topics and a plurality of words associated with each of the plurality of topics; and saving the trained topic model to a computer-readable memory device. |
Address |
Norwalk CT US |
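
The early steps recited in the independent claim (extracting links from posts, fetching the linked external content, and identifying a label for each post) can be illustrated with a minimal Python sketch. It is not the claimed implementation. The record does not say how labels are identified, so hashtags are used here purely as an assumed stand-in. The function names (`extract_links`, `extract_labels`, `build_labeled_corpus`) and the caller-supplied `fetch` callback are hypothetical. The topic modeling step itself is left out; the sketch only prepares the per-label word lists that such a step would take as input.

```python
import re
from collections import defaultdict

# Assumption: links are plain http(s) URLs and labels are hashtags.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_links(post):
    """Return the addresses of external documents linked from a post."""
    return [url.rstrip(".,;:!?)") for url in URL_RE.findall(post)]


def extract_labels(post):
    """Return lowercased hashtags, an assumed stand-in for post labels."""
    return [tag.lower() for tag in HASHTAG_RE.findall(post)]


def build_labeled_corpus(posts, fetch):
    """Group post words plus fetched external words under each label.

    `fetch` is a hypothetical caller-supplied function that maps an
    address to the text of the external document (for example, an HTTP
    client wrapper). Passing it in keeps this sketch free of network code.
    """
    corpus = defaultdict(list)
    for post in posts:
        # Fetch external content for every link found in the post.
        external = " ".join(fetch(url) for url in extract_links(post))
        # Drop the raw URLs from the post text before tokenizing.
        text = URL_RE.sub("", post)
        words = re.findall(r"[a-z']+", (text + " " + external).lower())
        for label in extract_labels(post):
            corpus[label].append(words)
    return dict(corpus)
```

Each label in the returned dictionary maps to a list of tokenized documents, one per post carrying that label. A topic modeling technique could then be trained on these per-label collections to obtain, for each label, a set of topics and their associated words.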