Title of invention SYSTEM AND METHOD FOR CLASSIFICATION OF MICROBLOG POSTS BASED ON IDENTIFICATION OF TOPICS
Abstract A method for assigning a topic to a collection of microblog posts may include: by an acquisition module, receiving from at least one messaging service server a plurality of posts, wherein each of the plurality of posts comprises post content; by a generation module, analyzing the posts and extracting, from at least one of the posts, a link with an address to an external document; and, by the acquisition module, accessing the external document that is associated with the address and fetching external content associated with the document. The method may also include, by the generation module: analyzing the post content to identify at least one label for each post; for each post that includes a link, analyzing the external content to identify a topic; and using a topic modeling technique to generate a trained topic model comprising a plurality of topics and a plurality of associated words.
Publication number US2017075991(A1) Publication date 2017.03.16
Application number US201615098488 Filing date 2016.04.14
Applicant Xerox Corporation Inventors Kataria Saurabh;Agarwal Arvind
Classification G06F17/30;G06N7/00;G06N99/00 Main classification G06F17/30
Agency Agent
Principal claim 1. A method for assigning a topic to a collection of microblog posts, the method comprising: receiving from at least one messaging service server, a plurality of posts, wherein each of the plurality of posts comprises post content; analyzing the posts and extracting, from at least one of the posts, a link with an address to an external document; causing a communications hardware component to initiate a communication via a communications network that accesses the external document at an external server that is associated with the address and fetches external content associated with the document; and by a generation module: analyzing the post content to identify at least one label for each post, for each post that includes a link, analyzing the external content to identify a topic, and using a topic modeling technique to generate a trained topic model, wherein the trained topic model comprises, for each identified label, a plurality of topics and a plurality of words associated with each of the plurality of topics; and saving the trained topic model to a computer-readable memory device.
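The data preparation steps recited in the claim (extracting links, fetching external content, and identifying labels) can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names `extract_links`, `identify_labels`, and `build_labeled_documents` are hypothetical, the use of hashtags as labels is an assumption (this record does not say how labels are identified), and the `fetch` callable is a stand-in for the network access step so that the sketch runs offline. The topic modeling step itself is not shown; the sketch only assembles, for each label, the documents that such a step would consume.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Matches http(s) addresses up to the next whitespace character.
URL_RE = re.compile(r"https?://\S+")
# Assumption: a label is a hashtag such as "#tech".
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_links(post: str) -> List[str]:
    """Return every http(s) address in the post, minus trailing punctuation."""
    return [url.rstrip(".,;:!?)") for url in URL_RE.findall(post)]


def identify_labels(post: str) -> List[str]:
    """Return lowercase hashtags found in the post (hypothetical label rule)."""
    return [tag.lower() for tag in HASHTAG_RE.findall(post)]


def build_labeled_documents(
    posts: Iterable[str],
    fetch: Callable[[str], str],
) -> Dict[str, List[str]]:
    """Group post text plus fetched external content under each label.

    `fetch` maps an address to the external document's text; it is injected
    by the caller so that no real network request is made here.
    """
    docs: Dict[str, List[str]] = defaultdict(list)
    for post in posts:
        external = " ".join(fetch(url) for url in extract_links(post))
        # Drop the raw addresses, append the fetched text, normalize spaces.
        text = " ".join((URL_RE.sub(" ", post) + " " + external).split())
        for label in identify_labels(post):
            docs[label].append(text)
    return dict(docs)


# Usage with a fake fetcher in place of a real HTTP client.
posts = [
    "New phone released today #tech http://example.com/a",
    "Great match last night #sports",
]
docs = build_labeled_documents(posts, lambda url: "review of the new phone")
```

The resulting per-label document lists could then be passed to any topic modeling technique to produce, for each label, topics and their associated words.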
Address Norwalk CT US