Title of invention: METHOD OF AUTOMATED DISCOVERY OF NEW TOPICS
Abstract: The present disclosure relates to a method for automated discovery of new topics from an unlimited number of documents in any subject domain. The method employs a multi-component extension of Latent Dirichlet Allocation (MC-LDA) topic models to discover related topics in a corpus. The resulting data may contain millions of term vectors from any subject domain, identifying the most distinctive co-occurring topics that users may be interested in. New topic ID models may be built periodically from new content, and each may be compared one by one with the existing model to measure the significance of changes, using term-vector differences that show no correlation with the periodic new model. These periodic updates may be used to build a new topic ID model in an in-memory database, allowing query-time linking on massive data sets for automated discovery of new topics.
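Publication number: US2016042276(A1) Publication date: 2016.02.11
Application number: US201514919631 Filing date: 2015.10.21
Applicant: QBASE, LLC Inventors: LIGHTNER Scott; WECKESSER Franz; BODDHU Sanjay; FLAGG Robert
Classification: G06N5/02; G06F17/30 Main classification: G06N5/02
Agency: Agent: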
Main claim: 1. A computer-implemented method comprising: identifying, by a computer, in one or more document corpora of a data source, a topic of interest based upon one or more concurring topics identified in the one or more document corpora; automatically extracting, by the computer, from a document corpus, data associated with a plurality of co-occurring topics based on the topic of interest; in response to automatically extracting the data associated with the plurality of co-occurring topics, extracting, by the computer, a plurality of topic identifiers from the plurality of co-occurring topics; and generating, by the computer, a periodic topic model comprising a set of one or more term vectors by comparing topic significance among the plurality of topic identifiers.
Address: RESTON VA US
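
Illustrative note: the abstract describes comparing a periodically built topic model against the existing model and treating term vectors with no correlation as candidates for new topics. The record does not say how correlation is measured, so the following is only a minimal sketch under stated assumptions: topics are represented as sparse term-to-weight mappings, cosine similarity stands in for the unspecified correlation measure, and the names `cosine`, `find_new_topics`, and the `threshold` value are hypothetical and do not come from the patent.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term vectors (term -> weight)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_new_topics(existing_model, periodic_model, threshold=0.2):
    """Return IDs of periodic-model topics whose best match in the
    existing model falls below `threshold` (an assumed cutoff)."""
    new_ids = []
    for topic_id, vec in periodic_model.items():
        best = max(
            (cosine(vec, old) for old in existing_model.values()),
            default=0.0,
        )
        if best < threshold:
            new_ids.append(topic_id)
    return new_ids


# Toy data, invented purely for illustration.
existing = {
    "t1": {"engine": 0.6, "fuel": 0.3, "car": 0.1},
    "t2": {"election": 0.5, "vote": 0.4, "senate": 0.1},
}
periodic = {
    "p1": {"engine": 0.5, "car": 0.4, "hybrid": 0.1},
    "p2": {"vaccine": 0.6, "trial": 0.3, "dose": 0.1},
}

print(find_new_topics(existing, periodic))  # ['p2']
```

In this toy data, `p1` shares terms with `t1` and is treated as an existing topic, while `p2` shares no terms with any existing topic and is flagged as new. A real system working on millions of term vectors would likely need an indexed or approximate nearest-neighbor comparison rather than this exhaustive pairwise loop.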