Title of Invention: Learning Topics By Simulation Of A Stochastic Cellular Automaton
Abstract: Described herein is an unsupervised learning method that discovers topics and reduces the dimensionality of documents by designing and simulating a stochastic cellular automaton. A key formula that appears in many inference methods for latent Dirichlet allocation (LDA) is used as the local update rule of the cellular automaton. Approximate counters may be used to represent the counter values tracked by the inference algorithm. Sparsity may also be exploited to reduce the computation needed to sample a topic for particular words in the corpus being analyzed.
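The abstract does not say which approximate-counter scheme is used. The Python sketch below assumes a classic base-2 Morris counter, which is a swapped-in illustration, and the class name `MorrisCounter` is hypothetical. The counter stores only a small state X and increments X with probability 2^-X. It reports 2^X - 1, which is an unbiased estimate of the number of increments. The counter supports increments only. That would be enough if counts are rebuilt from zero on every pass, but this is also an assumption and is not stated in the abstract.

```python
import random


class MorrisCounter:
    """Approximate counter (hypothetical sketch, assuming a base-2 Morris counter).

    Only a small integer state is stored instead of the full count.
    """

    def __init__(self, rng=None):
        self.state = 0  # stands in for the full count; grows roughly like log2(count)
        self.rng = rng if rng is not None else random.Random()

    def increment(self):
        # Bump the state with probability 2**-state (always on the first call).
        if self.rng.random() < 2.0 ** -self.state:
            self.state += 1

    def estimate(self):
        # E[2**state] = n + 1 after n increments, so this estimate is unbiased.
        return 2 ** self.state - 1
```

Individual estimates are noisy (the variance after n increments is n(n-1)/2), so this trades accuracy for a much smaller per-counter footprint.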
Publication No.: US2016350411(A1)  Publication Date: 2016.12.01
Application No.: US201514932825  Filing Date: 2015.11.04
Applicant: Oracle International Corporation  Inventors: Tristan Jean-Baptiste; Green Stephen J.; Steele, JR. Guy L.; Zaheer Manzil
Classification: G06F17/30  Main Classification: G06F17/30
Main Claim: 1. A method for identifying sets of correlated words comprising:
receiving information for a set of documents;
wherein the set of documents comprises a plurality of words;
wherein a particular document of the set of documents comprises a particular word of the plurality of words;
running an inference algorithm over a Dirichlet distribution of the plurality of words in the set of documents to produce sampler result data, further comprising:
  retrieving a first counter value from a first data structure,
  based, at least in part, on the first counter value, assigning a particular topic, of a plurality of topics, to the particular word in the particular document to produce a topic assignment for the particular word,
  after assigning the particular topic to the particular word, updating a second counter value in a second data structure,
  wherein the second counter value reflects the topic assignment, and
  wherein the first data structure is distinct from the second data structure; and
determining, from the sampler result data, one or more sets of correlated words;
wherein the method is performed by one or more computing devices.
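In the claim, counts are read from one data structure and the resulting assignment is recorded in a distinct one. This matches a synchronous, cellular-automaton-style sweep. The Python sketch below illustrates that reading under several assumptions that are not taken from the claim, and every name in it is hypothetical:

- **Count tables:** `n_dk` holds document-topic counts, `n_wk` holds word-topic counts, and `n_k` holds topic totals.
- **Priors:** `alpha` and `beta` are symmetric priors.
- **Update rule:** the standard LDA conditional (n_dk + alpha)(n_wk + beta)/(n_k + V*beta).
- **Correlated words:** these are read off as the top words per topic.

The sketch uses the previous sweep's counts as-is. It does not subtract each token's own prior assignment the way collapsed Gibbs sampling would.

```python
import random


def initial_counts(docs, num_topics, vocab_size, rng):
    """Assign random topics and tally the three count tables (hypothetical helper)."""
    n_dk = [[0] * num_topics for _ in docs]
    n_wk = [[0] * num_topics for _ in range(vocab_size)]
    n_k = [0] * num_topics
    for d, doc in enumerate(docs):
        for w in doc:
            k = rng.randrange(num_topics)
            n_dk[d][k] += 1
            n_wk[w][k] += 1
            n_k[k] += 1
    return n_dk, n_wk, n_k


def sca_sweep(docs, n_dk, n_wk, n_k, alpha, beta, rng):
    """One synchronous step (sketch).

    Reads only from the current tables (the "first data structure").
    Writes every new assignment into freshly allocated tables
    (the "second data structure").
    """
    num_topics = len(n_k)
    vocab_size = len(n_wk)
    new_dk = [[0] * num_topics for _ in docs]
    new_wk = [[0] * num_topics for _ in range(vocab_size)]
    new_k = [0] * num_topics
    topics = range(num_topics)
    for d, doc in enumerate(docs):
        for w in doc:
            # Assumed local update rule: the standard LDA topic conditional.
            weights = [
                (n_dk[d][k] + alpha) * (n_wk[w][k] + beta) / (n_k[k] + vocab_size * beta)
                for k in topics
            ]
            k = rng.choices(topics, weights=weights)[0]
            # Record the assignment only in the new tables.
            new_dk[d][k] += 1
            new_wk[w][k] += 1
            new_k[k] += 1
    return new_dk, new_wk, new_k


def top_words(n_wk, k, n):
    """Return the n word ids with the highest count under topic k (one correlated set)."""
    return sorted(range(len(n_wk)), key=lambda w: n_wk[w][k], reverse=True)[:n]
```

The tables being read are never modified during a sweep, so tokens could be sampled in any order or in parallel. Parallel writers would still need atomic or per-worker increments into the new tables. This sketch runs serially.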
Address: Redwood Shores, CA, US