Main claim |
1. A method for identifying sets of correlated words comprising:
receiving information for a set of documents;
wherein the set of documents comprises a plurality of words;
wherein a particular document of the set of documents comprises a particular word of the plurality of words;
running an inference algorithm over a Dirichlet distribution of the plurality of words in the set of documents to produce sampler result data, further comprising:
retrieving a first counter value from a first data structure,
based, at least in part, on the first counter value, assigning a particular topic, of a plurality of topics, to the particular word in the particular document to produce a topic assignment for the particular word,
after assigning the particular topic to the particular word, updating a second counter value in a second data structure,
wherein the second counter value reflects the topic assignment, and
wherein the first data structure is distinct from the second data structure; and
determining, from the sampler result data, one or more sets of correlated words;
wherein the method is performed by one or more computing devices. |
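
The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the claimed implementation: it assumes a standard collapsed-Gibbs-style update for a topic model with Dirichlet priors, and it interprets the "first data structure" as the counts left by the previous sweep (read only) and the "second data structure" as a fresh set of counts written during the current sweep. The names `sample_topics` and `correlated_word_sets`, and the parameters `alpha`, `beta`, `iterations`, and `seed`, are hypothetical and do not appear in the claim.

```python
import random
from collections import Counter


def _empty_counts(num_docs, num_topics):
    """Allocate a fresh set of counters (hypothetical layout)."""
    return ([[0] * num_topics for _ in range(num_docs)],   # doc-topic counts
            [Counter() for _ in range(num_topics)],        # topic-word counts
            [0] * num_topics)                              # tokens per topic


def sample_topics(docs, num_topics, iterations=20, alpha=0.1, beta=0.01, seed=0):
    """Assign a topic to every word, reading and writing distinct counters."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    topics = range(num_topics)

    # Random initial topic assignments, counted into the first (read) structure.
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    read = _empty_counts(len(docs), num_topics)
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            read[0][d][k] += 1
            read[1][k][w] += 1
            read[2][k] += 1

    for _ in range(iterations):
        read_dt, read_tw, read_tt = read
        # Second structure: a separate object, never read during this sweep.
        write_dt, write_tw, write_tt = _empty_counts(len(docs), num_topics)
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Retrieve counter values from the first data structure.
                weights = [
                    (read_dt[d][k] + alpha)
                    * (read_tw[k][w] + beta)
                    / (read_tt[k] + vocab_size * beta)
                    for k in topics
                ]
                # Assign a topic based, at least in part, on those values.
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                # After assigning, update counters in the second data structure.
                write_dt[d][k] += 1
                write_tw[k][w] += 1
                write_tt[k] += 1
        # The counts just written become the read structure for the next sweep.
        read = (write_dt, write_tw, write_tt)

    return z, read


def correlated_word_sets(topic_word, top_n=3):
    """Treat the most frequent words of each topic as one set of correlated words."""
    return [[w for w, _ in counts.most_common(top_n)] for counts in topic_word]


if __name__ == "__main__":
    docs = [["apple", "banana", "apple"], ["cpu", "gpu", "cpu"], ["apple", "cpu"]]
    assignments, (_, topic_word, _) = sample_topics(docs, num_topics=2)
    print(correlated_word_sets(topic_word, top_n=2))
```

Because every topic draw reads only from counts that are not modified during the sweep, the per-word updates in this sketch do not depend on one another within a sweep. That independence is one plausible motivation for keeping the two structures distinct, though the claim itself does not state a rationale.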