Parallel processing of data sets, Application No. US201012942736

Title	Parallel processing of data sets
Abstract	Systems, methods, and devices are described for implementing learning algorithms on data sets. A data set may be partitioned into a plurality of data partitions that may be distributed to two or more processors, such as a graphics processing unit. The data partitions may be processed in parallel by each of the processors to determine local counts associated with the data partitions. The local counts may then be aggregated to form a global count that reflects the local counts for the data set. The partitioning may be performed by a data partition algorithm and the processing and the aggregating may be performed by a parallel collapsed Gibbs sampling (CGS) algorithm and/or a parallel collapsed variational Bayesian (CVB) algorithm. In addition, the CGS and/or the CVB algorithms may be associated with the data partition algorithm and may be parallelized to train a latent Dirichlet allocation model.
Publication No.	US8868470(B2)	Publication Date	2014.10.21
Application No.	US201012942736	Filing Date	2010.11.09
Applicant	Microsoft Corporation	Inventors	Xu Ning-Yi;Hsu Feng-Hsiung;Yan Feng
Classification	G06F1/00;G06N5/00;G06F9/50	Main Classification	G06F1/00
Agency	Lee & Hayes PLLC	Agent	Boelitz Carole;Minhas Micky;Lee & Hayes PLLC
Main Claim	1. A method comprising: partitioning a data set into a plurality of data partitions, the partitioning including removing dependencies in the data set that require some of the data partitions to be processed sequentially rather than in parallel; distributing the plurality of data partitions to a plurality of processors, each of the plurality of data partitions being assigned to a single one of the plurality of processors; processing, by the plurality of processors, each of the plurality of data partitions in parallel; and synchronizing the plurality of processors to obtain a global record corresponding to the processed data partitions.
Address	Redmond WA US
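
The flow described in the abstract and in claim 1 (partition the data set, assign each partition to a single processor, process the partitions in parallel to get local counts, then synchronize into a global count) can be illustrated with a minimal Python sketch. It is not the patented data partition algorithm and does not implement CGS or CVB; it only counts tokens. The function names `partition`, `local_count`, and `parallel_count`, and the use of a thread pool standing in for multiple processors or GPUs, are assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(data, num_parts):
    """Split data into num_parts contiguous, non-overlapping slices."""
    size, extra = divmod(len(data), num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        # The first `extra` partitions take one additional item each.
        end = start + size + (1 if i < extra else 0)
        parts.append(data[start:end])
        start = end
    return parts


def local_count(part):
    """Count items in one partition without touching shared state."""
    return Counter(part)


def parallel_count(data, num_workers=4):
    """Hypothetical driver: partition, count in parallel, then merge."""
    parts = partition(data, num_workers)
    # Each partition is handed to exactly one worker.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        local_counts = list(pool.map(local_count, parts))
    # Synchronization step: aggregate local counts into a global count.
    global_count = Counter()
    for counts in local_counts:
        global_count.update(counts)
    return global_count


if __name__ == "__main__":
    tokens = ["a", "b", "a", "c", "b", "a"]
    print(parallel_count(tokens, num_workers=3))
```

Because each worker only reads its own slice and writes its own `Counter`, the workers have no dependencies on one another until the final merge, which is the property the claim's partitioning step is meant to guarantee. Plain counting is additive, so this merge is exact; the sketch does not model how a sampler would handle counts that change during processing.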