Title of invention: Parallel processing of data sets
Abstract: Systems, methods, and devices are described for implementing learning algorithms on data sets. A data set may be partitioned into a plurality of data partitions that may be distributed to two or more processors, such as graphics processing units (GPUs). The data partitions may be processed in parallel by the processors to determine local counts associated with the data partitions. The local counts may then be aggregated to form a global count that reflects the local counts for the entire data set. The partitioning may be performed by a data partition algorithm, and the processing and the aggregating may be performed by a parallel collapsed Gibbs sampling (CGS) algorithm and/or a parallel collapsed variational Bayesian (CVB) algorithm. In addition, the CGS and/or the CVB algorithms may be associated with the data partition algorithm and may be parallelized to train a latent Dirichlet allocation (LDA) model.
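The partition, local-count, and global-count flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: worker threads stand in for the processors (GPUs), simple token counting stands in for the topic counts that CGS or CVB would maintain, and the names `partition`, `local_count`, and `global_count` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(data, num_parts):
    """Split data into num_parts contiguous, roughly equal partitions."""
    size, rem = divmod(len(data), num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        # The first `rem` partitions receive one extra item.
        end = start + size + (1 if i < rem else 0)
        parts.append(data[start:end])
        start = end
    return parts


def local_count(part):
    """Count items within one partition (a stand-in for per-partition counts)."""
    return Counter(part)


def global_count(data, num_workers=2):
    """Compute local counts in parallel, then aggregate them into one global count."""
    parts = partition(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        local_counts = list(pool.map(local_count, parts))
    total = Counter()
    for counts in local_counts:
        total.update(counts)  # aggregation step
    return total
```

Because each worker only touches its own partition, no locking is needed until the aggregation step, which runs after all workers have finished.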
Publication number: US8868470(B2) Publication date: 2014.10.21
Application number: US201012942736 Filing date: 2010.11.09
Applicant: Microsoft Corporation Inventors: Xu Ning-Yi; Hsu Feng-Hsiung; Yan Feng
Classification numbers: G06F1/00; G06N5/00; G06F9/50 Main classification number: G06F1/00
Agency: Lee & Hayes PLLC Agents: Boelitz Carole; Minhas Micky; Lee & Hayes PLLC
Main claim: 1. A method comprising: partitioning a data set into a plurality of data partitions, the partitioning including removing dependencies in the data set that require some of the data partitions to be processed sequentially rather than in parallel; distributing the plurality of data partitions to a plurality of processors, each of the plurality of data partitions being assigned to a single one of the plurality of processors; processing, by the plurality of processors, each of the plurality of data partitions in parallel; and synchronizing the plurality of processors to obtain a global record corresponding to the processed data partitions.
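The claim does not say how dependencies are removed. One plausible scheme, shown here purely as an assumption, splits documents and words into P groups each and lets processor p handle block (p, (p + epoch) mod P) in each epoch, so that no two processors share a document group or a word group at the same time. All function names are hypothetical, threads stand in for processors, and word counting stands in for the actual per-partition processing.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def block_schedule(num_procs):
    """For each epoch, list the (doc_group, word_group) block given to each processor."""
    return [[(p, (p + epoch) % num_procs) for p in range(num_procs)]
            for epoch in range(num_procs)]


def is_conflict_free(blocks):
    """True if no two blocks in an epoch share a document group or a word group."""
    docs = [d for d, _ in blocks]
    words = [w for _, w in blocks]
    return len(set(docs)) == len(docs) and len(set(words)) == len(words)


def partition_tokens(tokens, num_procs):
    """Assign each (doc_id, word_id) token to a block by modulo grouping (an assumption)."""
    blocks = {}
    for doc_id, word_id in tokens:
        key = (doc_id % num_procs, word_id % num_procs)
        blocks.setdefault(key, []).append((doc_id, word_id))
    return blocks


def process_block(block_tokens):
    """Stand-in for per-partition processing: count word occurrences in the block."""
    return Counter(word_id for _, word_id in block_tokens)


def run(tokens, num_procs):
    """Process conflict-free blocks in parallel, synchronizing after every epoch."""
    blocks = partition_tokens(tokens, num_procs)
    global_record = Counter()
    with ThreadPoolExecutor(max_workers=num_procs) as pool:
        for epoch_blocks in block_schedule(num_procs):
            assert is_conflict_free(epoch_blocks)
            work = [blocks.get(b, []) for b in epoch_blocks]
            # Consuming all results acts as the synchronization point for the epoch.
            for local in pool.map(process_block, work):
                global_record.update(local)
    return global_record
```

After P epochs every block has been processed exactly once, and each block was handled by a single processor, which mirrors the one-partition-per-processor wording of the claim.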
Address: Redmond WA US