Title of invention: PERFORMING MULTI-CONVOLUTION OPERATIONS IN A PARALLEL PROCESSING SYSTEM
Abstract: In one embodiment of the present invention, a convolution engine configures a parallel processing pipeline to perform multi-convolution operations. More specifically, the convolution engine configures the parallel processing pipeline to independently generate and process individual image tiles. In operation, for each image tile, the pipeline calculates source locations included in an input image batch. Notably, the source locations reflect the contribution of the image tile to an output tile of an output matrix, which is the result of the multi-convolution operation. Subsequently, the pipeline copies data from the source locations to the image tile. Similarly, the pipeline copies data from a filter stack to a filter tile. The pipeline then performs matrix multiplication operations between the image tile and the filter tile to generate data included in the corresponding output tile. To optimize both on-chip memory usage and execution time, the pipeline creates each image tile in on-chip memory as needed.
Publication number: US2016062947(A1) Publication date: 2016.03.03
Application number: US201514838291 Filing date: 2015.08.27
Applicant: NVIDIA CORPORATION Inventors: CHETLUR Sharanyan;CATANZARO Bryan
Classification: G06F17/15;G06F17/16 Main classification: G06F17/15
Agency: Agent:
Principal claim: 1. A computer-implemented method for performing a multi-convolution operation, the method comprising: calculating a first source location included in an image batch that is stored in a first memory based on a first destination location included in a first image tile that is stored in a second memory; copying data from the first source location to the first destination location; copying data from a filter source location included in a filter stack that is stored in the first memory to a filter destination location included in a first filter tile that is stored in the second memory; and performing one or more matrix multiplication operations between the first image tile and the first filter tile to generate a first output tile associated with an output matrix that is stored in the second memory.
Address: Santa Clara CA US
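
The steps recited in the abstract and in claim 1 can be illustrated with a minimal pure-Python sketch. It is not the patented implementation. It rests on several assumptions that do not come from the record: the image batch is laid out as N x C x H x W nested lists, the filter stack as K x C x R x S, the stride is 1 with no padding, filters are applied without flipping (cross-correlation), and ordinary Python lists stand in for both the first (off-chip) memory and the second (on-chip) memory. The names `source_location` and `multi_convolution` and the tile-size parameters are hypothetical.

```python
def source_location(row, col, C, H, W, R, S):
    """Map a (row, col) position of the virtual image matrix to a source
    location (n, c, y, x) in the image batch.

    Assumed layout (not from the record): rows enumerate (c, r, s) and
    columns enumerate (n, p, q), with stride 1 and no padding.
    """
    P, Q = H - R + 1, W - S + 1          # output height and width
    c, rs = divmod(row, R * S)           # input channel, offset in filter window
    r, s = divmod(rs, S)                 # filter row and column
    n, pq = divmod(col, P * Q)           # image index, output pixel index
    p, q = divmod(pq, Q)                 # output row and column
    return n, c, p + r, q + s


def multi_convolution(images, filters, tile_k=2, tile_rows=4, tile_cols=4):
    """Tile-by-tile multi-convolution; returns a K x (N*P*Q) output matrix.

    The tile sizes are hypothetical defaults chosen for illustration.
    """
    N, C = len(images), len(images[0])
    H, W = len(images[0][0]), len(images[0][0][0])
    K, R, S = len(filters), len(filters[0][0]), len(filters[0][0][0])
    P, Q = H - R + 1, W - S + 1
    depth, cols = C * R * S, N * P * Q
    output = [[0] * cols for _ in range(K)]

    for k0 in range(0, K, tile_k):                 # output tile rows
        k1 = min(k0 + tile_k, K)
        for c0 in range(0, cols, tile_cols):       # output tile columns
            c1 = min(c0 + tile_cols, cols)
            for d0 in range(0, depth, tile_rows):  # reduction dimension
                d1 = min(d0 + tile_rows, depth)

                # Build the image tile on demand: compute each source
                # location and copy the value into the tile.
                image_tile = []
                for row in range(d0, d1):
                    tile_row = []
                    for col in range(c0, c1):
                        n, c, y, x = source_location(row, col, C, H, W, R, S)
                        tile_row.append(images[n][c][y][x])
                    image_tile.append(tile_row)

                # Copy the matching slice of the filter stack into a filter tile.
                filter_tile = []
                for k in range(k0, k1):
                    filter_row = []
                    for row in range(d0, d1):
                        c, rs = divmod(row, R * S)
                        r, s = divmod(rs, S)
                        filter_row.append(filters[k][c][r][s])
                    filter_tile.append(filter_row)

                # Multiply the filter tile by the image tile and accumulate
                # the partial products into the output tile.
                for i, filter_row in enumerate(filter_tile):
                    for j in range(c1 - c0):
                        output[k0 + i][c0 + j] += sum(
                            filter_row[d] * image_tile[d][j]
                            for d in range(d1 - d0)
                        )
    return output
```

Calling `multi_convolution(images, filters)` with nested lists of the assumed shapes returns the full output matrix. The point of the sketch is that the expanded image matrix is never materialized as a whole: only one image tile and one filter tile exist at a time, which mirrors the as-needed tile creation described in the abstract.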