Title of invention: Universal FPGA/ASIC matrix-vector multiplication architecture
Abstract: A universal single-bitstream FPGA library or ASIC implementation accelerates matrix-vector multiplication and processes multiple matrix encodings, including dense and multiple sparse formats. A hardware-optimized sparse matrix representation, referred to herein as the Compressed Variable-Length Bit Vector (CVBV) format, takes advantage of the capabilities of FPGAs and reduces storage and bandwidth requirements across the matrices compared with what is typically achieved using the Compressed Sparse Row (CSR) format in CPU- and GPU-based approaches. Also disclosed is a class of sparse matrix formats that are better suited to FPGA implementations than existing formats, reducing storage and bandwidth requirements. A partitioned CVBV format is described to enable parallel decoding.
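To make the contrast in the abstract concrete, the following is a minimal sketch that compares CSR with a simplified run-length bit-vector encoding. It is not the patent's CVBV format: the run layout, the 4-bit width header, and every function name here are assumptions made for illustration only.

```python
def dense_to_csr(matrix):
    """Standard CSR: nonzero values, their column indices, and row pointers."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x from a CSR-encoded A."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def dense_to_runs(matrix):
    """Hypothetical encoding: flatten row-major, then run-length encode
    alternating runs of zeros and nonzeros as (is_nonzero, length)."""
    runs, values = [], []
    for row in matrix:
        for v in row:
            nz = v != 0
            if nz:
                values.append(v)
            if runs and runs[-1][0] == nz:
                runs[-1] = (nz, runs[-1][1] + 1)
            else:
                runs.append((nz, 1))
    return runs, values


def runs_matvec(runs, values, n_cols, x):
    """Compute y = A @ x by streaming through the runs in order."""
    total = sum(length for _, length in runs)
    y = [0] * (total // n_cols)
    pos = 0  # position in the flattened matrix
    k = 0    # index into the nonzero values
    for nz, length in runs:
        if nz:
            for _ in range(length):
                r, c = divmod(pos, n_cols)
                y[r] += values[k] * x[c]
                k += 1
                pos += 1
        else:
            pos += length  # zeros contribute nothing; skip them
    return y


def variable_length_index_bits(runs, header_bits=4):
    """Assumed cost model: 1 flag bit, a width header, and only as many
    bits as each run length needs."""
    return sum(1 + header_bits + length.bit_length() for _, length in runs)


A = [[5, 0, 0, 0],
     [0, 0, 3, 2],
     [0, 0, 0, 0],
     [1, 0, 0, 4]]
x = [1, 2, 3, 4]

csr = dense_to_csr(A)
runs, run_values = dense_to_runs(A)
y_csr = csr_matvec(*csr, x)
y_runs = runs_matvec(runs, run_values, 4, x)
```

Both paths yield the same product, `[5, 17, 0, 17]`. The run-based form replaces per-nonzero column indices and row pointers with short run descriptors, which is the general kind of saving the abstract attributes to bit-vector formats; the actual savings depend on the sparsity pattern.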
Publication number: US9317482(B2) Publication date: 2016.04.19
Application number: US201213651464 Filing date: 2012.10.14
Applicant: Microsoft Technology Licensing, LLC Inventors: Davis John D.; Chung Eric; Kestur Srinidhi
Classification: G06F17/16 Main classification: G06F17/16
Agency: Agent: Swain Sandy; Minhas Micky
Independent claim: 1. A matrix-vector multiplication device comprising: a runtime programmable decoder for transforming processor-centric sparse matrix data directly into field-programmable gate array (FPGA)-centric matrix data without utilizing an intermediate representation of the matrix data by: determining that the processor-centric sparse matrix data is encoded in a sparse format; identifying which sparse format of a plurality of sparse formats that the processor-centric sparse matrix data is encoded in based on metadata associated with the matrix data; and transforming the processor-centric sparse matrix data directly into the FPGA-centric matrix data based on the identified sparse format; a plurality of data stream first-in-first-out (FIFO) queues for managing a plurality of data streams for processing; a plurality of processing pipes for receiving the plurality of data streams from the plurality of data stream FIFO queues and processing data streams from among the plurality of data streams; and a vector memory for multiplexing the processed plurality of data streams into output data.
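The dataflow recited in the claim can be illustrated with a small software model. This is only a sketch under stated assumptions: the metadata keys (`format`, `rows`), the two example formats (COO and CSR), the row-based assignment of work to pipes, and all function names are hypothetical and are not taken from the patent.

```python
from collections import deque


def decode(meta, payload):
    """Model of the decoder: pick the source format from metadata and emit
    (row, col, value) triples directly, with no intermediate matrix copy."""
    fmt = meta["format"]
    if fmt == "coo":
        # Payload is already a sequence of (row, col, value) triples.
        yield from payload
    elif fmt == "csr":
        values, col_idx, row_ptr = payload
        for r in range(len(row_ptr) - 1):
            for k in range(row_ptr[r], row_ptr[r + 1]):
                yield r, col_idx[k], values[k]
    else:
        raise ValueError(f"unsupported format: {fmt!r}")


def matvec(meta, payload, x, n_pipes=2):
    """Route decoded elements into per-pipe FIFO queues, let each pipe
    multiply-accumulate, and merge the results into one output vector."""
    fifos = [deque() for _ in range(n_pipes)]
    for r, c, v in decode(meta, payload):
        fifos[r % n_pipes].append((r, c, v))  # assumed row-based routing

    y = [0] * meta["rows"]  # stands in for the output vector memory
    for fifo in fifos:      # each "pipe" drains its own queue
        while fifo:
            r, c, v = fifo.popleft()
            y[r] += v * x[c]
    return y


x = [1, 2, 3, 4]
csr_payload = ([5, 3, 2, 1, 4], [0, 2, 3, 0, 3], [0, 1, 3, 3, 5])
coo_payload = [(0, 0, 5), (1, 2, 3), (1, 3, 2), (3, 0, 1), (3, 3, 4)]

y_from_csr = matvec({"format": "csr", "rows": 4}, csr_payload, x)
y_from_coo = matvec({"format": "coo", "rows": 4}, coo_payload, x)
```

Both calls encode the same 4x4 matrix and return `[5, 17, 0, 17]`. In hardware the pipes would run concurrently; the sequential loop here only models the ordering of data through the queues.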
Address: Redmond WA US