Title of invention: Streaming memory transpose operations
Abstract: According to one general aspect, an apparatus may include a load/store unit, an execution unit, and a first and a second data path. The load/store unit may be configured to load/store data from/to a memory and transmit the data to/from an execution unit, wherein the data includes a plurality of elements. The execution unit may be configured to perform an operation upon the data. The load/store unit may be configured to transmit the data to/from the execution unit via either a first data path configured to communicate, without transposition, the data between the load/store unit and the execution unit, or a second data path configured to communicate, with transposition, the data between the load/store unit and the execution unit, wherein transposition includes dynamically distributing portions of the data amongst a plurality of elements according to an instruction.
Publication number: US9513908(B2)  Publication date: 2016.12.06
Application number: US201314017238  Filing date: 2013.09.03
Applicant: SAMSUNG ELECTRONICS CO., LTD.  Inventors: Ahmed Ashraf; Humphries Nicholas Todd; Augustin Marc Michael
Classification: G06F5/00; G06F9/30  Main classification: G06F5/00
Agency: Renaissance IP Law Group LLP  Agent: Renaissance IP Law Group LLP
Independent claim: 1. An apparatus comprising: a load unit configured to, in response to an instruction, load data from a memory and transmit the data to an execution unit, wherein the data includes a plurality of elements; a first data path configured to communicate, without transposition, the data between the load unit and an execution unit; a second data path configured to communicate, with transposition, the data between the load unit and the execution unit, wherein transposition includes distributing portions of the data amongst the plurality of elements according to the instruction; and the execution unit configured to perform a mathematical operation upon the data; wherein the load unit is configured to transmit the data to the execution unit via either the first data path or the second data path; wherein the second data path comprises: a plurality of buffer memories, arranged in parallel, each configured to temporarily store respective data, a multiplexer configured to determine which data from the plurality of buffer memories is provided to a transposition unit, and the transposition unit configured to, based upon the instruction, move portions of the data amongst the plurality of elements; wherein the first data path is configured to transmit a first set of data between the load unit and the execution unit while the second data path is transmitting, in a parallel fashion, a second set of data between the load unit and the execution unit.
Address: KR
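The two data paths recited in the claim can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the `LoadInstruction` fields, the function names, and the choice to model transposition as de-interleaving by a `stride` are all hypothetical, and the model runs sequentially, so it does not capture the two paths transmitting in parallel.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LoadInstruction:
    """Hypothetical instruction encoding; field names are illustrative only."""
    address: int
    count: int       # number of elements to load
    stride: int = 1  # assumption: 1 = no transposition, N > 1 = de-interleave by N


def direct_path(data: List[int]) -> List[int]:
    """First data path: forwards the elements unchanged."""
    return list(data)


def transposing_path(data: List[int], stride: int,
                     num_buffers: int = 2) -> List[List[int]]:
    """Second data path: buffer memories -> multiplexer -> transposition unit."""
    # Parallel buffer memories: each temporarily holds one chunk of the data.
    chunk = -(-len(data) // num_buffers)  # ceiling division
    buffers = [data[i * chunk:(i + 1) * chunk] for i in range(num_buffers)]

    # Multiplexer: selects which buffer feeds the transposition unit, in order.
    selected = [elem for buf in buffers for elem in buf]

    # Transposition unit: distributes elements into `stride` destination groups.
    return [selected[lane::stride] for lane in range(stride)]


def load(memory: List[int], instr: LoadInstruction) -> List[List[int]]:
    """Load unit: reads memory, then picks a data path based on the instruction."""
    data = memory[instr.address:instr.address + instr.count]
    if instr.stride == 1:
        return [direct_path(data)]
    return transposing_path(data, instr.stride)
```

For example, with `memory = [10, 20, 30, 11, 21, 31]`, calling `load(memory, LoadInstruction(address=0, count=6, stride=3))` returns `[[10, 11], [20, 21], [30, 31]]`, while the default `stride=1` returns the six elements unchanged in a single group.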