Title of invention: COALESCING ADJACENT GATHER/SCATTER OPERATIONS
Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read a contiguous first and second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and the second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
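The gather behavior summarized in the abstract can be modeled in scalar C. This is only a sketch under stated assumptions, not the patented implementation: the function name `gather_adjacent_pairs`, the lane count `LANES`, the 64-bit element width, and the use of a per-lane index array are all hypothetical choices made for illustration. For each lane, one contiguous read returns two adjacent elements; the first goes to an entry of the first destination and the second goes to the corresponding entry of the second destination.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 4 /* hypothetical number of entries per storage location */

/*
 * Scalar model of a coalesced gather (illustrative assumption).
 * For lane i, the two adjacent elements at base[index[i]] and
 * base[index[i] + 1] are read together. The first lands in dst0[i],
 * the second in the corresponding entry dst1[i].
 */
void gather_adjacent_pairs(uint64_t dst0[LANES], uint64_t dst1[LANES],
                           const uint64_t *base, const size_t index[LANES])
{
    for (size_t i = 0; i < LANES; ++i) {
        const uint64_t *pair = base + index[i]; /* one contiguous read site */
        dst0[i] = pair[0];
        dst1[i] = pair[1];
    }
}
```

Without coalescing, filling both destinations would take two separate gathers over the same indices, each touching the same cache lines. The model above visits each location once.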
Publication number: US2016103786(A1)  Publication date: 2016.04.14
Application number: US201514975222  Filing date: 2015.12.18
Applicant: Intel Corporation  Inventors: FORSYTH Andrew T.; HICKMANN Brian J.; HALL Jonathan C.; HUGHES Christopher J.
Classification: G06F15/80; G06F9/38; G06F9/30  Main classification: G06F15/80
Main claim: 1. A processor comprising: a plurality of 64-bit general-purpose registers; a plurality of 128-bit single instruction, multiple data (SIMD) registers; a data cache; an instruction cache; a level 2 (L2) cache coupled to the data cache and coupled to the instruction cache; a branch prediction unit; an instruction translation lookaside buffer (TLB) coupled to the instruction cache; an instruction fetch unit; a decode unit coupled to the instruction fetch unit, the decode unit to decode a plurality of instructions, including a first instruction, the first instruction to indicate a 128-bit operand size, the first instruction having a first field to specify a first 128-bit SIMD source register of the plurality of 128-bit SIMD registers, the first instruction having a second field to specify a 64-bit general-purpose register of the plurality of 64-bit general-purpose registers to store a base address, and the first instruction to indicate a data element width of 64-bits; and an execution unit coupled to the decode unit, coupled to the plurality of 128-bit SIMD registers, and coupled to the plurality of 64-bit general-purpose registers, the execution unit to: store a first structure and a second structure to a memory based on the base address, a first 64-bit data element of the first structure to include a first 64-bit data element of the first 128-bit SIMD source register, which is to include least significant bits of the first 128-bit SIMD source register, a second 64-bit data element of the first structure to include a first 64-bit data element of a second 128-bit SIMD source register, which is to include least significant bits of the second 128-bit SIMD source register, a third 64-bit data element of the first structure to include a first 64-bit data element of a third 128-bit SIMD source register, which is to include least significant bits of the third 128-bit SIMD source register, wherein the first, second, and third 64-bit data elements of the first structure are to be consecutive data elements in the memory, a first 64-bit data element of the second structure to include a second 64-bit data element of the first 128-bit SIMD source register, a second 64-bit data element of the second structure to include a second 64-bit data element of the second 128-bit SIMD source register, and a third 64-bit data element of the second structure to include a second 64-bit data element of the third 128-bit SIMD source register, wherein the first, second, and third 64-bit data elements of the second structure are to be consecutive data elements in the memory.
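The store described in the claim can be sketched in scalar C to make the memory layout concrete. This is a minimal model under assumptions, not the claimed hardware: the type `vec128` and the function `store3_interleaved` are hypothetical names, and the second structure is assumed to follow the first immediately in memory (the claim only requires that the three elements within each structure be consecutive).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit SIMD register as two 64-bit lanes.
 * lane[0] holds the least significant 64 bits. */
typedef struct {
    uint64_t lane[2];
} vec128;

/*
 * Store two three-element structures starting at base:
 *   structure i = { a.lane[i], b.lane[i], c.lane[i] }   for i = 0, 1
 * Each structure's elements are consecutive in memory. Placing the
 * second structure directly after the first is an assumption of
 * this sketch.
 */
void store3_interleaved(uint64_t *base, vec128 a, vec128 b, vec128 c)
{
    for (size_t i = 0; i < 2; ++i) {
        base[3 * i + 0] = a.lane[i]; /* first element of structure i  */
        base[3 * i + 1] = b.lane[i]; /* second element of structure i */
        base[3 * i + 2] = c.lane[i]; /* third element of structure i  */
    }
}
```

With registers a = {1, 2}, b = {10, 20}, and c = {100, 200}, the six words written at `base` are 1, 10, 100, 2, 20, 200: the registers hold the data in structure-of-arrays form and memory receives it as an array of structures.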
Address: Santa Clara CA US