Title: Vector address conflict resolution with vector population count functionality
Abstract: Instructions and logic provide SIMD address conflict resolution with vector population count functionality. Some embodiments include processors with a register with a variable plurality of data fields, each of the data fields to store a variable second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of bits set to one for the corresponding data field of the register. Responsive to decoding a vector population count instruction, execution units count the number of bits set to one for each of the data fields in the register, and store the counts in the corresponding data fields of the destination register. Vector population count instructions can be used with variable sized elements and conflict masks to generate iteration counts and completion masks to be used each iteration to resolve dependencies in gather-modify-scatter SIMD operations.
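The last sentence of the abstract can be illustrated with a scalar model. The following is a minimal sketch in Python, not the patented hardware implementation: the function names (`conflict_masks`, `vector_popcount`, `histogram_update`) and the histogram workload are assumptions chosen for illustration. It assumes a conflict mask in which lane i has bit j set when an earlier lane j targets the same address, so that the population count of each mask gives the iteration in which that lane can be processed safely.

```python
def conflict_masks(indices):
    """For lane i, set bit j (j < i) when earlier lane j uses the same index.

    This is an assumed conflict-mask layout, modelled in scalar code.
    """
    masks = []
    for i, idx in enumerate(indices):
        mask = 0
        for j in range(i):
            if indices[j] == idx:
                mask |= 1 << j
        masks.append(mask)
    return masks


def vector_popcount(fields):
    """Count the bits set to one in each data field."""
    return [bin(field).count("1") for field in fields]


def histogram_update(hist, indices):
    """Gather-modify-scatter increment of hist[idx] for every lane.

    Lanes sharing an index receive distinct iteration counts (0, 1, 2, ...),
    so the lanes active in any one pass never collide.
    """
    counts = vector_popcount(conflict_masks(indices))
    for iteration in range(max(counts, default=-1) + 1):
        # Mask of lanes whose iteration count matches the current pass.
        active = [count == iteration for count in counts]
        # Gather the current values for the active lanes.
        gathered = [hist[idx] if on else None for idx, on in zip(indices, active)]
        # Modify and scatter back for the active lanes only.
        for idx, on, value in zip(indices, active, gathered):
            if on:
                hist[idx] = value + 1
    return hist
```

For example, with indices `[3, 1, 3, 3, 1, 0]` the three lanes that target index 3 are handled in three separate passes, while lanes with unique indices all complete in the first pass.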
Publication No.: US9411592(B2)  Publication date: 2016.08.09
Application No.: US201213731005  Filing date: 2012.12.29
Applicant: Intel Corporation  Inventors: Valentine Robert;Charney Mark J.;Corbal Jesus;Girkar Milind B.;Hughes Christopher J.;Ould-Ahmed-Vall Elmoustapha;Toll Brett L.
Classification: G06F9/30;G06F9/38;G06F7/60;H03M7/20  Main classification: G06F9/30
Agency: Lowenstein Sandler LLP  Agent: Lowenstein Sandler LLP
Main claim: 1. A processor comprising: a first register comprising a first plurality of data fields, wherein each of the first plurality of data fields is to store a plurality of bits; a first destination register comprising a second plurality of data fields corresponding to the first plurality of data fields, wherein each of the second plurality of data fields is to store a count of a number of bits set to one for a corresponding data field of the first plurality of data fields; a second register comprising a third plurality of data fields corresponding to the second plurality of data fields, wherein each of the third plurality of data fields is to store a copy of a specific value; a second destination register comprising a plurality of mask fields, a portion of the plurality of mask fields corresponding to the second plurality of data fields; a decode stage to decode one or more instructions; one or more execution units, responsive to the decoded one or more instructions, to: read the plurality of bits of each of the first plurality of data fields; and for each data field of the first plurality of data fields in the first register, count the number of bits set to one and store the count as a value in a corresponding data field of the second plurality of data fields; compare the value of each of the second plurality of data fields with a corresponding copy of a specific value of each of the third plurality of data fields to generate a corresponding mask value; and store each of the corresponding mask values in a corresponding mask field in the portion of the plurality of mask fields.
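The data flow recited in the claim (per-field population count, comparison against a broadcast value, mask generation) can be modelled in a few lines of Python. This is a hedged sketch only: the names `split_fields` and `popcount_compare_mask` are hypothetical, the packed register is modelled as a plain integer with field 0 in the lowest bits, and the comparison is assumed to be an equality test, which the claim text does not specify.

```python
def split_fields(register, field_bits, num_fields):
    """Unpack an integer register into data fields, field 0 in the low bits."""
    field_mask = (1 << field_bits) - 1
    return [(register >> (i * field_bits)) & field_mask for i in range(num_fields)]


def popcount_compare_mask(first_register, specific_value, field_bits, num_fields):
    """Model of the claimed steps; returns (counts, mask).

    counts models the first destination register, the broadcast list models
    the second register, and mask models the second destination register.
    """
    source_fields = split_fields(first_register, field_bits, num_fields)
    # Count the bits set to one in each source data field.
    counts = [bin(field).count("1") for field in source_fields]
    # Each field of the second register holds a copy of the specific value.
    broadcast = [specific_value] * num_fields
    # Compare field by field (equality assumed) and set one mask bit per field.
    mask = 0
    for i, (count, value) in enumerate(zip(counts, broadcast)):
        if count == value:
            mask |= 1 << i
    return counts, mask
```

Calling `popcount_compare_mask(0x00FF0301, 2, 8, 4)` treats the register as four 8-bit fields; only field 1 (value `0x03`) has exactly two bits set, so only mask bit 1 is set. The `field_bits` parameter reflects the variable element size mentioned in the abstract.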
Address: Santa Clara, CA, US