Abstract |
Subvector slices x(i,r,s) of a first vector x(i) are stored (e.g., in a content-addressable memory (CAM) array) in a bit-parallel, word-serial manner. For each stored subvector slice, and in parallel on the bits of that slice, an operation is executed that outputs a pre-calculated inner product result of those bits and a second vector a. If the subvector slices x(i,r,s) are initially stored in a bit-serial, word-serial manner, they are transformed to the bit-parallel, word-serial manner by copying the relevant bits of each subvector slice from a 0th column of the CAM array to elements of a tags register and, for each kth iteration, shifting the bits in the elements of the tags register by m positions and copying the shifted bits to a column of the CAM array. An associative processor outputs the pre-calculated inner product result in a distributed arithmetic manner.
|
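The distributed arithmetic principle named in the abstract can be illustrated with a minimal software sketch. It is not the claimed CAM or associative-processor implementation: it assumes unsigned n_bits-wide elements of x, groups them into subvectors of m elements, and uses an ordinary Python list as the table of pre-calculated inner products. The names `build_lut` and `da_inner_product` are hypothetical and chosen only for this illustration.

```python
def build_lut(a_group):
    """Pre-calculate the inner product of a_group with every m-bit pattern.

    Entry `addr` holds the sum of a_group[j] over all j whose bit is set
    in `addr`, so the table has 2**m entries.
    """
    m = len(a_group)
    return [
        sum(a_group[j] for j in range(m) if (addr >> j) & 1)
        for addr in range(1 << m)
    ]


def da_inner_product(x, a, m, n_bits):
    """Inner product of x and a computed by table lookups (assumes unsigned x)."""
    assert len(x) == len(a) and len(x) % m == 0
    total = 0
    for r in range(0, len(x), m):          # r selects a subvector of m elements
        lut = build_lut(a[r:r + m])
        for s in range(n_bits):            # s selects a bit position (one slice)
            # Gather bit s of each of the m elements into a table address.
            addr = 0
            for j in range(m):
                addr |= ((x[r + j] >> s) & 1) << j
            # Look up the pre-calculated partial result and weight it by 2**s.
            total += lut[addr] << s
    return total


x = [3, 5, 2, 7]
a = [1, -2, 4, 3]
result = da_inner_product(x, a, m=2, n_bits=3)
direct = sum(xi * ai for xi, ai in zip(x, a))
print(result, direct)
```

In this sketch the multiplications are replaced by one table lookup per slice followed by a shift and an add, which is the property that lets each slice be handled as a single operation on all of its bits.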