Abstract |
PURPOSE: To improve the price-performance ratio by providing an auxiliary shared memory that has more banks than there are processors per cluster, together with shallow interleaving, and a bus mechanism that can write data to the main and auxiliary shared memories in parallel and read from each of them independently. CONSTITUTION: LU decomposition is carried out on an irregular sparse matrix for which only the non-zero elements are stored in the main shared memory MM. For this purpose, a first instruction connects the bypass switch BS to the auxiliary shared memory SM and loads the data, and the result of the division performed by the arithmetic unit PU is stored. A second instruction disconnects the BS, loads the data to be updated, and stores the result of the product-sum operation in the memory MM. A vector length expansion algorithm is then applied so that the two types of pipeline arithmetic operation are started for each group of row vector operations that can be carried out in parallel with each other. The LU decomposition of an irregular sparse matrix can thus be performed with a small number of pipeline start-ups. |
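
The two arithmetic steps named in the abstract, a division followed by a product-sum update, can be illustrated in software. The following is a minimal sketch only, assuming a row-wise dictionary storage of the non-zero elements and no pivoting. It does not model the memory banks, the bypass switch BS, the auxiliary memory SM, or the vector length expansion algorithm, and the names `sparse_lu` and `multiply_lu` are hypothetical, not taken from the source.

```python
def sparse_lu(rows):
    """In-place LU decomposition (no pivoting) of a row-wise sparse matrix.

    rows[i] is a dict {column: value} holding only the non-zero entries of
    row i (an assumed storage layout). On return, entries below the diagonal
    hold L (unit diagonal implied) and entries on or above it hold U.
    """
    n = len(rows)
    for k in range(n):
        pivot = rows[k].get(k, 0.0)
        if pivot == 0.0:
            raise ZeroDivisionError(f"zero pivot at row {k}; pivoting is not handled")
        # Non-zero entries of the pivot row to the right of the diagonal.
        pivot_tail = [(j, v) for j, v in rows[k].items() if j > k]
        for i in range(k + 1, n):
            if k not in rows[i]:
                continue  # structural zero: this row needs no update
            # Division step: multiplier l_ik = a_ik / a_kk.
            factor = rows[i][k] / pivot
            rows[i][k] = factor
            # Product-sum step: a_ij <- a_ij - l_ik * a_kj (may create fill-in).
            for j, v in pivot_tail:
                rows[i][j] = rows[i].get(j, 0.0) - factor * v
    return rows


def multiply_lu(rows):
    """Rebuild the dense product L*U from the packed factors, for checking."""
    n = len(rows)
    dense = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else rows[i].get(k, 0.0)
                total += l_ik * rows[k].get(j, 0.0)
            dense[i][j] = total
    return dense


if __name__ == "__main__":
    # Example matrix [[4, 0, 2], [2, 3, 0], [0, 6, 5]] with zeros omitted.
    matrix = [{0: 4.0, 2: 2.0}, {0: 2.0, 1: 3.0}, {1: 6.0, 2: 5.0}]
    factors = sparse_lu([dict(r) for r in matrix])
    print(factors)
    print(multiply_lu(factors))
```

In this sketch, rows that have no entry in the pivot column are skipped entirely, which is where storing only the non-zero elements saves work. The rows that are updated for a given pivot do not depend on one another, which loosely corresponds to the groups of row vector operations that the abstract says can run in parallel.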