Abstract |
A method for implementing an IIR filter (100) on a parallel processing hardware platform (200), such as a GPU (340), comprises: separating the IIR filter into a sequence of biquad filters (110); implementing each biquad filter as a separate thread (120), each thread having one or more processing elements (130); assigning to the first thread a first output memory block (140) and to the last thread a first input memory block (150); assigning to each of the remaining threads an input memory block (150) and an output memory block (140); and executing each of the threads with a first block of data (160) to be processed. When all threads (120) have finished, the first thread is assigned a different, second output memory block and the last thread a different, second input memory block, and each of the remaining threads is assigned a different input memory block and a different output memory block than before, such that each output memory block of a thread becomes the new input memory block of the next thread, and each input memory block of a thread becomes the new output memory block of the previous thread. Each of the threads is then executed with a second block of data (160) to be processed. These steps are repeated for all blocks of the data to be processed. |
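
The buffer rotation described in the abstract can be illustrated with a small CPU-side simulation. This is only a sketch under stated assumptions, not the claimed implementation: the names `Biquad`, `make_stages`, and `run_pipeline` are hypothetical, the coefficients are arbitrary stable values chosen for illustration, the transposed direct form II structure is one common choice the abstract does not prescribe, and a sequential loop stands in for threads that would run concurrently on a GPU. Each link between two adjacent stages owns two memory blocks whose roles are swapped once every stage has finished a block.

```python
class Biquad:
    """One second-order section (transposed direct form II).

    The state (z1, z2) persists across blocks, so block-wise
    processing matches sample-by-sample processing.
    """

    def __init__(self, b0, b1, b2, a1, a2):
        self.b = (b0, b1, b2)
        self.a = (a1, a2)
        self.z1 = 0.0
        self.z2 = 0.0

    def process(self, src, dst):
        """Filter every sample of src into dst (same length)."""
        b0, b1, b2 = self.b
        a1, a2 = self.a
        z1, z2 = self.z1, self.z2
        for n, x in enumerate(src):
            y = b0 * x + z1
            z1 = b1 * x - a1 * y + z2
            z2 = b2 * x - a2 * y
            dst[n] = y
        self.z1, self.z2 = z1, z2


def make_stages():
    # Hypothetical coefficients; each section has poles inside the unit circle.
    return [
        Biquad(0.2, 0.4, 0.2, -0.5, 0.25),
        Biquad(1.0, 0.0, -1.0, 0.1, 0.3),
        Biquad(0.5, 0.5, 0.0, -0.2, 0.1),
    ]


def run_pipeline(stages, blocks, block_len):
    """Run the biquad cascade as a block pipeline with swapped buffers."""
    n = len(stages)
    # Two memory blocks per link between stage i and stage i + 1.
    write_bufs = [[0.0] * block_len for _ in range(n - 1)]
    read_bufs = [[0.0] * block_len for _ in range(n - 1)]
    zeros = [0.0] * block_len
    outputs = []
    # n - 1 extra steps flush the blocks still in flight (assumption:
    # the first stage is fed zeros while the pipeline drains).
    for step in range(len(blocks) + n - 1):
        first_in = blocks[step] if step < len(blocks) else zeros
        last_out = [0.0] * block_len
        # Stages read and write disjoint blocks within a step, so on a
        # GPU they could execute concurrently; here they run in turn.
        for i, stage in enumerate(stages):
            src = first_in if i == 0 else read_bufs[i - 1]
            dst = last_out if i == n - 1 else write_bufs[i]
            stage.process(src, dst)
        if step >= n - 1:  # earlier steps only fill the pipeline
            outputs.append(last_out)
        # All stages finished: each output block becomes the next
        # stage's input block, and each consumed input block becomes
        # the previous stage's output block.
        write_bufs, read_bufs = read_bufs, write_bufs
    return outputs
```

With three stages, the first filtered block emerges after two fill steps, so the scheme trades a latency of (number of stages minus one) blocks for the ability to run all stages at once. Calling `run_pipeline(make_stages(), blocks, block_len)` with a list of equal-length blocks returns the filtered blocks in input order.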