Somewhere at my workplace we have this complex DSP design with many FIR filters and parallel processing taking place. It is a classic case of “historical growth” where some straight forward (non AXI) bus was implemented. This can work in single rate systems with a hardcoded sample rate and a downstream sink which eats up all the samples. However, the algorithm in question uses asynchronous processing where the input sample rate is lower or equal to the output sample rate. Inevitably there is backpressure involved. Backpressure describes a condition where a downstream IP is busy and cannot accept any more samples at its input. Any upstream master is required to pause sending data. AXIS sees the use of m_tready for this purpose.

How would you design a DSP algorithm, say an FIR filter which supports pausing at any time? And how do you deal with the other issue of samples being stuck inside the DSP chain while the upstream s_tvalid is deaserted. A straight forward and easy solution to “just make it AXIS compatible” is shown in Fig. 1.

https://mnemocron.github.io/assets/img/axis-dsp/axis-dsp-basic.png Fig 1: Basic DSP processing IP supporting minimal AXIS functionality including backpressure.

The entire DSP chain can be enabled or disabled by both tready and tvalid. This results in a correct behaviour for supporting backpressure. However, if the input stream is non continuous (e.g. only a short burst / pulse / packet) is sent to the input, the last few samples will remain stuck inside the DSP chain. One cheap workaround would be to keep s_tvalid asserted and force s_tdata to zero if the downstream systems allow for zero outputs. There is also a proper way to do it as shown in Fig. 2.

https://mnemocron.github.io/assets/img/axis-dsp/axis-dsp-extended.png Fig 2: Extended DSP processing IP which guarantees completion of calculation when upstream Master stops/pauses sending input samples.

Just like in the previous example, this approach propagates the tvalid signal along with each beat of tdata. The main difference is that tready is now used at each register stage individually to form a valid transaction together with the propagated tvalid. If the upstream master stops sending samples, this DSP block will continue to calculate the already accepted samples to send them downstream. One challenge with this design is the high fanout of tready as it is non-registered. Adding an AXIS pipeline register or a skidbuffer at the in and/or output may ease some timing tension - especially if building multiple DSP IPs with this recipe.