Yes, this is another blog post on the interwebz about the infamous skid buffer or AXI pipeline stages (in particular the tready path). One could think that something so essential to building AXI compliant IP is covered in detail everywhere you search. And yet, this particular IP took me longer to develop than I would like to admit. This is the story of the AXIS Skid Buffer.

Before we dive deep into boolean logic structures, I want to mention the sources already on the internet. There is an article over at the ZipCPU blog which helped me understand what a skid buffer does at a higher, more abstract level. Unfortunately, it helped me little in understanding how to build a minimal, straightforward skid buffer without a side quest into formal verification. That is something that, in hindsight, would have been worth understanding, but here I am, with VHDL and no will to either work around the formal verification limitations of VHDL or properly learn Verilog. Another very useful article is available on Pavel Demin’s blog. It takes a different approach and goes straight into the details of how to implement AXI compliant register stages for both the tready (“input”) and tdata/tvalid (“output”) paths. It even includes pretty schematics :D The design does, however, have some flaws (or missing features).

Pavel kindly asked me what was wrong with the design. Admittedly, his solution is already perfect. While “designing” my own skid buffer (copying Pavel’s approach to VHDL) and using it in my designs, I was quick to blame any bugs on the skid buffer. I was specifically interested in the propagation of the flow control signals as notifications (see further down).

The reason why I bothered to build my own skid buffer slash AXIS pipeline stage is to have a fully registered AXI compliant template. A template that can be used as a boilerplate to build quick bus modification functions. I have used this in the past to build IP that is inserted into an AXI protocol connection to perform little adjustments to the signals. For example:

  • AXI4 to AXIS converter (write only) by dropping all address information (and emulating a BRESP response on the AXI4 interface)
  • FIR filter stage (with Xilinx FIR compiler compatible AXIS interface)
  • AXI4 virtual memory offset translator / base address register (I have not built that one yet)

It is also worth noting that the AXI specification requires that there are no combinatorial paths from the inputs to the outputs of a given IP. Adding a simple skid buffer with both paths registered is an easy way to achieve this.

What is a Skid Buffer?

A skid buffer is used in a bus when you want to pipeline the feedback route from S to M (usually tready). The problem with introducing latency into the feedback path is that the upstream M device receives the busy / not ready signal one cycle too late and has already sent another beat. To catch that beat, the skid buffer adds one register in the main data direction from M to S (usually tdata). The bus transactions “skid” to a halt. In effect, the skid buffer is the shortest possible FIFO: 0 stages in normal operation, 1 stage when halting.
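To make the "0 stages or 1 stage" idea concrete, here is a minimal VHDL sketch of a skid buffer that registers only tready. It is an illustration under assumptions, not a verified implementation: the entity name skid_sketch, the generic DATA_WIDTH, the s_/m_ port prefixes and the synchronous, active-high rst are all made up for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity for illustration only (names are assumptions).
entity skid_sketch is
  generic (
    DATA_WIDTH : natural := 8
  );
  port (
    clk      : in  std_logic;
    rst      : in  std_logic;  -- assumed: synchronous, active high
    -- upstream side (we are the S of the upstream M)
    s_tdata  : in  std_logic_vector(DATA_WIDTH-1 downto 0);
    s_tvalid : in  std_logic;
    s_tready : out std_logic;
    -- downstream side (we are the M of the downstream S)
    m_tdata  : out std_logic_vector(DATA_WIDTH-1 downto 0);
    m_tvalid : out std_logic;
    m_tready : in  std_logic
  );
end entity skid_sketch;

architecture rtl of skid_sketch is
  signal skid_data  : std_logic_vector(DATA_WIDTH-1 downto 0) := (others => '0');
  signal skid_valid : std_logic := '0';  -- '1' while a beat is parked
begin
  -- tready depends only on a register, never directly on m_tready.
  s_tready <= not skid_valid;

  -- Normal operation: bypass (0 stages). Halted: serve the parked beat first.
  m_tvalid <= skid_valid or s_tvalid;
  m_tdata  <= skid_data when skid_valid = '1' else s_tdata;

  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        skid_valid <= '0';
      elsif skid_valid = '0' then
        -- We signalled ready, upstream sent a beat, but downstream stalled:
        -- park the beat so it is not lost (1 stage).
        if s_tvalid = '1' and m_tready = '0' then
          skid_data  <= s_tdata;
          skid_valid <= '1';
        end if;
      elsif m_tready = '1' then
        -- Downstream accepted the parked beat; return to bypass mode.
        skid_valid <= '0';
      end if;
    end if;
  end process;
end architecture rtl;
```

While skid_valid is '0' the data path is a plain wire; a beat is parked only when upstream sends in the same cycle in which downstream stalls. Note that in this sketch the forward path from s_tvalid/s_tdata to the outputs is still combinational, so on its own it does not remove every input-to-output path.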

Minimal Skid Buffer (tready register)

Pavel Demin’s notes split the register paths into the output buffer (tdata/tvalid) path and the input buffer (tready) path. Fig. 1 shows both registers combined. With OPT_OUT_REG=False it acts as a skid buffer that pipelines only tready. The source code for this buffer is on GitHub: skidbuffer.vhd.

Fig. 1: Schematic of the skid buffer / pipeline combo (https://mnemocron.github.io/assets/img/skidbuffer/skidbuffer-schematic.png).
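For comparison, the output (tdata/tvalid) register on its own can be sketched as follows. This is a hedged illustration and not the linked skidbuffer.vhd: the entity name out_reg_sketch, the generic DATA_WIDTH, the port names and the synchronous, active-high rst are assumptions for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity for illustration only (names are assumptions).
entity out_reg_sketch is
  generic (
    DATA_WIDTH : natural := 8
  );
  port (
    clk      : in  std_logic;
    rst      : in  std_logic;  -- assumed: synchronous, active high
    s_tdata  : in  std_logic_vector(DATA_WIDTH-1 downto 0);
    s_tvalid : in  std_logic;
    s_tready : out std_logic;
    m_tdata  : out std_logic_vector(DATA_WIDTH-1 downto 0);
    m_tvalid : out std_logic;
    m_tready : in  std_logic
  );
end entity out_reg_sketch;

architecture rtl of out_reg_sketch is
  signal data_r  : std_logic_vector(DATA_WIDTH-1 downto 0) := (others => '0');
  signal valid_r : std_logic := '0';
  signal ready_i : std_logic;
begin
  -- The register may be loaded when it is empty or is being emptied this cycle.
  ready_i  <= m_tready or (not valid_r);
  s_tready <= ready_i;

  -- tvalid and tdata come straight from registers.
  m_tvalid <= valid_r;
  m_tdata  <= data_r;

  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        valid_r <= '0';
      elsif ready_i = '1' then
        valid_r <= s_tvalid;
        data_r  <= s_tdata;
      end if;
      -- otherwise hold: a valid beat is waiting for m_tready
    end if;
  end process;
end architecture rtl;
```

Here tvalid and tdata are registered, but s_tready still depends combinationally on m_tready. That remaining path is the one the tready register of a skid buffer is meant to cut.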

The Problem with Notifications

This chapter is bonus material. I struggled to verify a few properties that I needed in my AXIS IP. These are:

  1. Rising edge on tvalid can activate downstream IP
  2. Rising edge on tready can activate upstream IP

I needed the first property to support pipelining of a custom IP with variable sample rates. In my case it was a resampler that supports slower playback of upstream data, hence it produces more data at the AXIS output than it accepts at its input. My upstream FIFO may assert tvalid=1 at any given time to indicate that data is available to be processed. But my downstream IP may only assert tready=1 every Nth or so clock cycle (because it is still processing old samples). In this case tvalid=1 must propagate to the downstream IP even when the downstream IP is not tready=1 yet. The same applies to the upstream path using tready=1. Through some simulations I could verify this behaviour to a satisfactory degree.
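One way to check both properties in simulation is a small passive monitor wired next to the stage under test. The sketch below shows how such a check could look; it is not a record of the simulations mentioned above. The entity name notify_monitor, the generic MAX_LATENCY (the assumed worst-case number of cycles a notification may take to cross the stage) and the port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical simulation-only monitor (names are assumptions).
entity notify_monitor is
  generic (
    MAX_LATENCY : natural := 2  -- assumed bound in clock cycles
  );
  port (
    clk      : in std_logic;
    s_tvalid : in std_logic;  -- tvalid entering the stage from upstream
    s_tready : in std_logic;  -- tready leaving the stage towards upstream
    m_tvalid : in std_logic;  -- tvalid leaving the stage towards downstream
    m_tready : in std_logic   -- tready entering the stage from downstream
  );
end entity notify_monitor;

architecture sim of notify_monitor is
begin
  check : process (clk)
    -- Consecutive cycles in which each precondition held (saturating).
    variable offer_cnt : natural := 0;
    variable ready_cnt : natural := 0;
  begin
    if rising_edge(clk) then
      -- Property 1: upstream offers data while downstream is not ready.
      if s_tvalid = '1' and m_tready = '0' then
        if offer_cnt <= MAX_LATENCY then
          offer_cnt := offer_cnt + 1;
        end if;
      else
        offer_cnt := 0;
      end if;
      assert (offer_cnt <= MAX_LATENCY) or (m_tvalid = '1')
        report "tvalid did not propagate downstream while m_tready = '0'"
        severity error;

      -- Property 2: downstream is ready while upstream offers nothing.
      if m_tready = '1' and s_tvalid = '0' then
        if ready_cnt <= MAX_LATENCY then
          ready_cnt := ready_cnt + 1;
        end if;
      else
        ready_cnt := 0;
      end if;
      assert (ready_cnt <= MAX_LATENCY) or (s_tready = '1')
        report "tready did not propagate upstream while s_tvalid = '0'"
        severity error;
    end if;
  end process check;
end architecture sim;
```

In a testbench the monitor would be instantiated alongside the stage and connected to the same four handshake signals. MAX_LATENCY has to be at least the number of register stages in the slower of the two paths, otherwise the assertions fire on legal behaviour.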