Proving SPI transfer speeds with formal verification


Liam

If you work with embedded devices, you'll probably be familiar with tables of ‘AC’ or ‘switching characteristics’ that tell you how fast you can talk to the device, but how can you derive those parameters in the first place?

Introduction

I ran into this problem around this time last year.

In MIRMIS, an infrared imager for remote sensing that I'm helping build, much of the command and control is handled by a microcontroller connected to an FPGA over a four-wire SPI bus. This interface is critical: as MIRMIS speeds past a comet, it will be the microcontroller's job to calculate the comet's position based on spacecraft trajectory, and it will be the FPGA's job to control pointing mirrors and track the comet. The two will need to communicate reliably to keep the mirrors on-target. I wanted to have proof for the timing parameters I would pass to colleagues working on the microcontroller.

Writing testbenches for all the timing parameters would be tedious but not too difficult. The trouble I had was that the microcontroller and FPGA in MIRMIS are asynchronous to each other, which added three complications:

  • I couldn't guarantee that changes in SCK or CS# would respect setup or hold times for the FPGA's registers, meaning metastability would be a concern
  • The classic two flip-flop metastability filter would add at least two cycles of latency between SCK and data, making it harder to reason about when data might be stable
  • It's possible that a metastable SCK or CS# would take a long time to resolve to a stable state, which could unpredictably increase the added latency beyond two cycles
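As background, the classic two flip-flop filter mentioned in the list above can be sketched like this (a generic illustration rather than the MIRMIS code; clk_i, async_i, and sync_o are placeholder names):

```systemverilog
// A generic two flip-flop synchroniser. The first register absorbs
// any metastability on async_i; downstream logic only ever sees the
// output of the second register, which is very likely to be stable.
// Each stage adds one cycle of latency, giving the two-cycle delay
// between SCK and data described above.
logic [1:0] ff;

always_ff @(posedge clk_i) begin
    ff <= {ff[0], async_i};
end

assign sync_o = ff[1];
```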

It would have been impractical to write testbenches for all these permutations in the time available. Luckily, I was already using SymbiYosys (sby) for formal verification.[1] Through sby, with a formal solver like Boolector, the constraints given in a formal proof can be used to automatically and exhaustively generate stimulus for my SPI core. As long as I could describe the phenomena above in a formal proof, then, I could get a formal solver to do the dogsbody work of injecting metastability at just the wrong moment.
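For reference, a SymbiYosys configuration driving a proof like this might look something like the sketch below. The file and module names are hypothetical and the depth is illustrative; only the overall shape matters:

```ini
[tasks]
prove

[options]
mode prove
depth 40

[engines]
smtbmc boolector

[script]
read -formal fv_qspi_phy_2ff.sv
prep -top fv_qspi_phy_2ff

[files]
fv_qspi_phy_2ff.sv
```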

Modelling metastability

I started this task with an existing formal proof for my SPI core, which I won't cover here. The upside is that the modifications I needed to make on top of this proof were trivial.

First, to represent the added latency through the synchronisers, I wrapped my existing proof in another and connected its SCK, CS#, and SDI signals through separate two-entry shift registers:

    logic       ff_cs_n[1:0];
    logic       ff_sck [1:0];
    logic [3:0] ff_sd  [1:0];

    qspi_phy dut(
        .spi_cs_ni(ff_cs_n[1]),
        .spi_sck_i(ff_sck[1]),
        .spi_sd_i (ff_sd[1]),
        .*
    );

Then, for simulating metastability, the trick is in the behavioural description of the shift registers. To my wrapper I added two new top-level ports:

    module fv_qspi_phy_2ff(
        // ... other top-level ports
        input cs_n_miss,
        input sck_miss,
        // ... other top-level ports
    );

And in the description of the shift registers, I allowed the solver to hold the values of SCK and CS# at their previous value whenever they changed, or to allow them to propagate through undelayed. This is mostly just an XOR with the values of cs_n_miss (for CS#) or sck_miss (for SCK):

    always_ff @(posedge clk_i) begin
        ff_cs_n <= '{
            1: ff_cs_n[0],
            0: spi_cs_ni ^ (!$stable(spi_cs_ni) && cs_n_miss)
        };

        ff_sck <= '{
            1: ff_sck[0],
            0: spi_sck_i ^ (!$stable(spi_sck_i) && sck_miss)
        };

        ff_sd <= '{
            1: ff_sd[0],
            0: spi_sd_i
        };
    end

As cs_n_miss and sck_miss are entirely unconstrained, the solver is allowed to set them to whatever value it likes, whenever it likes. By holding a change in a signal off by an additional cycle, the solver can simulate a situation where metastability has taken longer than normal to resolve. I didn't use this trick with the ff_sd shift register because the sampling of the data inputs is controlled by SCK. We should be able to avoid metastability on SDI by requiring an SCK period long enough that SDI will be stable when we sample it.

In addition to these changes, I had to duplicate in my wrapper proof the assertions checking that the expected data appears on SDO at the sampling point. The latency added by the synchronisers could allow the SPI core to output data after SCK had transitioned but before that edge had reached the core, which would cause the SPI manager controlling the bus to miss data.

Proving timing

Now that my wrapper proof models the effects of metastability and checks that the timing of data output by the SPI core is correct, how can I prove how fast I can run the SPI bus?

The answer is assumptions. If you've written testbenches (for RTL or for software), you'll have come across assertions: Boolean statements that must be true when they're encountered for the test to pass. When stimulus is generated automatically, as with a formal solver, the assertion has a cousin, the assumption. Assumptions tell the solver what it must do, while assertions tell the checker what the test unit must do. Assumptions constrain the state-space of a test, partly so that the solver never does something impermissible and partly so that the state-space stays small enough to keep proof-checking tractable.

For example, imagine our test unit takes any of a std_ulogic_vector(7 downto 0), a logic [7:0], or a uint8_t parameter. There are 256 possible values[2] this parameter can take. However, if we were using a one-hot encoding, we wouldn't want the solver to generate a multi-bit value and so would assume that only one bit in the parameter was ever set. The solver always respects assumptions, so it would never try a multi-bit value.
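That one-hot constraint can be written as a concrete sketch in SystemVerilog using the built-in $onehot function, which is true exactly when one bit of its argument is set (param_i and clk_i are hypothetical names for this illustration):

```systemverilog
// Restrict the solver so that param_i is always one-hot: of its
// eight bits, exactly one may be set at any sampling point.
assume property(
    @(posedge clk_i) $onehot(param_i)
);
```

This shrinks the reachable values of the parameter from 256 to 8, which is exactly the kind of state-space pruning described above.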

Assumptions are what we use here to figure out our SPI timings.

I described valid SPI waveforms through assumptions, and the numbers in those assumptions are the limits of the timing parameters they represent. The solver can pick any value which doesn't cross the limit, so I can know that the SPI core will work for any timings slower than those in the assumptions. The assumptions themselves are very simple and can be read almost directly as a ‘timing specifications’ table.

I've reproduced the full set of timing assumptions below:

    // The PHY is unselected during reset.
    always @* if (~arst_ni) assume(spi_cs_ni);

    // The PHY remains unselected for at least the cycle after reset.
    assume property(
        @(posedge clk_i)
        !first_timestep_f && $rose(arst_ni) |-> spi_cs_ni
    );

    // The PHY is only selected when SCK has been idle for at least two cycles.
    assume property(
        @(posedge clk_i)
        !first_timestep_f && !$past(first_timestep_f) && $fell(spi_cs_ni) |->
            ~spi_cs_ni && $stable(spi_sck_i) && ~$past(spi_sck_i, 2)
    );

    // The PHY is not deselected unless SCK has been idle for at least two cycles
    // and remains idle for at least the cycle after the PHY is deselected.
    assume property(
        @(posedge clk_i)
        !first_timestep_f && !$past(first_timestep_f, 3) && $rose(spi_cs_ni) |->
            ~spi_sck_i && ~$past(spi_sck_i) && ~$past(spi_sck_i, 2)
    );

    // And the PHY is not reselected for five cycles.
    assume property(
        @(posedge clk_i)
        !first_timestep_f && $rose(spi_cs_ni) |-> ##1 $stable(spi_cs_ni)[*4]
    );

    // When the PHY is selected, SCK is held low for three subsequent cycles.
    assume property(
        @(posedge clk_i)
        !first_timestep_f && $fell(spi_cs_ni) |-> ##1 $stable(spi_sck_i)[*2]
    );

    // When SCK rises, it remains risen for at least five cycles.
    assume property(
        @(posedge clk_i)
        !first_timestep_f && $rose(spi_sck_i) |-> ##1 $stable(spi_sck_i)[*4]
    );

    // When SCK rises, input data has been stable for at least one cycle
    // before the edge and is held stable for three cycles after it.
    assume property(
        @(posedge clk_i)
        !first_timestep_f && $rose(spi_sck_i) |-> $stable(spi_sd_i)[*3]
    );

    // When SCK falls, it does not rise again for at least five cycles. This
    // may actually require less, but this mirrors the active-time assumption.
    assume property(
        @(posedge clk_i)
        !first_timestep_f && $fell(spi_sck_i) |-> ##1 $stable(spi_sck_i)[*4]
    );

The two key assumptions here are those around the active and inactive times for SCK. I assumed that these were five cycles each, which means a full SPI bus period is at least 10 cycles. Put another way, for an SCK with a 50% duty cycle, the fastest this SPI core can safely be operated is one tenth of the design's logic clock frequency.
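The arithmetic can be written out as constants (illustrative only, and not part of the proof; the 100 MHz figure matches the clock discussed later):

```systemverilog
// With a 5 + 5 cycle SCK high/low assumption, the minimum SPI bus
// period is 10 logic-clock cycles, so the maximum SCK frequency is
// one tenth of the logic clock frequency.
localparam int unsigned CLK_HZ     = 100_000_000; // 100 MHz logic clock
localparam int unsigned SCK_MAX_HZ = CLK_HZ / 10; // 10 MHz maximum SCK
```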

Of course, my proof assumes an SPI core that is sampling the bus from its logic clock. It would be possible to use the SPI clock directly to clock some sampling logic, followed by some clock domain crossing circuit to transfer the captured data into the logic clock domain. There are pros and cons to each choice. I didn't want internal clock domain crossings within the FPGA design for MIRMIS as they make verification harder and they're a potential failure point, so I chose this approach. The synchronisers on the inputs also slow down the bus—without them, I was able to pass this formal proof running the SPI bus at 1/4th the logic clock frequency for the design. That said, in an FPGA like MIRMIS' ProASIC3, driving logic from the SPI clock might be the only way to achieve very high data rates. The ProASIC3 has radiation-sensitive PLLs, no SERDES blocks, and no delay lines (outside of its clock nets), all of which make it difficult to build a conventional oversampling receiver.

I was able to run this SPI core from a 100MHz logic clock, allowing a 10MHz SCK, although we've now settled on a 50MHz logic clock and 5MHz SCK. Faster might be nicer, but for MIRMIS it's far from the most significant bottleneck.

Closing words

I've been using formal verification with MIRMIS for around 16 months now, and the experience has made me an advocate for it. I may write something about that in future. But I wrote this note because I think it's an interesting demonstration of reversing how testing is normally done—rather than starting with an SPI clock rate in mind, I took an already proven design and used a formal solver to extract the maximum clock rate from it.

The SPI core tested like this is now largely feature-frozen and will hopefully fly into space onboard ESA's Comet Interceptor in 2029.