How to Implement a Circular Buffer Log System on STM32 SPI Flash with DMA

Logging data to external flash is one of those things that sounds straightforward until you actually do it. Flash memory has erase-before-write constraints, limited erase cycles, and page alignment requirements that trip people up constantly. A circular buffer solves the biggest headache: you get a fixed-size log that automatically wraps around and overwrites the oldest entries when it fills up. Pair that with DMA-based SPI transfers and your CPU barely has to think about the logging process at all.

This guide walks through building a circular buffer log system on an STM32 using SPI flash (like the Winbond W25Q series) with DMA transfers and proper error handling.

Prerequisites

Working knowledge of STM32 peripheral configuration (GPIO, SPI, DMA)
Familiarity with SPI communication basics (clock polarity, phase, chip select)
Understanding of how DMA channels work on STM32
STM32CubeIDE v1.16+ installed and configured
An SPI flash module on hand (W25Q32, W25Q64, W25Q128, etc.)

Parts/Tools

STM32 microcontroller board (STM32F4 series works well, but any STM32 with SPI + DMA will do)
SPI flash memory (Winbond W25Q series is the go-to)
STM32CubeIDE v1.16+ or your preferred STM32 toolchain
Jumper wires and breadboard for prototyping connections
Stable 3.3V power supply for both the STM32 and flash chip

Steps

Set Up Your Development Environment
- Open STM32CubeIDE and create a new STM32 project targeting your specific MCU.
- In the CubeMX pinout view, enable an SPI peripheral (SPI1 is typical). Set it to Full-Duplex Master mode.
- Enable DMA for both SPI TX and RX channels. Set the DMA direction to Memory-to-Peripheral for TX and Peripheral-to-Memory for RX. Use byte-width transfers unless you have a specific reason for half-word or word.
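- Watch out: make sure the SPI clock prescaler gives you a frequency the flash chip supports. W25Q parts are typically rated for 104 MHz or more on the fast-read commands, but the plain Read Data command (0x03) has a lower limit (around 50 MHz on most parts), so check the datasheet for your chip. Start at something conservative like 10-20 MHz until everything works, especially on breadboard wiring.

Configure SPI Flash Memory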
- Wire up the SPI flash to your STM32. The pin mapping is straightforward:
```
STM32 Pin         Flash Pin
-------------------------
SPI_MOSI  ->  DI (DQ0)
SPI_MISO  ->  DO (DQ1)
SPI_SCK   ->  CLK
GPIO (CS) ->  /CS
```
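- A tip here: use a dedicated GPIO for chip select rather than the hardware NSS pin. The flash needs /CS held low for the whole command (opcode, address, and data), and the STM32's hardware NSS output does not frame multi-byte transactions that way without extra care. Configure the pin as a GPIO output that idles high, and drive it low and high manually in your code.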
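- Initialize the SPI peripheral. The CubeMX-generated MX_SPI1_Init() fills in hspi1 and calls HAL_SPI_Init for you; if you set up the handle by hand, check the return value:
```
// Only needed when not using the generated MX_SPI1_Init()
if (HAL_SPI_Init(&hspi1) != HAL_OK) { Error_Handler(); }
```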
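- Verify communication by reading the flash chip's JEDEC ID (command 0x9F). Winbond parts return manufacturer byte 0xEF followed by two device bytes. If you don't get the expected bytes back (all 0x00 or all 0xFF is typical of a dead bus), your SPI mode or wiring is wrong.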
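
For the JEDEC check, a minimal sketch might look like the following. It assumes a CubeMX project where the chip-select pin carries the user label FLASH_CS (so FLASH_CS_GPIO_Port and FLASH_CS_Pin come from main.h) and the SPI handle is hspi1. The helper names flashReadJedec and jedecCapacityBytes are invented for illustration. The HAL part is only compiled on the target; the decode helper is plain C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CMD_JEDEC_ID      0x9Fu
#define JEDEC_MFR_WINBOND 0xEFu   /* first ID byte on Winbond parts */

/* W25Q parts report log2(capacity in bytes) in the third ID byte.
 * Returns 0 when the byte is outside a plausible range (e.g. 0x00 or 0xFF). */
static uint32_t jedecCapacityBytes(const uint8_t id[3]) {
    assert(id != NULL);
    uint8_t cap = id[2];
    if (cap < 0x10u || cap > 0x1Fu) {
        return 0u;
    }
    return (uint32_t)1u << cap;
}

#ifdef USE_HAL_DRIVER              /* defined in CubeMX-generated projects */
#include "main.h"                  /* HAL headers and the FLASH_CS_* pin labels */
extern SPI_HandleTypeDef hspi1;

/* Clock out 0x9F plus three dummy bytes; the ID arrives in rx[1..3]. */
static HAL_StatusTypeDef flashReadJedec(uint8_t id[3]) {
    uint8_t tx[4] = { CMD_JEDEC_ID, 0, 0, 0 };
    uint8_t rx[4] = { 0 };
    HAL_GPIO_WritePin(FLASH_CS_GPIO_Port, FLASH_CS_Pin, GPIO_PIN_RESET);
    HAL_StatusTypeDef st = HAL_SPI_TransmitReceive(&hspi1, tx, rx, sizeof tx, 100);
    HAL_GPIO_WritePin(FLASH_CS_GPIO_Port, FLASH_CS_Pin, GPIO_PIN_SET);
    id[0] = rx[1];
    id[1] = rx[2];
    id[2] = rx[3];
    return st;
}
#endif
```

A W25Q64, for example, typically answers EF 40 17, which the helper decodes to 8 MB. Comparing id[0] against JEDEC_MFR_WINBOND is a reasonable extra check if you only ever fit Winbond parts.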

Implement the Circular Buffer

Define your buffer structure. This sits in RAM and acts as the staging area before data gets flushed to flash:

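```
#include <stdint.h>

#define BUFFER_SIZE 1024   // bytes of RAM used for staging

typedef struct {
    uint8_t data[BUFFER_SIZE];
    volatile int head;    // index where the next byte is written
    volatile int tail;    // index of the oldest stored byte
    volatile int count;   // number of bytes currently stored
} CircularBuffer;

CircularBuffer logBuffer = { .head = 0, .tail = 0, .count = 0 };
```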

The write function pushes bytes into the RAM buffer. When the buffer is full, it overwrites the oldest data by advancing the tail:

```
void writeBuffer(CircularBuffer *buf, uint8_t byte) {
    buf->data[buf->head] = byte;
    buf->head = (buf->head + 1) % BUFFER_SIZE;
    if (buf->count == BUFFER_SIZE) {
        // Buffer full - the byte just written replaced the oldest one,
        // so move the tail past it
        buf->tail = (buf->tail + 1) % BUFFER_SIZE;
    } else {
        buf->count++;
    }
}
```
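
The flush path needs the opposite operation: pulling the oldest bytes back out in bulk so they can be handed to the flash driver. One way to sketch it is below. The function name readBuffer is an assumption (the original only defines the write side), and the struct is repeated so the snippet compiles on its own.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUFFER_SIZE 1024

/* Same layout as the CircularBuffer above, repeated so this compiles alone. */
typedef struct {
    uint8_t data[BUFFER_SIZE];
    volatile int head;
    volatile int tail;
    volatile int count;
} CircularBuffer;

/* Copy up to maxLen of the oldest bytes into dst and remove them from the
 * buffer. Returns the number of bytes copied. Not interrupt-safe by itself. */
static int readBuffer(CircularBuffer *buf, uint8_t *dst, int maxLen) {
    assert(buf != NULL && dst != NULL && maxLen >= 0);
    int n = (buf->count < maxLen) ? buf->count : maxLen;

    /* First piece: from tail up to the end of the array (or n, if smaller). */
    int first = BUFFER_SIZE - buf->tail;
    if (first > n) {
        first = n;
    }
    memcpy(dst, &buf->data[buf->tail], (size_t)first);

    /* Second piece: whatever wrapped around to index 0 (may be zero bytes). */
    memcpy(dst + first, &buf->data[0], (size_t)(n - first));

    buf->tail = (buf->tail + n) % BUFFER_SIZE;
    buf->count -= n;
    return n;
}
```

Copying in at most two pieces avoids a per-byte loop. If writeBuffer can run from an interrupt while this executes, the index and count updates need a critical section around them.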

Keep the RAM buffer and the flash-side circular log as separate concepts. The RAM buffer stages data; you flush it to flash periodically or when it hits a threshold. This decouples your log writes from the relatively slow flash erase/program cycle.
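
On the flash side, the circular log can be tracked with a single write offset into a fixed region, erasing each 4 KB sector just before the write position enters it. The sketch below shows only that bookkeeping. The region base and size, the FlashLog type, and all function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FLASH_SECTOR_SIZE 4096u                      /* smallest erase unit on W25Q */
#define LOG_REGION_BASE   0x000000u                  /* assumed start of the log area */
#define LOG_REGION_SIZE   (16u * FLASH_SECTOR_SIZE)  /* assumed: 64 KB for the log */

typedef struct {
    uint32_t writeOffset;   /* next byte to program, relative to LOG_REGION_BASE */
} FlashLog;

/* Absolute flash address of the next byte to program. */
static uint32_t flashLogAddress(const FlashLog *log) {
    return LOG_REGION_BASE + log->writeOffset;
}

/* True when the write position is at the start of a sector: that sector must
 * be erased (dropping the oldest entries) before anything is programmed. */
static bool flashLogNeedsErase(const FlashLog *log) {
    return (log->writeOffset % FLASH_SECTOR_SIZE) == 0u;
}

/* Bytes that can still be programmed before reaching the next sector,
 * which may not have been erased yet. */
static uint32_t flashLogRoomInSector(const FlashLog *log) {
    return FLASH_SECTOR_SIZE - (log->writeOffset % FLASH_SECTOR_SIZE);
}

/* Advance after len bytes were programmed, wrapping at the end of the region. */
static void flashLogAdvance(FlashLog *log, uint32_t len) {
    assert(log != 0 && len <= LOG_REGION_SIZE);
    log->writeOffset = (log->writeOffset + len) % LOG_REGION_SIZE;
}
```

A flush loop built on this would check flashLogNeedsErase, erase the sector at flashLogAddress if needed, program at most flashLogRoomInSector bytes, call flashLogAdvance, and repeat until the staged data is gone.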

Set Up DMA for SPI Transmission
- With DMA configured in CubeMX, the HAL gives you non-blocking SPI functions. Use HAL_SPI_Transmit_DMA to send your buffer contents to the flash chip without stalling the CPU:
```
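// Pull CS low before transfer
HAL_GPIO_WritePin(FLASH_CS_GPIO_Port, FLASH_CS_Pin, GPIO_PIN_RESET);

// Send page program command + address first, then start the data phase.
// txData must stay valid until the transfer-complete callback runs.
if (HAL_SPI_Transmit_DMA(&hspi1, txData, dataLength) != HAL_OK) {
    // DMA did not start (e.g. HAL_BUSY): don't leave the flash selected
    HAL_GPIO_WritePin(FLASH_CS_GPIO_Port, FLASH_CS_Pin, GPIO_PIN_SET);
}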
```
- Remember that SPI flash requires a Write Enable command (0x06) before every Page Program (0x02) and before every erase. A single Page Program can write at most 256 bytes and must not run past the end of a page, or the address wraps back to the start of that page. After each program or erase, wait for the flash's internal write cycle to finish by polling the BUSY bit (bit 0) of Status Register-1 (read with command 0x05). Skipping Write Enable or the BUSY wait is the number one reason flash writes silently fail.
- A practical pattern: send the command and address bytes using blocking HAL_SPI_Transmit (they're only 4 bytes), then switch to DMA for the actual data payload, keeping CS low across both. This keeps the code simpler, and the DMA setup overhead is only worth paying for the bulk data.
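
The pieces of that sequence that are easy to get wrong are the 24-bit address header and the page-boundary split. Here is a rough sketch of both, plus a BUSY poll for the target. The opcodes are the standard W25Q ones; every function name is hypothetical, and the HAL part assumes a CubeMX project with hspi1 and a pin labelled FLASH_CS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLASH_PAGE_SIZE   256u
#define CMD_WRITE_ENABLE  0x06u
#define CMD_PAGE_PROGRAM  0x02u
#define CMD_READ_STATUS1  0x05u
#define STATUS1_BUSY      0x01u   /* bit 0 of Status Register-1 */

/* Fill the 4-byte header: opcode, then the 24-bit address MSB first. */
static void flashBuildHeader(uint8_t hdr[4], uint8_t opcode, uint32_t addr) {
    assert(hdr != NULL && addr <= 0xFFFFFFu);
    hdr[0] = opcode;
    hdr[1] = (uint8_t)(addr >> 16);
    hdr[2] = (uint8_t)(addr >> 8);
    hdr[3] = (uint8_t)addr;
}

/* Bytes that may go into one Page Program starting at addr without
 * running past the end of the 256-byte page. */
static uint32_t flashPageChunk(uint32_t addr, uint32_t remaining) {
    uint32_t room = FLASH_PAGE_SIZE - (addr % FLASH_PAGE_SIZE);
    return (remaining < room) ? remaining : room;
}

#ifdef USE_HAL_DRIVER              /* defined in CubeMX-generated projects */
#include "main.h"
extern SPI_HandleTypeDef hspi1;

/* Poll Status Register-1 until BUSY clears or timeoutMs elapses. */
static HAL_StatusTypeDef flashWaitReady(uint32_t timeoutMs) {
    uint8_t tx[2] = { CMD_READ_STATUS1, 0 };
    uint8_t rx[2] = { 0 };
    uint32_t start = HAL_GetTick();
    do {
        HAL_GPIO_WritePin(FLASH_CS_GPIO_Port, FLASH_CS_Pin, GPIO_PIN_RESET);
        HAL_StatusTypeDef st = HAL_SPI_TransmitReceive(&hspi1, tx, rx, 2, 10);
        HAL_GPIO_WritePin(FLASH_CS_GPIO_Port, FLASH_CS_Pin, GPIO_PIN_SET);
        if (st != HAL_OK) {
            return st;
        }
        if ((rx[1] & STATUS1_BUSY) == 0u) {
            return HAL_OK;
        }
    } while ((HAL_GetTick() - start) < timeoutMs);
    return HAL_TIMEOUT;
}
#endif
```

Per chunk, the order would be: send CMD_WRITE_ENABLE in its own CS cycle, pull CS low, send the header from flashBuildHeader with blocking HAL_SPI_Transmit, start the DMA for flashPageChunk bytes, release CS in the completion callback, then wait for BUSY to clear before the next chunk.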

Implement Error Handling

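The HAL reports SPI and DMA faults through HAL_SPI_ErrorCallback. Override it to record the failure and leave the bus in a known state:

```
void HAL_SPI_ErrorCallback(SPI_HandleTypeDef *hspi) {
    uint32_t error = HAL_SPI_GetError(hspi);
    if (error & HAL_SPI_ERROR_DMA) {
        // DMA transfer error - retry or log
    }
    if (error & HAL_SPI_ERROR_OVR) {
        // Overrun - data lost, flush RX
        __HAL_SPI_CLEAR_OVRFLAG(hspi);
    }
    // Always release CS on error
    HAL_GPIO_WritePin(FLASH_CS_GPIO_Port, FLASH_CS_Pin, GPIO_PIN_SET);
}
```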

Also implement the transfer complete callback to release CS and update your buffer state:

```
// Runs in interrupt context: keep it short
void HAL_SPI_TxCpltCallback(SPI_HandleTypeDef *hspi) {
    HAL_GPIO_WritePin(FLASH_CS_GPIO_Port, FLASH_CS_Pin, GPIO_PIN_SET);
    // Update tail pointer, mark flush complete
}
```

On the flash side, add a simple CRC or checksum to each log entry. When you read entries back, verify the checksum. Flash bits can flip, especially as the chip ages, and you want to know when that happens rather than trusting corrupted data.
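
As one possible shape for this, the sketch below uses CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF) computed bit by bit, and a small fixed-size entry. The LogEntry layout and the function names are assumptions for illustration, not a format the rest of this guide depends on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF, no reflection. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len) {
    assert(data != NULL || len == 0);
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u) {
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

#define LOG_PAYLOAD_MAX 32u

/* Hypothetical on-flash entry: length byte, payload, then the CRC. */
typedef struct {
    uint8_t  length;                    /* payload bytes in use */
    uint8_t  payload[LOG_PAYLOAD_MAX];
    uint16_t crc;                       /* covers length + payload[0..length) */
} LogEntry;

/* Compute and store the CRC just before the entry is written to flash. */
static void logEntrySeal(LogEntry *e) {
    e->crc = crc16_ccitt((const uint8_t *)e, 1u + e->length);
}

/* Check an entry read back from flash. Returns 1 if it looks intact. */
static int logEntryValid(const LogEntry *e) {
    if (e->length > LOG_PAYLOAD_MAX) {
        return 0;   /* erased flash reads 0xFF, which lands here */
    }
    return e->crc == crc16_ccitt((const uint8_t *)e, 1u + e->length);
}
```

The bitwise loop is slow but tiny. A table-driven version or the STM32's hardware CRC unit are the usual upgrades if the log rate makes this show up in profiling.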

Compile and Flash the Code
- Build the project in STM32CubeIDE. Fix any warnings - treat them seriously, especially around pointer types and DMA buffer alignment.
- Connect your STM32 board via ST-Link and flash the firmware.
- Watch out: if your DMA buffers are in a cached memory region (common on STM32F7/H7), you need to handle cache coherency. Either place DMA buffers in a region that the MPU marks non-cacheable (using the linker script to put them there), or call the cache maintenance functions around each transfer: clean the D-cache before a TX DMA and invalidate it after an RX DMA.
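
If you go the cache-maintenance route, keep in mind that the Cortex-M7 D-cache works in 32-byte lines, so the range handed to the maintenance functions is normally rounded out to line boundaries. A minimal sketch of that rounding follows. cacheAlignRange and dmaTxPrepare are hypothetical names; SCB_CleanDCache_by_Addr is the CMSIS function, and the target-only part assumes a CubeMX project.

```c
#include <assert.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u   /* Cortex-M7 data cache line size in bytes */

typedef struct {
    uintptr_t start;   /* rounded down to a cache-line boundary */
    uint32_t  size;    /* rounded up to a whole number of cache lines */
} CacheRange;

/* Expand [addr, addr + len) so it starts and ends on cache-line boundaries. */
static CacheRange cacheAlignRange(uintptr_t addr, uint32_t len) {
    assert(len > 0u);
    uintptr_t mask  = ~(uintptr_t)(DCACHE_LINE_SIZE - 1u);
    uintptr_t start = addr & mask;
    uintptr_t end   = (addr + len + DCACHE_LINE_SIZE - 1u) & mask;
    CacheRange r = { start, (uint32_t)(end - start) };
    return r;
}

#ifdef USE_HAL_DRIVER              /* defined in CubeMX-generated projects */
#include "main.h"
#if defined(__DCACHE_PRESENT) && (__DCACHE_PRESENT == 1U)
/* Push CPU writes out to RAM so the DMA reads current data. */
static void dmaTxPrepare(const void *buf, uint32_t len) {
    CacheRange r = cacheAlignRange((uintptr_t)buf, len);
    SCB_CleanDCache_by_Addr((uint32_t *)r.start, (int32_t)r.size);
}
#endif
#endif
```

For RX buffers, rounding the range is not enough on its own: invalidating a line that also holds unrelated variables throws away their pending writes, so those buffers are best declared 32-byte aligned and padded to a multiple of 32 bytes.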

Test the Circular Buffer Log System
- Write a test that fills the circular buffer past its capacity, flushes to flash, then reads back and verifies the data. This confirms your wrap-around logic works.
- Read back raw flash sectors with a debug tool or UART dump to verify the data layout matches your expectations.
- Test power loss scenarios. Pull power during a write and verify on the next boot that your log metadata (head/tail pointers stored in flash) is still consistent. This is where most circular buffer implementations break down in production.

Troubleshooting

Flash Memory Not Responding
- Double-check your wiring. MOSI and MISO get swapped more often than anyone admits.
- Verify the CS pin toggles correctly with a logic analyzer or scope. The flash chip ignores everything when CS is high.
- Confirm SPI mode. W25Q chips support SPI Mode 0 (CPOL=0, CPHA=0) and Mode 3 (CPOL=1, CPHA=1); modes 1 and 2 won't work.
- Check the flash chip's supply voltage. Some W25Q variants need 2.7V minimum; a sagging 3.3V rail under load can cause intermittent failures.

Data Corruption
- Make sure you're erasing the flash sector (4KB minimum on W25Q) before writing to it. Flash memory can only flip bits from 1 to 0 during programming. To write new data, you must erase first (sets all bits to 1).
- Add CRC-16 or CRC-32 to each log entry. Verify on read. If CRCs fail consistently in certain sectors, those sectors may be wearing out.
- If you're writing from an ISR context, ensure your buffer access is atomic or properly guarded. The volatile keyword alone is not enough - you may need critical sections around multi-step buffer operations.

What You've Built

You now have a circular buffer logging system that stages data in RAM and flushes it to SPI flash using DMA, keeping CPU overhead minimal. The circular design means your log never runs out of space - it just recycles. For production use, I'd recommend adding a sequence number to each entry so you can reconstruct the correct order on readback, and storing your head/tail metadata in a dedicated flash sector with redundant copies so you survive unexpected power loss gracefully.
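
To make that last recommendation a bit more concrete, here is a rough sketch of choosing between two redundant metadata copies at boot. Everything in it is an assumption for illustration: the LogMeta layout, the function names, and especially the XOR-based check word, which stands in for a proper CRC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t sequence;   /* incremented on every metadata update */
    uint32_t head;       /* flash offset of the next write */
    uint32_t tail;       /* flash offset of the oldest valid entry */
    uint32_t check;      /* integrity word over the three fields above */
} LogMeta;

/* Placeholder integrity word; erased (all 0xFF) and all-zero copies fail it. */
static uint32_t metaCheck(const LogMeta *m) {
    return ~(m->sequence ^ m->head ^ m->tail);
}

static bool metaValid(const LogMeta *m) {
    return m->check == metaCheck(m);
}

/* Serial-number style comparison, so the counter is allowed to wrap. */
static bool seqNewer(uint32_t a, uint32_t b) {
    return (int32_t)(a - b) > 0;
}

/* Return the newer valid copy, or NULL if neither copy survived. */
static const LogMeta *metaSelect(const LogMeta *a, const LogMeta *b) {
    assert(a != NULL && b != NULL);
    bool validA = metaValid(a);
    bool validB = metaValid(b);
    if (validA && validB) {
        return seqNewer(b->sequence, a->sequence) ? b : a;
    }
    if (validA) {
        return a;
    }
    if (validB) {
        return b;
    }
    return NULL;
}
```

Updating the two copies alternately, each in its own sector, means a power cut during a metadata write can damage only the copy being written; the other one still holds the previous consistent state.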