Vulcan OS turns a bare‑metal STM32H7 into a small, deterministic platform for running machine‑learning models at the edge, with on‑device training planned for later releases. It is a self‑contained, real‑time operating system written in C that ships with its own peripheral drivers, memory manager, and a custom tensor arena, all built for the STM32H743ZI’s Cortex‑M7 core.
Rather than rely on the heavyweight abstractions of FreeRTOS or the STM32 HAL, the authors of Vulcan OS opted to hand‑roll every subsystem. The result is an OS that knows precisely how to use the 480 MHz processor, the 128 KB of on‑chip DTCM, and the 2 MB of flash. Its stated goal is to enable on‑device inference and, in future releases, on‑device training without pulling in external libraries such as TensorFlow Lite.
Key capabilities
- Preemptive RTOS core
  Cortex‑M7 PendSV context switching, a SysTick‑driven scheduler, and binary mutexes allow deterministic task prioritisation.
- Custom memory management
  A pool allocator feeds a fixed‑size tensor arena that lives in DTCM, so memory does not fragment during operations.
- Minimal peripheral stack
  Drivers for GPIO, UART, DMA, ADC, and SPI are written in plain C and expose clean public headers (vulcan_gpio.h, vulcan_uart.h, etc.).
- Intended for ML workloads
  The architecture is laid out to accommodate a planned INT8 matrix multiplication operator and a future conv2d toolbox, preparing the platform for efficient inference.
- Bare‑metal boot
  The HAL implements the vector table and a 480 MHz PLL configuration, so the firmware can be built with a simple cmake command and flashed with openocd.
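To make the scheduling bullet concrete, here is a minimal, host‑runnable sketch of the decision a SysTick‑driven scheduler makes on each tick: wake tasks whose delay has expired, then pick the highest‑priority ready task. The names task_t, sched_tick, and sched_pick_next are hypothetical and are not taken from vulcan_sched.c; treating a larger number as a higher priority is also an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task record; the real Vulcan OS layout is not documented here. */
typedef enum { TASK_READY, TASK_BLOCKED } task_state_t;

typedef struct {
    uint8_t      priority;    /* assumption: larger value = more urgent */
    task_state_t state;
    uint32_t     delay_ticks; /* ticks left while sleeping; 0 = not sleeping */
} task_t;

/* Called once per tick: wake every task whose delay has just expired.
 * A blocked task with delay_ticks == 0 (e.g. waiting on a mutex) stays blocked. */
void sched_tick(task_t *tasks, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (tasks[i].state == TASK_BLOCKED && tasks[i].delay_ticks > 0u) {
            tasks[i].delay_ticks--;
            if (tasks[i].delay_ticks == 0u) {
                tasks[i].state = TASK_READY;
            }
        }
    }
}

/* Returns the index of the highest-priority ready task, or -1 if none is ready.
 * Ties go to the lowest index. */
int sched_pick_next(const task_t *tasks, size_t count)
{
    int best = -1;
    for (size_t i = 0; i < count; ++i) {
        if (tasks[i].state != TASK_READY) {
            continue;
        }
        if (best < 0 || tasks[i].priority > tasks[best].priority) {
            best = (int)i;
        }
    }
    return best;
}
```

On a Cortex‑M7, a tick handler that finds a task other than the running one would typically request the switch by setting the PendSV pending bit and leave register save and restore to the PendSV handler. That part is hardware‑specific and is left out of the sketch.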
Under the hood
Vulcan OS is split into three logical layers that mirror its build structure:
| Layer | Purpose | Example Files |
|---|---|---|
| Kernel | Scheduler, synchronization, allocator | vulcan_sched.c, vulcan_mem.c |
| HAL | Direct register access, system clock, peripheral configuration | vulcan_clock.c, vulcan_dma.c |
| Application | Demo tasks (sensor, inference, report) | main.c |
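The kernel layer's allocator is described as a pool allocator, which avoids fragmentation because every block has the same size. The sketch below shows one common way to build such a pool with an intrusive free list. It is not the code in vulcan_mem.c: pool_t, pool_init, pool_alloc, pool_free, and both size constants are hypothetical, and the storage is an ordinary array here rather than a region placed in DTCM.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, chosen only for illustration. */
#define POOL_BLOCK_SIZE  256u  /* bytes per block */
#define POOL_BLOCK_COUNT 16u   /* blocks in the pool */

/* While a block is free, its first bytes hold the link to the next free block. */
typedef union pool_block {
    union pool_block *next;
    uint8_t           bytes[POOL_BLOCK_SIZE];
} pool_block_t;

typedef struct {
    pool_block_t  storage[POOL_BLOCK_COUNT]; /* on target this could live in DTCM */
    pool_block_t *free_list;
    size_t        free_count;
} pool_t;

/* Thread every block onto the free list, lowest address first. */
void pool_init(pool_t *p)
{
    p->free_list = NULL;
    for (size_t i = POOL_BLOCK_COUNT; i > 0u; --i) {
        p->storage[i - 1u].next = p->free_list;
        p->free_list = &p->storage[i - 1u];
    }
    p->free_count = POOL_BLOCK_COUNT;
}

/* O(1): pop the head of the free list. Returns NULL when the pool is exhausted. */
void *pool_alloc(pool_t *p)
{
    pool_block_t *block = p->free_list;
    if (block == NULL) {
        return NULL;
    }
    p->free_list = block->next;
    p->free_count--;
    return block->bytes;
}

/* O(1): push the block back. ptr must come from pool_alloc on the same pool. */
void pool_free(pool_t *p, void *ptr)
{
    if (ptr == NULL) {
        return;
    }
    pool_block_t *block = (pool_block_t *)ptr;
    block->next = p->free_list;
    p->free_list = block;
    p->free_count++;
}
```

Both operations run in constant time, which suits a deterministic runtime. The sketch takes no lock, so a preemptive kernel would need to guard the free list with a critical section or mutex.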
Vulcan's own HAL and kernel replace ST's STM32 HAL and any FreeRTOS‑specific code. Every peripheral operation is interrupt‑driven, except for a polling‑based fallback in the SPI master logic. The ADC driver uses DMA in continuous circular mode, pushing data straight into the tensor arena. UART output goes through a lightweight ring buffer, wrapped by a VK_LOG macro that prefixes console messages with the OS version and tick count.
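The ring‑buffer logging path can be illustrated with a small host‑side sketch. It is not the Vulcan implementation: log_ring_t, ring_put, ring_get, log_write, DEMO_LOG, the "0.0.0" version string, and the "[version tick] " prefix format are all assumptions made for the example. Only the general idea, a ring buffer behind a macro that prepends version and tick, comes from the description above.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define LOG_RING_SIZE 256u      /* hypothetical; power of two so indices wrap with a mask */
#define DEMO_VERSION  "0.0.0"   /* placeholder, not a real Vulcan OS version */

typedef struct {
    uint8_t           data[LOG_RING_SIZE];
    volatile uint32_t head;     /* advanced by the producer (logging code) */
    volatile uint32_t tail;     /* advanced by the consumer (UART TX interrupt on target) */
} log_ring_t;

/* Queue one byte. Returns 1 on success, 0 if the buffer is full.
 * One slot is kept empty, so the capacity is LOG_RING_SIZE - 1 bytes. */
int ring_put(log_ring_t *r, uint8_t byte)
{
    uint32_t next = (r->head + 1u) & (LOG_RING_SIZE - 1u);
    if (next == r->tail) {
        return 0;
    }
    r->data[r->head] = byte;
    r->head = next;
    return 1;
}

/* Dequeue one byte. Returns 1 on success, 0 if the buffer is empty. */
int ring_get(log_ring_t *r, uint8_t *out)
{
    if (r->head == r->tail) {
        return 0;
    }
    *out = r->data[r->tail];
    r->tail = (r->tail + 1u) & (LOG_RING_SIZE - 1u);
    return 1;
}

/* Format "[version tick] message" and queue it. Returns the number of bytes queued. */
int log_write(log_ring_t *r, const char *version, uint32_t tick, const char *fmt, ...)
{
    char line[128];
    int n = snprintf(line, sizeof line, "[%s %lu] ", version, (unsigned long)tick);
    if (n < 0 || (size_t)n >= sizeof line) {
        return 0;
    }
    va_list ap;
    va_start(ap, fmt);
    int m = vsnprintf(line + n, sizeof line - (size_t)n, fmt, ap);
    va_end(ap);
    if (m < 0) {
        return 0;
    }
    size_t len = (size_t)n + (size_t)m;
    if (len >= sizeof line) {
        len = sizeof line - 1u;  /* message was truncated to fit the line buffer */
    }
    size_t queued = 0;
    while (queued < len && ring_put(r, (uint8_t)line[queued])) {
        queued++;
    }
    return (int)queued;
}

/* Hypothetical stand-in for a VK_LOG-style wrapper. */
#define DEMO_LOG(ring, tick, ...) log_write((ring), DEMO_VERSION, (tick), __VA_ARGS__)
```

A call such as DEMO_LOG(&ring, tick, "adc=%d", sample) only fills the buffer; on hardware, the UART transmit interrupt would drain it with ring_get, so logging never blocks the calling task.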
The compiler toolchain is the standard arm-none-eabi-gcc, and the project ships with a dedicated CMake toolchain file (cmake/arm-none-eabi.cmake). The default target board is the Nucleo‑H743ZI2, but the pin configuration is abstracted, allowing a quick port to other H7 boards.
Who it fits / Who it doesn’t
Fits
- Developers who need a tiny, deterministic runtime on the H7 family and want full control over memory usage.
- Teams targeting on‑device inference in a highly resource‑constrained setting, where external libraries would bloat code and increase latency.
- Hobbyists or academic projects that prefer C‑level visibility of the RTOS and peripheral interactions.
Doesn’t fit
- Those who require a mature, battle‑tested RTOS ecosystem (FreeRTOS, Zephyr).
- Projects that already rely on STM32 HAL or wish to use higher‑level driver APIs (e.g., CMSIS‑Driver).
- Use cases that need a fast, ready‑made ML inference engine; the current implementation is a scaffold intended for future development of operators.
Because the OS is built from scratch, it can keep its memory footprint very small, but at the cost of development overhead. Users must be comfortable contributing kernel‑level code and integrating custom operators once Phase 3 is released.
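As an idea of what integrating a custom operator involves, here is a reference‑style sketch of an INT8 matrix multiplication with 32‑bit accumulation, plus a saturating cast back to INT8. It is not the planned Vulcan operator: matmul_i8 and saturate_i8 are hypothetical names, the row‑major layout is an assumption, and the loop is written for clarity rather than for the Cortex‑M7's SIMD instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* C (m x n) = A (m x k) * B (k x n), all row-major.
 * Inputs are int8; products are accumulated in int32 so they cannot overflow
 * for any realistic k (each product is at most 128 * 128 = 16384). */
void matmul_i8(const int8_t *a, const int8_t *b, int32_t *c,
               size_t m, size_t k, size_t n)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            int32_t acc = 0;
            for (size_t p = 0; p < k; ++p) {
                acc += (int32_t)a[i * k + p] * (int32_t)b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}

/* Clamp a 32-bit accumulator into the int8 range [-128, 127]. */
int8_t saturate_i8(int32_t v)
{
    if (v > 127) {
        return 127;
    }
    if (v < -128) {
        return -128;
    }
    return (int8_t)v;
}
```

A full quantised operator would also rescale the accumulator by a scale factor and zero point before saturating; that step depends on the quantisation scheme and is left out here.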
Setup, briefly
Getting the firmware onto a Nucleo‑H743ZI2 is straightforward: install gcc-arm-none-eabi, cmake, and openocd, then follow the cmake‑based build commands in the repository README. The README also details how to flash the binary with OpenOCD and how to observe debug output on the virtual COM port.
In the ecosystem of embedded ML, Vulcan OS occupies a niche between fully pre‑built inference libraries and generic RTOSes. TensorFlow Lite Micro is a portable library that is usually integrated on top of a vendor HAL or an RTOS; Vulcan OS instead ships the whole stack itself, trading portability for a leaner binary. It may appeal to the subset of developers who count every byte and are ready to invest in a custom runtime.
The source is on GitHub.