Automating Peripheral Map Discovery on Monolithic Firmware

We don’t always know which MCU the firmware we receive was built for. Knowing the MCU tells you which peripherals it has and where they live in memory. On monolithic firmware, the peripherals are the entire attack surface. Can we recover the peripheral map from the firmware alone?

Malaschonok’s 2026 Ortellius paper used MMIO fingerprinting and SVD databases to deterministically find the best SVD for a firmware image. SVDs, or System View Description files, describe the peripherals, registers, and bitfields of an MCU. The method returned a ranking of SVDs in which the top rank identified the MCU family 57% of the time and the best available memory map 44% of the time across 42 binaries [1]. A 2024 Binary Ninja blog post described using their LLM assistant, Sidekick, with MMIO fingerprints for peripheral identification, reporting: “While not always 100% accurate in device identification, Sidekick’s guesses were consistently close and often accompanied by convincing rationales” [2].
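To make "MMIO fingerprint" concrete, here is a minimal Python sketch of one crude way to collect candidate MMIO addresses from a raw image: scan for aligned little-endian words that land in the Cortex-M peripheral region or the private peripheral bus. This is not the Ortellius implementation, and the function name is hypothetical. Unlike the paper's fingerprints, this literal scan cannot tell reads from writes and will pick up some false positives.

```python
import struct

# Address ranges from the Armv7-M/Armv8-M system memory map.
PERIPHERAL_REGION = (0x4000_0000, 0x5FFF_FFFF)
PPB_REGION = (0xE000_0000, 0xE00F_FFFF)  # private peripheral bus


def candidate_mmio_addresses(image: bytes) -> set:
    """Collect 4-byte-aligned little-endian words that fall in MMIO ranges.

    Hypothetical helper: a literal-pool style scan, not real access analysis.
    """
    found = set()
    usable = len(image) - len(image) % 4  # ignore a trailing partial word
    for (word,) in struct.iter_unpack("<I", image[:usable]):
        for lo, hi in (PERIPHERAL_REGION, PPB_REGION):
            if lo <= word <= hi:
                found.add(word)
    return found


# Tiny synthetic image: an SRAM pointer, a peripheral register,
# a PPB register, and a flash code pointer.
image = struct.pack("<4I", 0x2000_1000, 0x4002_1000, 0xE000_E100, 0x0800_0101)
addrs = candidate_mmio_addresses(image)
print(sorted(hex(a) for a in addrs))
```

A real scanner would need to recover the actual load and store instructions to separate reads from writes; the sketch only shows what kind of data the fingerprint is built from.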

[Diagram: the two approaches described above]

My hypothesis was simple: feeding both Malaschonok’s SVD ranking results and the firmware to an LLM would achieve SVD accuracy good enough for automation. In other words: slap an LLM on it.

[Diagram: the hypothesis approach]

Results were promising with my prototype. I wanted to make sure I was feeding the LLM the best possible information (garbage in, garbage out) and investigated whether the ranking algorithm and/or SVD database could be improved.

Malaschonok’s method for SVD scoring was to add the Jaccard similarity of MMIO reads to the Jaccard similarity of MMIO writes. I found three issues with this scoring method:

  • The Jaccard index favored SVDs with fewer registers: every register the firmware never touches enlarges the union and lowers the score.
  • Some SVDs labelled their access types inaccurately (some STM32 clones ranked higher than ST’s own STM32 SVDs).
  • Some SVDs were more complete than others. For example, not all Cortex-M SVDs contained entries for the private peripheral bus. Those that did ranked higher.

Scoring SVDs based on the number of matching MMIO accesses in the peripheral region fixed the above issues.
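The difference between the two scoring schemes can be sketched in a few lines of Python. This is my reading of the description above, not the code from either implementation; the function names and the toy register sets are hypothetical, and for simplicity the toy example treats every register as both readable and writable.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|, defined as 0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_sum_score(fw_reads, fw_writes, svd_readable, svd_writable):
    """Read similarity plus write similarity, as described for the paper."""
    return jaccard(fw_reads, svd_readable) + jaccard(fw_writes, svd_writable)


def match_count_score(fw_accesses, svd_registers,
                      region=(0x4000_0000, 0x5FFF_FFFF)):
    """Count firmware accesses that hit an SVD register inside the region.

    Access types are ignored and addresses outside the peripheral region
    (such as the private peripheral bus) do not count.
    """
    lo, hi = region
    return sum(1 for addr in fw_accesses & svd_registers if lo <= addr <= hi)


# Toy data: the firmware touches four registers.
fw = {0x4000_0000 + 4 * i for i in range(4)}
small_svd = {0x4000_0000, 0x4000_0004}            # covers 2 of the 4
large_svd = {0x4000_0000 + 4 * i for i in range(16)}  # covers all 4

print(jaccard_sum_score(fw, fw, small_svd, small_svd))  # 1.0
print(jaccard_sum_score(fw, fw, large_svd, large_svd))  # 0.5
print(match_count_score(fw, small_svd))                 # 2
print(match_count_score(fw, large_svd))                 # 4
```

In the toy data the larger SVD explains every access the firmware makes, yet the Jaccard sum ranks the smaller SVD higher because the twelve untouched registers inflate the union. The match count ranks them the other way around.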

A lot of SVDs also don’t follow the official spec. Nordic, for example, strictly uses lowercase instead of camelCase for register access types. I used svd-rs with validation disabled for parsing. I also pulled SVDs from more sources, including TI’s IDE, using tixml2svd to convert the TIXML files to SVDs.
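As an illustration of what lenient parsing means here, the following Python sketch reads register addresses out of an SVD with the standard library and matches access types case-insensitively. It is not how svd-rs works internally; the helper names and the sample XML are hypothetical, and it deliberately ignores `derivedFrom`, clusters, register arrays, and inherited access defaults.

```python
import xml.etree.ElementTree as ET

# Access values as spelled in the CMSIS-SVD schema.
CANONICAL_ACCESS = {
    v.lower(): v
    for v in ("read-only", "write-only", "read-write",
              "writeOnce", "read-writeOnce")
}


def normalize_access(raw):
    """Map an access string to its canonical spelling, ignoring case."""
    if raw is None:
        return None
    text = raw.strip()
    return CANONICAL_ACCESS.get(text.lower(), text)


def register_addresses(svd_xml: str) -> dict:
    """Return {absolute register address: access type or None}.

    Only top-level registers with hex or plain decimal offsets are handled.
    """
    root = ET.fromstring(svd_xml)
    out = {}
    for periph in root.iter("peripheral"):
        base_text = periph.findtext("baseAddress")
        if base_text is None:
            continue  # tolerate incomplete entries instead of failing
        base = int(base_text, 0)
        for reg in periph.findall("registers/register"):
            offset_text = reg.findtext("addressOffset")
            if offset_text is None:
                continue
            addr = base + int(offset_text, 0)
            out[addr] = normalize_access(reg.findtext("access"))
    return out


# Hypothetical SVD fragment with one non-canonical, all-lowercase access type.
SAMPLE = """
<device><peripherals><peripheral>
  <name>DEMO</name><baseAddress>0x40002000</baseAddress>
  <registers>
    <register><name>A</name><addressOffset>0x000</addressOffset>
      <access>write-only</access></register>
    <register><name>B</name><addressOffset>0x4</addressOffset>
      <access>writeonce</access></register>
  </registers>
</peripheral></peripherals></device>
"""
regs = register_addresses(SAMPLE)
print({hex(a): acc for a, acc in regs.items()})
```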

For LLM augmentation, I gave medium-effort Claude Opus 4.6 a one-paragraph system prompt (shown below), the top deterministic SVD rankings, and access to bash. A lot of the Ortellius firmware included the MCU in the name of the file, so I renamed the firmware under test to firmware.bin to prevent cheating.

You are selecting the matching SVD file for a Cortex-M firmware binary. You have SVD rankings showing which peripheral maps best match the binary’s MMIO accesses. Your job is to pick the right SVD. Use tools to disambiguate when the top rank has multiple candidates or the top ranks are close.

Adding tool hints to the prompt improved results:

bash can inspect the firmware binary at $FIRMWARE (e.g. strings $FIRMWARE | grep -i , xxd $FIRMWARE | head -8).

Results with the Ortellius dataset (42 binaries):

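| Method | Stripped Binaries | Exact | Family | Wrong |
|---|---|---|---|---|
| Ortellius Paper | N/A | 36% | 57% | 43% |
| Deterministic only | N/A | 50% | 81% | 21% |
| + LLM | No | 48% | 90% | 10% |
| + LLM with tool hint | No | 67% | 95% | 5% |
| + LLM with tool hint | Yes | 50% | 86% | 14% |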

The largest gain, +24pp in MCU family accuracy, came from the SVD ranking and database updates. The LLM added another +14pp on top of that when strings were available. The LLM improvement on stripped binaries was much smaller, only +5pp.

The LLM traces showed it was able to improve results by:

  • matching board names, chip names, SDK errors/logs, and build-paths to MCUs
  • analyzing headers, inferring the RAM size from the SP location and the IRQ count from vector-table entries
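The vector-table reasoning in the second bullet can be approximated with a short Python sketch. It relies on assumptions the traces do not state: that the image starts with the vector table, that the initial stack pointer sits at the top of SRAM, that SRAM starts at 0x20000000, and that the flash base address is known. The function name, the `max_irqs` cap, and the synthetic image are hypothetical.

```python
import struct

SRAM_BASE = 0x2000_0000  # start of the Cortex-M SRAM region
SYSTEM_VECTORS = 16      # initial SP slot plus 15 system exception slots


def inspect_vector_table(image: bytes, flash_base: int, max_irqs: int = 240):
    """Guess RAM size and external IRQ count from a Cortex-M vector table."""
    initial_sp, reset_handler = struct.unpack_from("<II", image, 0)

    def looks_like_handler(word: int) -> bool:
        # Handlers are Thumb addresses (bit 0 set) inside the image.
        return word & 1 == 1 and flash_base <= word < flash_base + len(image)

    irq_count = 0
    for n in range(max_irqs):
        offset = 4 * (SYSTEM_VECTORS + n)
        if offset + 4 > len(image):
            break
        (word,) = struct.unpack_from("<I", image, offset)
        if word == 0:
            continue  # reserved slot: keep scanning, but do not count it yet
        if not looks_like_handler(word):
            break     # first word that cannot be a vector ends the table
        irq_count = n + 1
    return {
        "initial_sp": initial_sp,
        "reset_handler": reset_handler,
        # Only meaningful if the stack really starts at the top of SRAM.
        "ram_size_guess": initial_sp - SRAM_BASE,
        "irq_count": irq_count,
    }


# Synthetic image: SP, reset, 14 system handlers, then IRQ0, a reserved
# slot, IRQ2, and a code word that ends the table.
words = ([0x2000_5000, 0x0800_0101] + [0x0800_0103] * 14
         + [0x0800_0105, 0, 0x0800_0107, 0xBF00_BF00])
image = struct.pack(f"<{len(words)}I", *words).ljust(0x200, b"\xff")
info = inspect_vector_table(image, flash_base=0x0800_0000)
print(info["ram_size_guess"], info["irq_count"])
```

Both numbers are weak evidence on their own, but they can rule out candidate SVDs whose RAM size or interrupt count is too small for what the firmware expects.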

It’s not all roses and sunshine working with LLMs. Downsides are:

  • Execution time (minutes vs milliseconds)
  • Non-determinism
  • Extra complexity
  • Extra dependency
  • Hallucinations (picking nonexistent SVDs)
  • Cost (~10 cents per binary)

For now, we’re shipping the deterministic MMIO fingerprinting solution. If we encounter real-world issues, we’ll look into improving the MMIO scanning mechanism and/or bringing back the LLM.

Citations

[1] Malaschonok, D. (2026) ‘Identifying Microcontroller Architecture Through Static Analysis of Firmware Binaries’, SDIoTSec 2026. Available at: https://www.ndss-symposium.org/wp-content/uploads/sdiotsec26-7.pdf.

[2] Miller, B., Bryant, T. and Knudson, B. (2024) Sidekick in action: Analyzing firmware, Binary Ninja Blog. Available at: https://binary.ninja/2024/08/12/sidekick-in-action-analyzing-firmware.html.