Exploring Uncut - March 3rd, 2026
Introduction
This time, the focus is solely on providing an update on my exploration of getting a Mandelbrot set renderer running on an STM32F746G-DISCO board.
I had hoped to also cover “The Egg Project” and some Home Assistant improvements I’ve been working on. However, there is already plenty to discuss here, and the other projects have been moving forward more slowly due to a collection of minor but frustrating issues, from ordering the wrong components, to chasing down unexpected sensor behavior, to tracking a Wi-Fi authentication failure caused by a simple case mismatch.
Mandelbrot update
Removing Swift Numerics
Last time, I was faced with the issue that as soon as I introduced an additional external loop around my working code, it crashed.
At this stage I was still using Swift Numerics. I didn’t think this was the culprit, but to rule it out definitively, I removed the package dependency and reimplemented the core computation using only Floats.
let maxIterations: UInt8 = 100
let cReal: Float = -1.0
let cImag: Float = 1.0
var zReal: Float = 0.0
var zImag: Float = 0.0
var iterations: UInt8 = 0
for _ in 0..<1 {
    while iterations < maxIterations {
        let magnitudeSquared = zReal * zReal + zImag * zImag
        if magnitudeSquared >= 4.0 { // 2.0 squared = 4.0
            break
        }
        // z = z * z + c
        // (a + bi) * (a + bi) = (a² - b²) + (2ab)i
        let zRealTemp = zReal * zReal - zImag * zImag + cReal
        zImag = 2.0 * zReal * zImag + cImag
        zReal = zRealTemp
        iterations += 1
    }
    print("Iterations: \(iterations)\r")
}
The behavior was exactly the same. If I change the range of that for loop to have more than one iteration, the code crashes.
No progress so far.
Moving to C
To further simplify things, I implemented the main algorithm in C and called it from Swift.
uint8_t oneRound() {
    uint8_t maxIterations = 100;
    float cReal = -1.0;
    float cImag = 1.0;
    float zReal = 0.0;
    float zImag = 0.0;
    uint8_t iterations = 0;
    while (iterations < maxIterations) {
        float magnitudeSquared = zReal * zReal + zImag * zImag;
        if (magnitudeSquared >= 4.0) { // 2.0 squared = 4.0
            break;
        }
        // z = z * z + c
        // (a + bi) * (a + bi) = (a² - b²) + (2ab)i
        float zRealTemp = zReal * zReal - zImag * zImag + cReal;
        zImag = 2.0 * zReal * zImag + cImag;
        zReal = zRealTemp;
        iterations++;
    }
    return iterations;
}

for _ in 0..<2 {
    let iterations: UInt8 = oneRound()
    print("Iterations: \(iterations)\r")
}
This worked correctly for one, two, twenty, or any number of iterations through the loop.
Passing coordinates
However, this isn’t very interesting, since the calculation is always performed for the same coordinate.
What we need is to perform it for different coordinates, which means passing cReal and cImag as parameters.
uint8_t oneRound(float cReal, float cImag) {
    uint8_t maxIterations = 100;
    float zReal = 0.0;
    float zImag = 0.0;
    uint8_t iterations = 0;
    while (iterations < maxIterations) {
        float magnitudeSquared = zReal * zReal + zImag * zImag;
        if (magnitudeSquared >= 4.0) { // 2.0 squared = 4.0
            break;
        }
        // z = z * z + c
        // (a + bi) * (a + bi) = (a² - b²) + (2ab)i
        float zRealTemp = zReal * zReal - zImag * zImag + cReal;
        zImag = 2.0 * zReal * zImag + cImag;
        zReal = zRealTemp;
        iterations++;
    }
    return iterations;
}
But for our first test, let’s still pass the same coordinates for each call.
for _ in 0..<20 {
    let iterations: UInt8 = oneRound(-1.0, 1.0)
    print("Iterations: \(iterations)\r")
}
And with this, we’re back to our previous behavior: one pass through the loop is fine; more than one, and execution crashes.
Experimenting some more in this codebase, I changed the loop to the usual one for rendering the classic view of the Mandelbrot set.
let xMin: Float = -2.5
let xMax: Float = 1.0
let yMin: Float = -1.25
let yMax: Float = 1.25
for y: Int32 in 0..<10 {
    for x: Int32 in 0..<10 {
        let cReal: Float = xMin + (Float(x) / 10.0) * (xMax - xMin)
        let cImag: Float = yMin + (Float(y) / 10.0) * (yMax - yMin)
        let iterations: UInt8 = oneRound(cReal, cImag)
        print("Iterations: \(iterations)\r")
    }
}
As anticipated, this also crashes.
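The index-to-coordinate mapping in the loop above can be pulled out into a small helper. The following is only a sketch in C: the Viewport struct and the mapIndex, cRealFor and cImagFor names are made up for illustration and are not part of the project's code.

```c
#include <assert.h>
#include <stdint.h>

/* Region of the complex plane to render (hypothetical type for this sketch). */
typedef struct {
    float xMin, xMax, yMin, yMax;
} Viewport;

/* Maps index i in [0, steps) linearly onto [lo, hi). */
static float mapIndex(uint32_t i, uint32_t steps, float lo, float hi) {
    assert(steps > 0); /* a zero-sized grid would divide by zero */
    return lo + ((float)i / (float)steps) * (hi - lo);
}

/* Real part of c for column x of a grid that is `width` cells wide. */
static float cRealFor(const Viewport *v, uint32_t x, uint32_t width) {
    return mapIndex(x, width, v->xMin, v->xMax);
}

/* Imaginary part of c for row y of a grid that is `height` cells tall. */
static float cImagFor(const Viewport *v, uint32_t y, uint32_t height) {
    return mapIndex(y, height, v->yMin, v->yMax);
}
```

With a 10 by 10 grid and the bounds used above, column 5 maps to cReal = -0.75 and row 5 to cImag = 0.0.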
Computing the coordinates in C
Trying one more variation, I decided to pass x and y to the C function and let it calculate the cReal and cImag values.
uint8_t oneRound(uint32_t x, uint32_t y) {
    uint8_t maxIterations = 100;
    float xMin = -2.5;
    float xMax = 1.0;
    float yMin = -1.25;
    float yMax = 1.25;
    float zReal = 0.0;
    float zImag = 0.0;
    uint8_t iterations = 0;
    float cReal = xMin + (x / 10.0) * (xMax - xMin);
    float cImag = yMin + (y / 10.0) * (yMax - yMin);
    while (iterations < maxIterations) {
        float magnitudeSquared = zReal * zReal + zImag * zImag;
        if (magnitudeSquared >= 4.0) { // 2.0 squared = 4.0
            break;
        }
        // z = z * z + c
        // (a + bi) * (a + bi) = (a² - b²) + (2ab)i
        float zRealTemp = zReal * zReal - zImag * zImag + cReal;
        zImag = 2.0 * zReal * zImag + cImag;
        zReal = zRealTemp;
        iterations++;
    }
    return iterations;
}

for y: UInt32 in 0..<10 {
    for x: UInt32 in 0..<10 {
        let iterations: UInt8 = oneRound(x, y)
        print("Iterations: \(iterations)\r")
    }
}
And this worked fine.
Setting some flags
Slightly running out of ideas on variations to try out, I explained all those tests to ChatGPT and asked for advice.
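The fact that passing int parameters worked but passing float parameters did not led it to suspect an ABI mismatch, for example one language using the soft-float ABI and the other the hard-float ABI. With the hard-float ABI, float arguments are passed in FPU registers, whereas with the soft-float ABI they are passed in core registers, so caller and callee have to agree.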
The suggestion was to add explicit build flags to ensure everything was aligned.
I ended up adding
"-Xcc", "-mcpu=cortex-m7",
"-Xcc", "-mthumb",
"-Xcc", "-mfpu=fpv5-d16",
"-Xcc", "-mfloat-abi=hard"
to the Swift compiler options in my toolset.json file.
However, that was not enough. This only ensures that the C code compiled as part of the Swift build gets those flags applied.
I still needed to ensure that the same flags were used when compiling the standalone C source files (i.e. Support.c). This required adding a completely new section to toolset.json:
"cCompiler": {
    "extraCLIOptions": [
        "-mcpu=cortex-m7",
        "-mthumb",
        "-mfpu=fpv5-d16",
        "-mfloat-abi=hard"
    ]
},
Looking at the commands executed during the build, I could see those flags properly passed everywhere they were required (and confirmed they were not specified before).
But this did not change anything.
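Besides reading the build commands, a C file can also report which floating-point ABI it was compiled for by looking at the macros the compiler predefines. The sketch below relies on the ACLE macros __ARM_PCS_VFP and __ARM_FP as I understand them; the FloatAbi enum and the function names are hypothetical, and on the board the result would still have to be printed over the serial console.

```c
#include <assert.h>

typedef enum { ABI_HARD, ABI_SOFT_OR_SOFTFP, ABI_NOT_ARM32 } FloatAbi;

/* Floating-point calling convention this translation unit was built for. */
static FloatAbi compiledFloatAbi(void) {
#if defined(__ARM_PCS_VFP)
    return ABI_HARD;           /* float arguments travel in FPU registers */
#elif defined(__arm__)
    return ABI_SOFT_OR_SOFTFP; /* float arguments travel in core registers */
#else
    return ABI_NOT_ARM32;      /* e.g. when compiled on a desktop machine */
#endif
}

static const char *floatAbiName(FloatAbi abi) {
    switch (abi) {
    case ABI_HARD:           return "hard";
    case ABI_SOFT_OR_SOFTFP: return "soft or softfp";
    case ABI_NOT_ARM32:      return "not a 32-bit Arm target";
    }
    assert(0 && "unknown FloatAbi value");
    return "";
}

/* 1 if hardware double precision is available to the compiler, 0 if not,
   -1 if the __ARM_FP macro is not defined for this target. */
static int hardwareDoubleSupport(void) {
#if defined(__ARM_FP)
    return (__ARM_FP & 0x8) != 0 ? 1 : 0; /* bit 3 = 64-bit double precision */
#else
    return -1;
#endif
}
```

If the Swift side and Support.c were built with different settings, comparing this output with the flags seen in the build log would be one way to spot it.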
Single vs double precision
The astute reader might notice the use of fpv5-d16 for the FPU specification.
This indicates a double-precision-capable FPU, which I thought was what I had: glancing through the 250+ page STM32F7 Series and STM32H7 Series Cortex®-M7 processor programming manual, I had read
Floating-point unit
The FPU provides IEEE754-compliant operations on 32-bit single-precision and 64-bit double-precision floating-point values.
Unfortunately, this refers to the specifications of the Cortex-M7 processor options as used in general in different STM32 products.
Since I’m using an STM32F746G-DISCO board, I glanced at the 1700+ page STM32F75xxx and STM32F74xxx advanced Arm®-based 32-bit MCUs - Reference manual, without finding any additional information.
Finally, in the Datasheet - STM32F745xx STM32F746xx - ARM®-based Cortex®-M7 32b MCU+FPU, 462DMIPS, up to 1MB Flash/320+16+ 4KB, another 250+ page document, it states
The Cortex®-M7 core features a single floating point unit (SFPU) precision which supports all Arm® single-precision data-processing instructions and data types.
So I changed the fpu flag definition to fpv5-sp-d16 instead of fpv5-d16.
But it crashed just the same.
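A side note that is unrelated to the crash but matters on a single-precision FPU: in C, an unsuffixed literal such as 4.0 or 2.0 is a double, so an expression mixing it with a float is computed in double precision, which a single-precision-only FPU cannot do in hardware. The sketch below only demonstrates the type rule; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* sizeof reports the type of the whole expression without evaluating it. */
static size_t widthWithDoubleLiteral(float z) {
    return sizeof(2.0 * z);  /* 2.0 is a double, so z is promoted to double */
}

static size_t widthWithFloatLiteral(float z) {
    return sizeof(2.0f * z); /* 2.0f is a float, so the product stays a float */
}

/* Small self-check of the two rules above. */
static void checkLiteralWidths(void) {
    assert(widthWithDoubleLiteral(0.5f) == sizeof(double));
    assert(widthWithFloatLiteral(0.5f) == sizeof(float));
}
```

Writing the constants as 4.0f and 2.0f would keep the whole computation in single precision.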
Info
Full disclosure: ChatGPT had proposed testing with single precision, and I also used it afterwards to find the exact documents where the information could be found.
I only had a vague recollection of seeing that information.
Revising the FPU enable code
The last suggestion of ChatGPT was that I was enabling the FPU too late. This seemed strange to me, as it was the first instruction called in my Swift code. Still, it proposed to move it earlier, directly in the reset handler.
So I removed the enableFPU() call at the very start of main() in Application.swift and added a few lines of assembly directly in the reset routine in startup.S:
// ---- enable FPU here ----
ldr r1, =0xE000ED88
ldr r0, [r1]
orr r0, r0, #(0xF << 20)
str r0, [r1]
dsb
isb
// -------------------------
And this worked just fine. The loops are going over all coordinates and printing the expected iteration numbers.
I do not understand the reason why this works, which is frustrating me, but it does.
Now that I had something working, I reverted some of the other changes I made, such as adding the compiler flags, or implementing the logic in C instead of Swift, verifying at each step of the way that it was still properly working.
And eventually, I got back to using Swift Numerics and the Complex<Float> type in my calculations, and all was well.
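For readers less used to assembly, the bit manipulation done in the reset routine can be written in C. This is a sketch only: 0xE000ED88 is the address used in the assembly above (the Coprocessor Access Control Register), while the macro and function names are hypothetical. The register is passed in by pointer so that the logic can be tried on a desktop machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CPACR_ADDRESS        0xE000ED88u  /* same address as in the assembly */
#define CPACR_CP10_CP11_FULL (0xFu << 20) /* full access to coprocessors 10 and 11 */

/* Returns the register value with the FPU access bits set, other bits untouched. */
static uint32_t cpacrWithFpuEnabled(uint32_t cpacr) {
    return cpacr | CPACR_CP10_CP11_FULL;
}

/* Read-modify-write, like the ldr / orr / str sequence. On the board the
   pointer would be (volatile uint32_t *)CPACR_ADDRESS, and the write would be
   followed by the dsb and isb barriers, which this sketch leaves out. */
static void enableFpuAccess(volatile uint32_t *cpacr) {
    assert(cpacr != NULL);
    *cpacr = cpacrWithFpuEnabled(*cpacr);
}
```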
Getting a grasp on what’s happening
As mentioned above, I don’t like the black magic feeling of changing things around and having something working without a clue as to why.
I’m strongly in the camp that there’s always a good reason for why something is working or not. It might be very complex and intricate, you might not understand it, but there is logic underneath.
At least, that holds in the context of software engineering. When it comes to philosophy, psychology or quantum physics, things are less clearly defined. This annoys and fascinates me at the same time and, although this is primarily a technical blog, I may write about that topic at some point.
But I digress. To avoid that lingering frustration, and since ChatGPT had been helpful so far, I used it again to ask for an explanation.
When building the project, quite a few files are generated, including a .build/armv7em-apple-none-macho/release/Application.disassembly that contains the “readable” assembly listing of the whole program.
So I built the project twice, once where I enabled the FPU at the top of the main() function of my Swift code and once where I enabled it from the reset routine. For each, I saved the Application.disassembly file and then I asked ChatGPT to compare the two, explain the differences in the code and the crash I was observing.
It pointed out some of the obvious differences (like the one in the reset routine) but also similarities.
For both projects, the way the main() function from Application.swift was called was identical, including the fact there’s a generated function prologue that includes a vpush {d8-d13} instruction.
So I asked follow-up questions to understand exactly what was going on, like what does vpush do and what is this function prologue?
Given those answers, let me try to explain what’s happening here.
Processor or FPU registers being used in the code of a function fall into two categories:
- call-clobbered (or caller-saved) registers hold temporary information and there’s no requirement on their values after the call. If the caller wants to keep the values in such registers around, it needs to save them itself.
- callee-saved registers, on the other hand, are used to hold long-lived values that should be preserved across calls, and it is the responsibility of the callee to save those registers.
The function prologue is compiler-generated code at the very start of a function that performs some preliminary setup before the function body runs.
In our case, as FPU callee-saved registers were used in the code of our main() function, the compiler generated code to save those as part of the prologue. This is what the vpush instruction does.
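To make the two categories concrete for the FPU: under the Arm procedure call standard with hardware floating point, as I understand it, the double-precision registers d0 to d7 are caller-saved and d8 to d15 are callee-saved. The helper names below are hypothetical; the sketch simply checks whether a register range such as the d8-d13 seen in the prologue falls in the callee-saved group.

```c
#include <assert.h>
#include <stdbool.h>

/* d0-d7: caller-saved scratch registers. d8-d15: callee-saved. */
static bool isCalleeSavedD(int d) {
    assert(d >= 0 && d <= 15); /* the FPU discussed here has 16 D registers */
    return d >= 8;
}

/* True if a function using registers d[first]..d[last] must save at least
   one of them in its prologue. */
static bool rangeNeedsSaving(int first, int last) {
    for (int d = first; d <= last; d++) {
        if (isCalleeSavedD(d)) {
            return true;
        }
    }
    return false;
}
```

A function that only touches d0 to d7 needs no vpush in its prologue, while one that uses d8 to d13 does.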
Schematically, you can represent the calling sequence as
Reset → main prologue (vpush) → enableFPU() → code using FPU
Since the prologue executes before the function body, the vpush is executed before the first instruction of our main() function, which is the one enabling the FPU. Executing an FPU instruction before the FPU is enabled triggers a fault, hence the crash.
By enabling the FPU in the reset routine, the calling sequence looked like
Reset (enableFPU) → main prologue (vpush) → code using FPU
making sure the FPU is enabled before the vpush is executed.
But why did this crash occur only when I added an outer loop?
The way the compiler generates and optimizes the code and decides which register to use for which operation is obviously quite complex (at least to me). But by saving the disassembly files for an outer loop with one iteration and for a loop with more than one iteration, I could confirm that different registers were used and that callee-saved registers were involved only in the latter case. This meant the prologue contained the vpush instruction only in that latter case. In the first case, our calling sequence looked like
Reset → enableFPU() → code using FPU
which is perfectly fine.
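The three calling sequences can be played through with a toy model. This is a host-side sketch in C, not code from the project: the Cpu struct, the boot function and its two parameters are hypothetical and only mirror the order of events described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { RAN_OK, FAULTED } Outcome;

typedef struct {
    bool fpuEnabled;
    bool faulted;
} Cpu;

/* Any FPU instruction, vpush included, faults while the FPU is disabled. */
static void fpuInstruction(Cpu *cpu) {
    assert(cpu != NULL);
    if (!cpu->fpuEnabled) {
        cpu->faulted = true;
    }
}

/* enableInReset:    the FPU is enabled in the reset routine, before main()
   prologueHasVpush: the compiler emitted a vpush in main()'s prologue */
static Outcome boot(bool enableInReset, bool prologueHasVpush) {
    Cpu cpu = { false, false };
    if (enableInReset) {
        cpu.fpuEnabled = true;   /* reset routine */
    }
    if (prologueHasVpush) {
        fpuInstruction(&cpu);    /* main() prologue runs before the body */
    }
    if (cpu.faulted) {
        return FAULTED;          /* the body of main() is never reached */
    }
    if (!enableInReset) {
        cpu.fpuEnabled = true;   /* enableFPU() as first statement of main() */
    }
    fpuInstruction(&cpu);        /* the Mandelbrot arithmetic */
    return cpu.faulted ? FAULTED : RAN_OK;
}
```

In this model, boot(false, true) faults, while boot(true, true) and boot(false, false) both run to completion, matching the three cases above.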
Using the same approach, I confirmed that the same explanation accounted for the difference between calling my C function with int parameters and calling it with float parameters.
Conclusion
In this post, I showed you how I debugged the implementation of the Mandelbrot set computation in Embedded Swift on the STM32F746G-DISCO board, and how ChatGPT helped me not only find and fix the issue but also understand what was really happening.
The source code for this experiment, including all the intermediate steps, is available on GitHub.
So far, though, I only have code that prints the number of iterations at each coordinate. The end goal is of course to draw a picture, and for that, the iteration count needs to be mapped to a color value.
If the journey so far is any indication of what lies ahead, I’m sure I’ll be facing quite a few challenges and will have more stories to share. Stay tuned for a follow-up on this topic and on the other ongoing projects.