I came up with the idea of expanding the “PID vs ML” article series to seriously investigate the real-world possibilities of augmenting classical PID controllers with self-diagnostic capabilities using FFT (unbalanced motor torque, gearbox damage, bearing faults, etc.).

Of course my nature immediately kicked in. Instead of just slapping FFT onto the PID routines, calling it a day and ticking the ‘done’ box like a normal person, I did what I always do - tore the whole thing apart into pieces and started cursing the stupid imperfections of the real computer world (as opposed to that beautiful, clean mathematical fantasy we all pretend exists). We’re talking about a finite number of samples, precision butchered by IEEE 754 float/double quirks, rounding errors that sneak in like uninvited guests, and … You think I’m nitpicking again…? Yeah, probably. My nature. But hold my coffee - I’ve got a perfect real-world example coming right up that shows why sometimes being this pedantic actually saves your ass.

Finite representability of float – the infamous failure of the U.S. Patriot missile defense system

The failure was caused by an accumulating rounding error coming from floating-point precision limitations over long runtimes, which eventually led to a critical tracking failure. The Patriot’s weapon control computer stored time as an integer number of tenths of a second. To convert it to seconds for tracking calculations, it multiplied by 0.1. However, 0.1 in binary is a recurring fraction (0.00011001100110011001100…₂). In the 24-bit fixed-point representation used, it was truncated to roughly 0.1 × (1 − 2⁻²⁰), introducing an error of about 9.5 × 10⁻⁸ seconds per conversion. Over extended operation this error accumulated and shifted the “range gate” (predicted target position):

  • After 8 hours: 0.0275s error → 55m shift (≈20% of range gate)
  • After 20 hours: 0.0687s error → 137m shift (≈50% – tracking failure threshold)
  • After 100+ hours (as happened in Dhahran): 0.3433s error → 687m shift (≈250% – complete miss)

The radar was looking for the Scud 687 meters away from where it actually was. The Iraqi Scud hit a U.S. Army barracks in Dhahran, killing 28 soldiers and injuring roughly a hundred more.
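The arithmetic above is easy to reproduce. Here is a minimal host-side sketch (the function name and structure are mine, not the Patriot code): the 24-bit truncation loses about 0.1 × 2⁻²⁰ seconds on every tick, and the error grows linearly with uptime.

```c
#include <math.h>

/* Accumulated time error after a given uptime, assuming one conversion
 * per clock tick (tenths of a second) and a per-tick truncation error
 * of 0.1 * 2^-20 - the figure quoted in the text above. */
double patriot_time_error_s(double uptime_hours)
{
    double ticks = uptime_hours * 3600.0 * 10.0;      /* tenths of a second */
    double error_per_tick = 0.1 * pow(2.0, -20.0);    /* ~9.54e-8 s lost per tick */
    return ticks * error_per_tick;
}
```

Plugging in 8, 20 and 100 hours reproduces the 0.0275 s, 0.0687 s and 0.3433 s figures from the list above.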

While FFT itself doesn’t suffer from accumulating time errors like the Patriot missile fiasco, the whole discrete representation of a continuous analog signal using a finite number of crappy samples is mathematically pretty much the same as trying to draw a perfect circle using a bunch of random-length sticks you found in the garage. And then - just to make it even better - we usually approximate sin() and cos() with Taylor series truncated at some arbitrary point. What could possibly go wrong?

Quantization: turning a signal into a time series

If we want to use digital signal processing, the first step is to convert the analog signal to a digital representation: a series of consecutive samples with defined values taken at equal time intervals.

This immediately raises two questions:

  • how fast do we collect those samples?
  • how many of them?

Both parameters directly influence the frequency resolution of the FFT spectrum.

Sampling frequency

The answer to the first question is given by the Nyquist–Shannon sampling theorem:

To faithfully reconstruct a continuous signal with maximum frequency $%f_{MAX}$% from its discrete samples, the sampling frequency $%f_S$% must satisfy: $%f_S > 2 \cdot f_{MAX}$%

In practice: if you sample at fₛ = 1 kHz, the highest reconstructible frequency is below 500 Hz.

What happens if the condition is violated? Aliasing.

Aliasing

Let me illustrate this with an example: a 700 Hz tone sampled at 1 kHz will fold back and appear at 300 Hz, according to:

$%f_{ALIAS} = | f - k \cdot f_S |$%

where k is an integer chosen such that $%f_{ALIAS}$% falls within the interval $%[0, \dfrac{f_S}{2}]$% - here k = 1, so $%|700 - 1000| = 300$% Hz.

If you’re trying to use specific frequency bins for diagnostics, aliasing can easily lead you to completely wrong conclusions.

How to avoid aliasing?

Anti-aliasing filters (low-pass analog filters) on the ADC input. The simplest one is an RC filter - costs nearly nothing, can save your ass.

Oversampling well above the Nyquist rate $%(f_S \gg 2 \cdot f_{MAX})$%. Find a decent margin for your project.

Combine both methods if you like.

Number of samples

The frequency resolution of the FFT spectrum depends on the number of collected samples:

$%Δf = f_S / N$%

where:

  • Δf - resolution
  • $%f_S$% - sampling frequency
  • N - number of samples

For example:

fs [Hz]   N      Δf [Hz]
1000      256    3.91
1000      1024   0.98
1000      4096   0.24

The more samples you collect, the longer it takes - but the better resolution you get. That’s the trade-off between responsiveness and accuracy.
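Both sides of that trade-off come from the same two numbers, so it is worth computing them together. A small sketch (the struct and function names are mine):

```c
/* Bin width and buffer-fill time for a given sampling setup:
 * Δf = fs / N  and  T = N / fs  - improving one worsens the other. */
typedef struct {
    float resolution_hz;   /* width of one FFT bin */
    float acquisition_s;   /* time needed to fill the sample buffer */
} fft_timing_t;

fft_timing_t fft_timing(float fs, unsigned n)
{
    fft_timing_t t;
    t.resolution_hz = fs / (float)n;
    t.acquisition_s = (float)n / fs;
    return t;
}
```

At 1 kHz and 1024 samples you get roughly a 0.98 Hz bin width, but you also wait just over one second per spectrum - too slow for some control loops.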

Another issue: signal clipping

The sampled values must correspond to the actual signal shape. If the input voltage exceeds the ADC’s measurement range, clipping occurs.

A pure sine wave is an odd function:

$%y(t) = A · sin(ωt)$%

because:

$%y(−t) = −y(t)$%

Cosine, on the contrary, is an even function: $%y(−t) = y(t)$%.

The spectrum when input signal starts clipping

FFT spectrum of a clipped 100 Hz + 250 Hz signal

Therefore, if the signal is clipped symmetrically, you will get odd harmonic peaks. The more asymmetrical the clipping, the more even harmonics appear.

Those artificial harmonics - which don’t exist in the real signal - spread across the entire spectrum, making it practically useless for diagnostics. Always make sure the signal amplitude is normalized.
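One cheap guard is to rescale the buffer so the peak sits safely below full scale before the FFT. A hypothetical sketch (names and the 0.9 headroom are my choices, adapt them to your ADC range):

```c
#include <math.h>

/* Scale the buffer so its absolute peak lands at target_peak
 * (e.g. 0.9 of full scale) - keeps the waveform shape intact
 * while leaving headroom against clipping artifacts. */
void normalize_peak(float *buf, int len, float target_peak)
{
    float peak = 0.0f;
    for (int i = 0; i < len; i++) {
        float a = fabsf(buf[i]);
        if (a > peak)
            peak = a;
    }
    if (peak > 0.0f) {
        float g = target_peak / peak;
        for (int i = 0; i < len; i++)
            buf[i] *= g;
    }
}
```

Note this only helps in the digital domain - if the ADC itself already clipped, the harmonics are baked in and no rescaling will remove them.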

TL;DR Can we finally write some code?

The DFT/FFT assumes that your sample buffer of length N contains exactly one complete period of a periodic signal.

In practice we have a finite buffer (e.g. 1024 samples) collected over a finite time.

If an integer number of signal periods happens to fit the buffer exactly - then great, we can compute the spectrum right away. But unless you’re working on some very special case (signal perfectly matched to the window length), forget about it: it’s not gonna happen.

Almost always the signal is truncated mid-cycle at the edges of the sampling window. The resulting discontinuity at the buffer edge behaves like an impulse, which decomposes into a superposition of sines of many different frequencies, resulting in spectral leakage across the FFT spectrum.

Windows - solution to spectral leakage

To suppress those “energy shots” across the whole spectrum, we apply so-called window functions that smoothly taper the signal to zero at both ends of the buffer while preserving full amplitude in the middle.

This dramatically reduces leakage of non-existent frequencies. Let me show that in the pictures below. The charts illustrate the shape of the attenuation over the sample buffer, while the OLED display photos show logarithmically scaled spectra of sample sets generated in the following manner:

void fft_fake_samples(void)
{
    static float phase = 0.0f;

    phase += 0.05f;
    phase = fmodf(phase, 2.0f * M_PI);   /* keep the sinf() argument small - see below */

    for (int i = 0; i < FFT_SIZE; i++)
    {
        /* t spans [0, 1) over the buffer, so "100.0f" lands in bin 100 */
        float t = (float)i / FFT_SIZE;

        float sig = 0.0f;
        // 100 Hz component
        sig += 0.4f * sinf(2.0f * M_PI * 100.0f * t + phase * 1.0f);
        // 250 Hz component
        sig += 0.4f * sinf(2.0f * M_PI * 250.0f * t + phase * 1.5f);

        fft_add_sample(sig);
    }
}

Signal without window being applied

No window applied; note the buffer truncating the signal on the right side

Sample buffer without applied window

In the screenshot above you can clearly see that the sample buffer ends while the signal is not crossing zero. This results in spectral leakage. Here it is:
There are two visible peaks that represent real frequencies, but also some false peaks and a lot of grass-like noise

Two sine waves, 100 + 250 Hz, with no window applied. Two visible peaks leaking across the spectrum.

Hanning window

void fft_init_hanning(void)
{
    for (uint32_t i = 0; i < FFT_SIZE; i++)
    {
        signal_window[i] = 0.5f * (1.0f - cosf(2.0f * M_PI * i / (FFT_SIZE - 1)));
    }
}

Signal with Hanning window

Sample buffer with Hanning window applied

From the same signal we get a much improved representation:

Two sine waves 100+250Hz with Hanning window being applied.

Hamming window

void fft_init_hamming(void)
{   
    const float alpha = 0.54f;
    const float beta  = 0.46f;   // 1 - alpha

    for (uint32_t i = 0; i < FFT_SIZE; i++) {
        float t = (float)i / (float)(FFT_SIZE - 1);
        signal_window[i] = alpha - beta * cosf(2.0f * M_PI * t);
    }
}

Signal with Hamming window

Sample buffer with Hamming window applied

From the same signal, we're getting a much improved representation

Two sine waves, 100 + 250 Hz, with Hamming window applied. Not much of a change compared to Hanning - but then, the attenuation curves don’t differ much either.

Blackman-Harris 4-term window

void fft_init_blackman(void)
{
    /* classic 4-term Blackman-Harris coefficients (a0..a3) */
    const float a0 = 0.35875f;
    const float a1 = 0.48829f;
    const float a2 = 0.14128f;
    const float a3 = 0.01168f;

    for (uint32_t i = 0; i < FFT_SIZE; i++)
    {
        float t = (float)i / (float)(FFT_SIZE - 1);
        signal_window[i] =
            a0
          - a1 * cosf(2.0f * M_PI * t)
          + a2 * cosf(4.0f * M_PI * t)
          - a3 * cosf(6.0f * M_PI * t);
    }
}

Signal with Blackman-Harris 4-term window

Sample buffer with Blackman-Harris 4-term window applied

From the same signal we get a much improved representation:

Two sine waves, 100 + 250 Hz, with Blackman-Harris 4-term window applied. Note how thin the peaks are now.

Since every attenuation curve also distorts the real amplitude of the frequency peaks, we have to compensate for the gain once the spectrum is computed. Here’s the table:

Window type              Max side-lobe level [dB]   Coherent gain
No window (rectangular)  -13                        1.00
Hanning                  -32                        0.50
Hamming                  -43                        0.54
Blackman-Harris 4-term   -92                        0.358

Coherent gain tells how much the window attenuates the overall signal amplitude. Max side-lobe level is the strength of the worst side lobe relative to the main lobe (the lower the dB value, the better). For the amplitude correction, after computing the magnitudes (e.g. with arm_cmplx_mag_f32), multiply the calculated spectrum by $%\dfrac{1}{Coherent\ gain}$%.
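The correction itself is a one-liner per bin. A minimal sketch (function name is mine), using the coherent gain values from the table:

```c
/* Undo the window's amplitude attenuation on an already-computed
 * magnitude spectrum: multiply every bin by 1 / coherent_gain
 * (e.g. 1/0.5 = 2.0 for Hanning). */
void fft_correct_amplitude(float *mag, unsigned bins, float coherent_gain)
{
    float g = 1.0f / coherent_gain;
    for (unsigned i = 0; i < bins; i++)
        mag[i] *= g;
}
```

Without this step a Hanning-windowed peak reads half its true height - a guaranteed way to misjudge fault severity.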

If you want to dig deeper into the subject of windows - search for “On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform” by Fredric J. Harris (1978).

Rectangular with its -13 dB is straight-up tragic - it’s literally the same as applying no window at all. You only reach for it when your signal is perfectly periodic inside the buffer… which - let’s be honest - almost never happens in real life.

Hanning and Hamming give comparable results, attenuating the signal amplitude by around a half while leaving reasonable side lobes.

Blackman-Harris with its -92 dB looks sexy, but you pay for it with a noticeably wider main lobe. Worth it only when you really need to get rid of side lobes and don’t mind sacrificing some bin sharpness.

Scalloping loss

We have 1024 FFT bins, but when visualizing (e.g. on a small display) we often show far fewer bars: we average / bin-down the spectrum.

If a true frequency peak falls exactly between two display bins, its energy splits between them and the peak looks smaller than it really is (undoing some of the good work the window did to keep side lobes small).

Conclusion: if you want to detect characteristic fault frequencies reliably - always work with the raw FFT spectrum, not with a down-sampled / binned visualization.
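To see the effect, here is a naive bin-down by averaging (names mine, a sketch of the kind of display decimation described above), where a peak straddling a group boundary loses height:

```c
/* Average groups of FFT bins down to n_bars display bars.
 * Assumes n is divisible by n_bars. A peak sitting on a group
 * boundary gets split across two bars and its height drops. */
void bin_down(const float *spectrum, int n, float *bars, int n_bars)
{
    int group = n / n_bars;
    for (int b = 0; b < n_bars; b++) {
        float acc = 0.0f;
        for (int i = 0; i < group; i++)
            acc += spectrum[b * group + i];
        bars[b] = acc / (float)group;
    }
}
```

With 8 bins binned down to 2 bars, a peak spread over bins 3 and 4 yields two bars of 0.25 each, while the same energy inside one group would produce a single 0.5 bar - exactly the kind of distortion you don’t want in a fault detector.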

The problematic CMSIS implementation

In my code I used the CMSIS-DSP library, which requires two conditions to work correctly:

  • data must be placed in CCMRAM: __attribute__((section(".ccmram")))
  • data must be 8-byte aligned: __attribute__((aligned(8)))

In theory that should be enough. In practice, in the STM32 X-CUBE-ALGOBUILD (ver 1.4) I was using, the function arm_rfft_fast_init_f32 allocates internal twiddle-factor tables without respecting the alignment inheritance, despite the attributes specified. The result? Toss a coin whether the code is going to work or you’ll get a hardware FPU fault exception.

Verify the variable addresses (including nested ones) in the debugger’s “Watch” window by typing &variable.
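Besides the debugger, a cheap runtime guard catches this too. A sketch (macro and buffer names are mine, and `fft_buffer` here is just an example stand-in for your real FFT working buffers):

```c
#include <stdint.h>
#include <assert.h>

/* Fail fast if a buffer isn't 8-byte aligned, instead of getting a
 * random FPU fault deep inside the FFT. */
#define ASSERT_ALIGNED8(p)  assert(((uintptr_t)(p) & 0x7u) == 0)

static float fft_buffer[1024] __attribute__((aligned(8)));  /* example buffer */

void fft_check_alignment(void)
{
    ASSERT_ALIGNED8(fft_buffer);
}
```

Calling this once at startup, before the first transform, turns the coin toss into a deterministic assert.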

Oh - and one more odd behavior. I had a loop generating two sine waves with a slowly increasing phase offset (so the spectrum wouldn’t be static on the display). After quite some time I noticed strange extra lines appearing in the spectrum. The longer it ran, the worse it got.

It turned out that the higher the argument passed to sinf(), the less accurate the result becomes. Since the phase kept growing, I had to reduce it periodically:

phase = fmodf(phase, 2.0f * (float)M_PI);

Float or double?

Going back to the IEEE 754 float limitations mentioned at the beginning - the representation error introduces unwanted noise during discretization that transfers into the spectrum.

For very weak fault signals this float noise floor might ruin the result, even though float is much faster thanks to the Cortex-M4 FPU. So I wondered - maybe switch to double for higher precision? What numbers are we talking about?

Timing results (Cortex-M4 @ 168 MHz):

N      Algorithm             Cycles        Computing time [ms]
1024   CMSIS-DSP             225032        1.34
1024   FFT float             295689        1.76
1024   FFT double            4565004       27.17
1024   DFT double (classic)  3232385299    19240

As a bonus, I wanted real numbers showing how big an improvement the Cooley-Tukey implementation is: from 19240 ms down to 27 ms. Quite impressive, huh? Over 700 times faster.
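For the record, the “over 700×” figure comes straight from the cycle counts in the table (the arithmetic below is mine, the numbers are from my measurements above):

```c
/* Speedup of the double-precision Cooley-Tukey FFT over the classic
 * O(N^2) DFT, from the measured cycle counts for N = 1024. */
double fft_speedup(void)
{
    double dft_double_cycles = 3232385299.0;  /* classic DFT, double */
    double fft_double_cycles = 4565004.0;     /* Cooley-Tukey FFT, double */
    return dft_double_cycles / fft_double_cycles;
}
```

That ratio lands around 708, which is the O(N²) vs O(N·log N) gap made tangible.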

Aftermath

Now you probably understand why I couldn’t just slap an FFT block onto a PID controller and call it “done”. It turns out that implementing something as seemingly simple as FFT analysis can open up so many interesting side topics that it deserves its own chapter. Admittedly, the experimentation was a lot of fun - even though I wasted quite a bit of time debugging weird CMSIS corner cases.