The Performance Impact Of C++’s `final` Keyword For Optimization

In the world of software development, the term ‘optimization’ is generally reason for experienced developers to start feeling decidedly nervous, especially when a feature is billed as an ‘easy and free optimization’. The final keyword introduced in C++11 is one such feature. It promises a way to speed up object-oriented code by eliminating vtable call indirection for a class or member function marked as – unsurprisingly – final, meaning that it cannot be inherited from or overridden. Inspired by this promise, [Benjamin Summerton] figured that he’d run a range of benchmarks to see what performance uplift he’d get on his ray tracing project.
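
To make the mechanism concrete, here is a minimal sketch of the devirtualization idea – the class names are illustrative, not taken from the benchmark project. Because nothing can derive from a final class, the compiler is free to replace the vtable lookup with a direct (and potentially inlined) call:

```cpp
// A minimal sketch of devirtualization via 'final'; the class names
// are illustrative and not taken from the benchmark project.
#include <cstdio>

struct Shape {
    virtual ~Shape() = default;
    virtual float shade() const = 0;   // normally dispatched through the vtable
};

struct Sphere final : Shape {          // 'final': nothing can derive from Sphere
    float shade() const override { return 0.5f; }
};

float render(const Sphere& s) {
    // The static type is Sphere and Sphere is final, so the compiler may
    // prove this call can only ever be Sphere::shade() and emit a direct
    // (or inlined) call instead of going through the vtable.
    return s.shade();
}

int main() {
    Sphere s;
    std::printf("%f\n", render(s));
    return 0;
}
```

Whether the optimizer actually takes advantage of this is, as the benchmarks below show, another matter entirely.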

To be as thorough as possible, the tests were run on three different systems: 64-bit Intel and AMD machines, as well as Apple Silicon (M1). On the compiler side, various versions of GCC (12.x, 13.x), Clang (15, 17), and MSVC (17) were employed, with rather interesting results for the final versus no-final tests. Clang was probably the biggest surprise: with the keyword added, the performance of Clang-generated code absolutely tanked. MSVC was a mixed bag, as were the GCC versions, other than GCC 13.2 on AMD Ryzen, which saw a speed-up of a few percent.

Ultimately, it seems that, as usual, there’s no free lunch, and adding final to your code falls distinctly under ‘only use it if you know what you’re doing’. As things stand, the resulting behavior is wildly inconsistent across compilers and platforms.

The Fastest Fourier Transform In The West

An interesting aspect of time-varying waveforms is that, by using a trick called the Fourier Transform (FT), they can be represented as the sum of their underlying frequencies. This mathematical insight is extremely helpful when processing signals digitally, as it allows frequency-dependent filtering to be implemented simply in a digital system. [klafyvel] needed this capability for a project, so they started researching the best method that would fit on an Arduino Uno. In an effort to understand exactly what was going on, they ended up significantly improving on the code size, execution time, and accuracy of the previous crown-wearer.
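
To illustrate what the transform actually computes – and this is nothing like the optimized fixed-point code discussed here – a naive discrete Fourier transform fits in a few lines, with each output bin measuring how strongly the corresponding frequency is present in the input:

```cpp
// A naive O(N^2) DFT, purely to illustrate the concept; this is not
// [klafyvel]'s optimized fixed-point code.
#include <cmath>
#include <complex>
#include <cstdio>
#include <vector>

std::vector<std::complex<double>> dft(const std::vector<double>& x) {
    const double PI = std::acos(-1.0);
    const std::size_t N = x.size();
    std::vector<std::complex<double>> X(N);
    for (std::size_t k = 0; k < N; ++k)        // one output bin per frequency
        for (std::size_t n = 0; n < N; ++n) {  // correlate input with that frequency
            const double phi = -2.0 * PI * double(k * n) / double(N);
            X[k] += x[n] * std::complex<double>(std::cos(phi), std::sin(phi));
        }
    return X;
}

int main() {
    const double PI = std::acos(-1.0);
    std::vector<double> x(64);
    for (std::size_t n = 0; n < x.size(); ++n)  // a pure tone: 3 cycles per window
        x[n] = std::sin(2.0 * PI * 3.0 * double(n) / double(x.size()));

    const auto X = dft(x);
    for (std::size_t k = 0; k < 8; ++k)         // expect a clear peak at bin 3
        std::printf("bin %zu: %6.2f\n", k, std::abs(X[k]));
    return 0;
}
```

Running this prints a clear peak at bin 3, matching the three-cycle tone fed in.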

A complete real-time Fourier Transform is a resource-heavy operation that needs more than an Arduino Uno can offer, so faster methods have been developed over the years. These are known as Fast Fourier Transforms (FFTs), and they cut the work from O(N²) down to O(N log N); on small microcontrollers they are usually paired with fixed-point arithmetic and approximated trigonometry, exchanging absolute precision for speed and size. [klafyvel] set upon diving deep into the mathematics involved, as well as some low-level programming techniques, to figure out whether the trade-offs offered by the existing solutions had really been optimized. The results are impressive.
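
For a flavor of the fixed-point style such microcontroller FFTs lean on – a hedged sketch of the general technique, not [klafyvel]’s actual routines – samples and twiddle factors can be stored as Q15 integers, turning each floating-point multiply into an integer multiply and a shift:

```cpp
// A sketch of Q15 fixed-point arithmetic, the kind of trick small
// microcontroller FFTs rely on; not [klafyvel]'s actual routines.
#include <cstdint>
#include <cstdio>

// Q15: an int16_t raw value r represents r / 32768, so the usable
// range is [-1.0, 1.0) with roughly 4.5 decimal digits of precision.
using q15 = int16_t;

q15 q15_mul(q15 a, q15 b) {
    // Multiply into 32 bits, then shift the result back into Q15.
    // On an AVR this is a handful of cycles; a float multiply is many more.
    return static_cast<q15>((static_cast<int32_t>(a) * b) >> 15);
}

int main() {
    const q15 half    = 0x4000;            // 0.5  in Q15
    const q15 quarter = 0x2000;            // 0.25 in Q15
    const q15 p = q15_mul(half, quarter);  // expect 0.125, i.e. 0x1000
    std::printf("0.5 * 0.25 = %d (%.4f)\n", p, p / 32768.0);
    return 0;
}
```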

[Figure: benchmarking results in milliseconds, showing the speed of the implementation versus the competition (ApproxFFT)]

Not content with producing just one new crown-taking algorithm, [klafyvel]’s blog post amounts to a masterclass in really understanding a problem: there are no fewer than four algorithms to choose from, depending on how you rank the importance of execution speed, accuracy, code size, and array size.

Along the way, we are treated to some great diversions: how to approximate floats by their exponents (French text), how to control, program, and gather data from an Arduino using Julia, how to massively improve the speed of the code using trigonometric identities, and how to deal with overflows when the variables get too large. There is a lot to digest here, but the explanations are very clear and peppered with code snippets to make things easier. If you have the time to read through, you’re sure to learn a lot! The code is on GitHub here.
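
As a taste of the first of those diversions, one classic member of that family of tricks reads an approximate base-2 logarithm almost directly out of a float’s IEEE-754 exponent bits – again a sketch of the general idea, not the code from the post:

```cpp
// A sketch of the classic exponent trick: for a positive IEEE-754
// float, reinterpreting the bit pattern as an integer yields roughly
// (log2(x) + 127) * 2^23, so an approximate log2 falls right out.
// Illustrative only; the post's (French) derivation goes further.
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

float approx_log2(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);     // well-defined type pun
    return bits / float(1u << 23) - 127.0f;  // exponent + mantissa fraction
}

int main() {
    for (float x : {0.5f, 1.0f, 2.0f, 10.0f, 1000.0f})
        std::printf("log2(%g) ~ %.3f (exact %.3f)\n",
                    x, approx_log2(x), std::log2(x));
    return 0;
}
```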

If you’re interested in FFTs, we’ve seen them before around these parts. Fill your boots with this list of tagged projects.

Frances Allen Optimised Your Code Without You Even Knowing

In 2020, our digital world and the software we use to create it are a towering structure, built upon countless layers of abstraction and building blocks – just think about all the translations and interactions that occur when loading a webpage. Whilst abstraction is undoubtedly a great thing, it only works if we’re building on solid ground: if the lower levels are stable and fast. In practice, that means low-level, compiled languages, which can be heavily optimised to make the most of computer hardware. One of the giants in this area was Frances Allen, who passed away in early August. Described by IBM as “a pioneer in compiler organization and optimization algorithms,” she made numerous significant contributions to the field.