Assessing The Energy Efficiency Of Programming Languages

Programming languages are generally defined as a more human-friendly way to program computers than using raw machine code. These languages vary widely in how close they let the programmer get to the bare metal, which ultimately can affect the performance and efficiency of the application. One metric that has become more important over the years is that of energy efficiency, as datacenters keep growing along with their power demand. If picking one programming language over another saves even 1% of a datacenter’s electricity consumption, this could prove to be highly beneficial, assuming the saving holds up against all the other factors one would consider.

There have been some attempts over the years to put a number on the energy efficiency of specific programming languages, including a paper by Rui Pereira et al. from 2021 (preprint PDF), published in Science of Computer Programming, which ran a couple of small benchmarks, measured system power consumption and drew conclusions from the results. When Hackaday covered the 2017 paper at the time, it was with the expected claim that C is the most efficient programming language, while of course scripting languages like JavaScript, Python and Lua trailed far behind.

With C being effectively high-level assembly code this is probably no surprise, but languages such as C++ and Ada should see no severe performance penalty over C due to their design, which is the part where this particular study begins to fall apart. So what is the truth and can we even capture ‘efficiency’ in a simple ranking?

AVX-512: When The Bits Really Count

For the majority of workloads, fiddling with assembly instructions isn’t worth it. The added complexity and code obfuscation generally outweigh the relatively modest gains. Because compilers have become quite fantastic at generating code and because processors are just so much faster, it is hard to get a meaningful speedup by tweaking a small section of code. That changes when you introduce SIMD instructions and need to decode lots of bitsets fast. Intel’s fancy AVX-512 SIMD instructions can offer some meaningful performance gains with relatively little custom assembly.

Like many software engineers, [Daniel Lemire] had many bitsets (a range of ints/enums encoded into a binary number, each bit corresponding to a different integer or enum). Rather than checking if just a specific flag is present (a bitwise AND), [Daniel] wanted to know all the flags in a given bitset. The easiest way would be to iterate through all of them like so:

while (word != 0) {
  result[i] = trailingzeroes(word);
  word = word & (word - 1);
  i++;
}
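
As a rough, self-contained sketch of the loop above: the names `trailing_zeroes` and `decode_bitset_scalar` are placeholders chosen for this illustration, and `__builtin_ctzll` is a GCC/Clang builtin standing in for whatever trailing-zero helper the original code uses. The output buffer is assumed to have room for 64 entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count trailing zero bits; the caller guarantees word != 0.
   __builtin_ctzll is a GCC/Clang builtin (an assumption about the toolchain). */
static inline uint32_t trailing_zeroes(uint64_t word) {
  return (uint32_t)__builtin_ctzll(word);
}

/* Write the index of every set bit in `word` to `out` (room for 64 entries)
   and return how many indices were written. */
static size_t decode_bitset_scalar(uint64_t word, uint32_t *out) {
  assert(out != NULL);
  size_t i = 0;
  while (word != 0) {
    out[i] = trailing_zeroes(word); /* position of the lowest set bit */
    word &= word - 1;               /* clear that lowest set bit */
    i++;
  }
  return i;
}
```

Calling `decode_bitset_scalar(0x29, out)` would fill `out` with the positions of the three set bits in binary 101001.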

The naive version of this loop is very likely to suffer branch mispredictions, and either you or the compiler would speed it up by unrolling the loop. However, the AVX-512 instruction set on the latest Intel processors has some handy instructions just for this kind of thing. The instruction is vpcompressd and Intel provides a handy and memorable C/C++ function called _mm512_mask_compressstoreu_epi32.

The function generates an array of integers and you can use the infamous popcnt instruction to get the number of ones. Some early benchmark testing shows the AVX-512 version uses 45% fewer cycles. You might be wondering, doesn’t the processor downclock when wide 512-bit registers are used? Yes. But even with the downclocking, the SIMD version is still 33% faster. The code is up on GitHub if you want to try it yourself.
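
One way the compress-store approach might look in C is sketched below. This is an illustration under stated assumptions, not necessarily [Daniel]'s code: the function name `decode_bitset_compress`, the choice to process the 64-bit word in four 16-bit chunks, and the scalar fallback used when the compiler is not targeting AVX-512F are all choices made for this example. The intrinsics themselves (`_mm512_setr_epi32`, `_mm512_set1_epi32`, `_mm512_add_epi32`, `_mm512_mask_compressstoreu_epi32`) are part of AVX-512F, and `__builtin_popcount` / `__builtin_ctzll` are GCC/Clang builtins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Write the index of every set bit in `word` to `out` (room for 64 entries)
   and return how many indices were written. */
static size_t decode_bitset_compress(uint64_t word, uint32_t *out) {
  assert(out != NULL);
  uint32_t *start = out;
#if defined(__AVX512F__)
  /* Sixteen candidate bit positions, one per 32-bit lane. */
  __m512i indexes = _mm512_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7,
                                      8, 9, 10, 11, 12, 13, 14, 15);
  const __m512i step = _mm512_set1_epi32(16);
  for (int chunk = 0; chunk < 4; chunk++) {
    /* The low 16 bits of the word select which lanes to keep. */
    __mmask16 mask = (__mmask16)(word & 0xFFFF);
    /* vpcompressd: store only the selected lanes, packed contiguously. */
    _mm512_mask_compressstoreu_epi32(out, mask, indexes);
    out += __builtin_popcount(mask); /* advance by the number of ones */
    indexes = _mm512_add_epi32(indexes, step); /* next 16 positions */
    word >>= 16;
  }
#else
  /* Fallback for builds without AVX-512F: the plain scalar loop. */
  while (word != 0) {
    *out++ = (uint32_t)__builtin_ctzll(word);
    word &= word - 1;
  }
#endif
  return (size_t)(out - start);
}
```

Building with something like `-mavx512f` on GCC or Clang selects the vector path; without it the function still compiles and returns the same result through the scalar loop, which makes it easy to compare the two.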

Print-in-Place Engine Aims To Be The Next Benchy

While there are many in the 3D-printing community who loudly and proudly proclaim never to have stooped to printing a 3DBenchy, there are far more who have turned a new printer loose on the venerable test model, just to see what it can do. But Benchy is getting a little long in the tooth, and with 3D-printers getting better and better, perhaps a better benchmarking model is in order.

Knocking Benchy off its perch is the idea behind this print-in-place engine benchmark, at least according to [SunShine]. And we have to say that he’s come up with an impressive model. It’s a cutaway of a three-cylinder reciprocating engine, complete with crankshaft, connecting rods, pistons, and engine block. It’s designed to print all in one go, with only a little cleanup needed after printing before the model is ready to go. The print-in-place aspect seems to be the main test of a printer — if you can get this engine to actually spin, you’re probably set up pretty well. [SunShine] shares a few tips to get your printer dialed in, and shows a few examples of what can happen when things go wrong. In addition to the complexities of the print-in-place mechanism, the model has a few Easter eggs to really challenge your printer, like the tiny oil channel running the length of the crankshaft.

Whether this model supplants Benchy is up for debate, but even if it doesn’t, it’s still a cool design that would be fun to play with. Either way, as [SunShine] points out, you’ll need a really flat bed to print this one; luckily, he recently came up with a compliant mechanism dial indicator to help with that job.
