Most of us have a pretty simple model of how a computer works. The CPU fetches instructions and data from memory, executes them, and writes data back to memory. That model is a good enough abstraction for most of what we do, but it hasn’t really been true for a long time on anything but the simplest computers. A modern computer’s memory subsystem is much more complex and is often the key to unlocking real performance. [Pdziepak] has a great post about how to take practical advantage of modern caching to improve high-performance code.
If you go back to 1956, [Fritz-Rudolf Güntsch] described virtual memory in his doctoral thesis, and [Tom Kilburn’s] Atlas computer put the idea into practice in the early 1960s. The idea is that a small amount of high-speed memory holds pieces of a larger memory device like a memory drum, tape, or disk. If a program accesses a piece of memory that is not in the high-speed memory, the system reads it from the mass storage device, after possibly making room by writing some part of working memory back out to the mass storage device.
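To make the paging idea concrete, here is a minimal C++ sketch of a small set of fast frames backed by a larger store. The `PagedMemory` class, its method names, and the least-recently-used eviction policy are all assumptions made for illustration; this is a toy model, not how Atlas or any real operating system does it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

// Hypothetical toy model: a fixed number of fast "frames" that hold
// pages from a much larger, slower backing store.
class PagedMemory {
public:
    explicit PagedMemory(std::size_t frames) : capacity_(frames) {
        assert(frames > 0 && "need at least one fast frame");
    }

    // Touch a page. Returns true if it was already resident (a hit),
    // false if it had to be brought in from the backing store (a fault).
    bool access(std::uint32_t page) {
        auto it = index_.find(page);
        if (it != index_.end()) {
            // Hit: mark the page as most recently used.
            lru_.splice(lru_.begin(), lru_, it->second);
            return true;
        }
        // Fault: make room first if every frame is occupied.
        // Assumed policy: evict the least recently used page.
        if (lru_.size() == capacity_) {
            index_.erase(lru_.back());
            lru_.pop_back();
            ++evictions_;  // a real system would write a dirty page back here
        }
        lru_.push_front(page);
        index_[page] = lru_.begin();
        ++faults_;
        return false;
    }

    std::size_t faults() const { return faults_; }
    std::size_t evictions() const { return evictions_; }

private:
    std::size_t capacity_;
    std::list<std::uint32_t> lru_;  // front = most recently used
    std::unordered_map<std::uint32_t, std::list<std::uint32_t>::iterator> index_;
    std::size_t faults_ = 0;
    std::size_t evictions_ = 0;
};
```

With two frames, touching pages 1, 2, 1, 3 in that order faults on 1, 2, and 3, and the last access pushes page 2 back out, since page 1 was used more recently.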
Caching takes this even further. The CPU executes code from a small but very fast cache. A larger and slower cache acts as mass storage for the fast cache. That cache may have its own cache, and so on, until eventually the last cache is backed by main memory, which in turn empties into a mass storage device. Naturally, there are some differences since the purpose is different: cache is mainly concerned with faster memory access while virtual memory tries to allow large programs to run in less physical memory.
This is quite different from our common mental model. In a very real sense, today’s CPUs execute programs from mass storage. That’s why you can have many huge programs running on a single computer with limited memory. At any given moment, though, the CPU is really executing from a very small high-speed memory.
A modern cache is often split into separate parts for instruction and data, and [Pdziepak] is looking specifically at the level 1 instruction cache. It gets pretty detailed, but it does talk about tools to examine cache performance and also about hot and cold functions, something we don’t think gets enough use.
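As a rough illustration of what hot and cold functions can look like in practice, here is a minimal C++ sketch using the GCC and Clang `hot` and `cold` function attributes. The macro names, the function names, and the negative-value error rule are hypothetical, and this is not necessarily the approach the post takes. The idea is that the compiler is told which code rarely runs, so it can keep that code out of the way of the frequently executed path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper macros: only GCC and Clang understand these
// attributes, so other compilers get empty definitions.
#if defined(__GNUC__) || defined(__clang__)
#define COLD_FN __attribute__((cold, noinline))
#define HOT_FN __attribute__((hot))
#else
#define COLD_FN
#define HOT_FN
#endif

// Cold: the rarely taken error path. Keeping it out of line means its
// instructions do not sit in the middle of the hot loop.
// Assumed convention: encode the failing index as a negative result.
COLD_FN long long handle_negative(std::size_t index) {
    return -static_cast<long long>(index) - 1;
}

// Hot: the loop we expect to spend most of our time in.
HOT_FN long long checked_sum(const std::int32_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] < 0) {
            return handle_negative(i);  // unlikely branch into cold code
        }
        total += data[i];
    }
    return total;
}
```

Whether this helps at all depends on the compiler, the optimization level, and the workload, so it is worth measuring before and after rather than assuming a win.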
Of course, if you are just writing normal code, you probably don’t care. But if you are trying to wring the most performance you can get out of your CPU, you’ll enjoy the post.