Raspberry Pi 4 Benchmarks: 32- Vs 64-bits

[Matteo] bought a new Raspberry Pi 4. Why not? You get a quad-core ARM processor, up to 4 GB of RAM, and a gigabit Ethernet port for $35-55. However, the default operating system is still a 32-bit system and doesn’t take advantage of the Pi 4’s 64-bit capable CPU. So he installed a light version of 64-bit Debian and ran some benchmarks for the Raspberry Pi 4 running both 32-bit and 64-bit operating systems.

It really shouldn’t be surprising that the 64-bit OS did better in nearly every test. If anything is surprising, it may be that the difference is so pronounced. Some of the benchmarks, like Dhrystones, probably don’t relate much to real-life usage. But computing a hash is something you probably do pretty often in normal usage, and the timing difference there is large.
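If you want a rough feel for the hashing difference on your own board, a minimal sketch along these lines can be run under both a 32-bit and a 64-bit OS. This is not [Matteo]’s benchmark: the function name `time_hash`, the 4 MiB payload, and the round count are arbitrary choices made for illustration. SHA-512 is included because it works on 64-bit words, so it is a plausible place to see a gap.

```python
import hashlib
import platform
import time


def time_hash(algorithm: str, payload: bytes, rounds: int = 5) -> float:
    """Return the best wall-clock time in seconds to hash payload once."""
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        hashlib.new(algorithm, payload).digest()
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    payload = bytes(4 * 1024 * 1024)  # 4 MiB of zeros; size is an arbitrary assumption
    print("machine:", platform.machine())
    for name in ("sha256", "sha512"):
        seconds = time_hash(name, payload)
        print(f"{name}: {len(payload) / seconds / 1e6:.1f} MB/s")
```

Taking the best of several rounds reduces noise from other processes, but it is still only a single-threaded microbenchmark, so treat the numbers as a hint rather than a verdict.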

A few results were limited by factors other than the CPU. RAM speed was a little better, but not much. Dropping firewall packets was another big difference: the 32-bit system could drop 268 packets per second, while the 64-bit system dropped 557. VPN is another case where other factors limited performance, so the word size of the operating system didn’t matter much.

Benchmarks are always tricky, so your mileage — especially your real-life mileage — may vary. However, it does seem like there are some real advantages to dumping the 32-bit operating system.

If you are interested in performance versus the Pi 3, we looked at that earlier. Spoiler alert: it is much better. Or you can go even further back if you like.

57 thoughts on “Raspberry Pi 4 Benchmarks: 32- Vs 64-bits”

  1. In most arithmetic operations 32 bits should be enough, but in Boolean operations on blocks of data like in encryption, more bits translates directly to an increase of performance. If your use case includes lots of services like SSH, an increase in bits could be a good choice.

    1. There’s a difference in the number of registers too – ARM32 gives you 16 registers, but ARM64 gives you 32 registers which can help to reduce the number of memory accesses, which will speed things up. In the same way, the calling standard for ARM32 gives you 4 registers to pass arguments into subroutines (the rest go on the stack), but ARM64 gives you 8 registers. Between these two, I suspect this is probably most of the performance improvement, rather than directly from the width of the data being worked on.

      1. This. I once moved an application from “i686” to “pentium4” in the compiler flags. This reduced the runtime by 40%. Most of this was due to more registers including 64bit registers, and SSE functions. And this was still 32bit vs 32bit OS.

        In the end, benchmarking is hard, and mileage for your specific application will vary. I knew this compiler flag would be a great win, as there was a lot of 64bit math in the application. So having direct 64bit registers and instructions to work on those provided a lot of speed.

    2. Both ARM and Intel (or really, AMD, since Intel abandoned their own 64bit attempt and licensed AMD’s) took the opportunity to use the 64bit conversion to fix some long standing issues. A bunch of stuff was going to break for backwards compatibility when in 64bit mode, so might as well go all in. Those new instructions and expanded registers are where most of the benefit comes from.

      Of course, the two made different decisions in the process, so they aren’t directly comparable.

    1. I was confused about that. 500 packets per second, with a 1500 byte frame being the max (too lazy to shave off the ethernet header for this back of the envelope calculation) is about 6Mbps. Nobody would be using that as a firewall, even if you’re really cheap.
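The back-of-the-envelope figure above can be reproduced with a few lines of Python. This is only a sketch of the commenter’s arithmetic: it assumes every packet is a full 1500-byte frame and ignores header overhead, and the helper name `throughput_mbps` is made up for illustration.

```python
def throughput_mbps(packets_per_second: float, frame_bytes: int = 1500) -> float:
    """Convert a packet rate to megabits per second at a fixed frame size."""
    return packets_per_second * frame_bytes * 8 / 1_000_000


if __name__ == "__main__":
    # 268 and 557 are the rates quoted in the article; 500 is the rounded figure.
    for rate in (268, 500, 557):
        print(f"{rate} pps -> {throughput_mbps(rate):.2f} Mbps")
    # 268 pps -> 3.22 Mbps
    # 500 pps -> 6.00 Mbps
    # 557 pps -> 6.68 Mbps
```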

  2. “[Matteo] bought a new Raspberry Pi 4. Why not? You get a quad-core ARM processor, up to 4 GB of RAM, and a gigabit ethernet port for $35.”

    No, you can’t – the 4 Gig Pi 4 is not $35. It is a good bargain, but it is not $35. The 4 GB of RAM model is $55, the 1 GB model is $35, approx. the same price as the 1 GB model 3 Raspberry Pi.

      1. You understand the Raspberry Pi comes from Europe, right?

        The PiHut shows it as £54, which I suspect is inclusive of VAT, since the listing says including taxes.

        In the colonies the board is $55 w/o VAT, since we don’t have a VAT in the USA.

        What you see as a lazy conversion rate seems more like a coincidence to me.

          1. That’s completely irrelevant, because people are seeing 55 USD vs 54 EUR and think there was an extremely lazy exchange rate used, as per the comment a couple levels above. Instead, it’s probably a coincidence of Europe’s VAT being nearly the same percentage as the current difference in dollar value. The USD is worth less, but doesn’t need to charge a VAT, so the price is almost the same.

            I have no idea if that’s true, but the point is the country of origin for the Pi has zero bearing on a European VAT equalizing the price in Euros to the price in US Dollars.

    1. It really does depend on what applications you are running. If you are running legacy 32-bit binaries on a 64-bit OS, then in theory you could end up with less performance. And if the code was explicitly designed and optimised to squeeze the maximum performance out of using 32-bit words by hand coded bit twiddling, recompiling it for 64-bit could end up with half the performance because you could end up reading twice the memory and only performing operations on half. But 64-bit is required for maximum performance if you want to access more than 4GiB of RAM.

      99 out of 100 times if you can recompile the source code, it should perform faster because of the extra registers.

  3. I wish there was a 64-bit Raspbian.

    Sometimes it’s not even about the speed.

    Some binaries don’t come in 32-bit anymore.

    My daughter’s first computer was a Pi 3. We even had the “real” Java edition of Minecraft on it for her. It was usable but kind of laggy. I upgraded her to an actual desktop computer in the hopes of getting better performance. I installed Raspbian on the desktop so she could keep the same interface. Her home directory is on the LAN so it was really no visible change to her.

    I didn’t even realize at the time that Raspbian was 32-bit, even on x86. Sure enough it seemed like right after that the next Minecraft update came out and they dropped 32-bit support. I’ve been meaning to replace Raspbian with some other distro ever since but she really liked her computer otherwise so I kind of dreaded changing it.

    Oh well. Now, thanks to her school exposing her to them she has a Chromebook. How do you teach a kid not to rely on the cloud with all their stuff when the schools are even participating in the marketing? At least it’s not Apple I guess. It even kind of runs desktop Linux programs… kind of.

      1. I’ve been using >4GB file systems (XFS and EXT2/3) on 32-bit Linux systems for 20 years. It’s important that software be built with _LARGEFILE64_SOURCE in order to use the proper extended data structures and APIs on glibc. That way seek and stat provide full 64-bit values. Nothing about 32-bit x86 leaves you stranded in 32-bit only arithmetic!

        For Raspbian you can boot a 64-bit kernel but continue with a 32-bit runtime. This allows you to access more RAM and swap space, and use the newer aarch64 instruction set or a mix of 32-bit and 64-bit applications if you choose to additionally install 64-bit runtime libraries.

  4. Just curious, which 64-bit Debian was used in this article? Is it a ‘trimmed down’ Raspbian or a modified Arm64 Debian? Because as far as I know there is no official/preview build of Debian for Raspberry Pi 4. CMIIW

      1. Yes, I tested it, but unfortunately it is only a player, and I use my raspberry not only as media center, but also for running a personal telegram bot and other things, and since LibreELEC doesn’t come with APT, or GCC, or other basic tools, I can’t install any of them without too much hacking.

  5. I would like to install aarch64 in my Pi 4.
    The link doesn’t mention which aarch64 version he used.
    Did he have to modify it to get it to work?
    I know he said he made some changes/optimizations for some of the tests, but are those necessary to get the Pi 4 to boot?

    1. Not sure how well it would work for graphical stuff, but I’ve got an embedded Pi3 on my desk that’s running OpenWRT aarch64, and Docker to give me basically any distro that has Docker images.

  6. Probably a huge bump in chess engine performance, remember 8×8 = 64. Not that more than a fraction of a percent of us play at a level where it would matter. (I am truly terrible at it, I lose almost every game against both human and computer opponents)

  7. What I found shocking was how much faster Wireguard was compared to OpenVPN regardless of 32 or 64bit. I’m certainly going to see if replacing my OpenVPN setup with Wireguard is something I want to do!

    1. If you don’t need layer 2 bridging (most people don’t), it’s probably the way to go.

      I have a rather niche corner case where L2 bridging + Windows client compatibility was needed. OpenVPN was one of the only solutions.

  8. Fascinating. Makes me wish I had been doing the benchmarks. I have a Pine64 I have yet to play with…

    I have to say the figures are as I suspected: for most things underwhelming. But for >32 bit math, dramatic. Makes sense. Crypto and hashing functions are prime candidates for phenomenal speed improvements. Technically larger word sizes will improve their operation until you exceed their block size. In short something like using 4Kb keys will be improved by greater word sizes until you get a larger than 4Kb CPU. :-)

    The surprises for me were in his networking tests. I suspect most of the performance delta comes from the container/VM mechanism. As his memory graphs show there was only a slight increase in memory transfers so the dramatic gain wouldn’t appear to come from that. If the packets aren’t getting hashed with larger than 32bit hashes there simply isn’t much opportunity for improvement. Yet his graphs show a fantastic difference (one you wouldn’t see on the wire I might add). Must be something about the 64bit container software is better at getting the packets in and out.

    The second surprise was how much faster WireGuard is than OpenVPN. Going to have to check that out. I hadn’t heard of it.

    IPv6 addressing will certainly see a benefit from a 64bit CPU and even a 128bit CPU. It’s so incredibly wasteful. Hashing of connection data if connection tracking was enabled would probably also benefit from 64bit if written to use it.

    I would say that if I bought a 64bit ARM I would certainly run a 64bit OS. Otherwise there isn’t a whole lot of point to it.

    1. I just have to say this again… OpenVPN really surprised me. But in two directions: how much faster WireGuard was and how little improvement came from 64bit. But then I look back and WireGuard saw only a minor improvement too. Makes me think that neither of those packages is using 64bit optimized cryptographic functions. There should be some significant increases.

  9. The other thing worth thinking about when building a 64-bit RasPi4 setup is enabling hugetlbfs support. On the 4G model, especially if you want to DMA big chunks of data quickly from userland, it’s worth having.

    I built a local 64bit kernel from the RasPi kernel git source with hugepages enabled, then debootstrap’d myself an aarch64 environment and the result is a pretty sweet setup :-)

  10. You do know that if you add “arm_64bit=1” to config.txt, it loads the 64-bit Raspbian kernel? You have to do it after the first apt update and apt full-upgrade so the kernel is on the Pi…
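After a change like this, it can be handy to confirm what you actually booted. The sketch below reports the machine type the kernel advertises and the pointer width of the running Python interpreter, which stands in for the userland here. The function name `describe_bitness` is hypothetical, and the check assumes the kernel reports `aarch64` when running in 64-bit mode and that no 32-bit personality is masking it.

```python
import platform
import struct


def describe_bitness() -> dict:
    """Report the kernel's machine type and the pointer width of this Python."""
    return {
        "kernel_machine": platform.machine(),       # e.g. "aarch64" or "armv7l"
        "userland_bits": struct.calcsize("P") * 8,  # pointer size of this interpreter
    }


if __name__ == "__main__":
    info = describe_bitness()
    print(info)
    if info["kernel_machine"] == "aarch64" and info["userland_bits"] == 32:
        print("64-bit kernel with a 32-bit userland")
```

A 64-bit kernel with a 32-bit Python would match the mixed setup described in the comments: the kernel is 64-bit, but the installed binaries are still 32-bit.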

  11. I read the article. The networking benchmarks are not great; for example, turning off the NIC’s onboard features rather than leaving them on and asking how well the operating system takes advantage of its hardware.

    A major advantage of 64-bit Linux is avoiding Linux’s 32-bit ‘lowmem’ of kernel-datastructure RAM. On a 32-bit machine with memory large compared to the addressable space (eg, 4GB) the lowmem can fragment to the point of being unusable. A benchmark would show this clearly.

    Another advantage of ARM’s 64-bit architecture is the greater number of registers. People complaining about the increase in word-size would do well to remember that the increase in the number of registers lowers the overall use of RAM, so moving to 64-bit doesn’t necessarily imply an increase in RAM I/O transactions. There are other microarchitectural wins from 64-bit, such as longer relative addressing. The RPi4 is a great platform to benchmark how these microarchitectural changes wash out in the real world; unfortunately doing so requires a 64-bit userspace as well as a 64-bit kernel and we’re not quite there yet.
