50-Year-Old Program Gets Speed Boost

At first glance, getting a computer program to run faster than the first electronic computers might seem trivial. After all, most of us carry enormously powerful processors in our pockets every day as if that’s normal. But [Mark] isn’t trying to beat computers like the ENIAC with a mobile ARM processor or other modern device. He’s now programming with the successor to the original Intel integrated circuit processor, the 4040, but beating the ENIAC is still little more complicated than you might think with a processor from 1974.

For this project, the goal was to best the 70-hour time set by ENIAC for computing the first 2035 digits of pi. There are a number of algorithms for performing this calculation, but using a 4-bit processor and an extremely limited memory of only 1280 bytes makes a number of these methods impossible, especially with the self-imposed time limit. The limited instruction set is a potential bottleneck as well with these early processors. [Mark] decided to use [Fabrice Bellard]’s algorithm given these limitations. He goes into great detail about the mathematics behind this method before coding it in JavaScript. Generating assembly language from a working JavaScript was found to be fairly straightforward.

[Mark] is also doing a lot of work on the 4040 to get this program running as well, including upgrades to the 40xx tool stack, the compiler and linker, and an emulator he’s using to test his program before sending it to physical hardware. The project is remarkably well-documented, including all of the optimizations needed to get these antique processors running fast enough to beat the ENIAC. We won’t spoil the results for you, but as a hint to how it worked out, he started this project using the 4040 since his original attempt using a 4004 wasn’t quite fast enough.

19 thoughts on “50-Year-Old Program Gets Speed Boost

  1. Cheating! These spigot algorithms weren’t discovered back then. The only Spigot I remember was for square root and even that needed to start at the base set.

    It would have been far more impressive if he had improved speed “within” the constraints of the original time. Throwing advanced math at an age old problem is akin to running it on faster modern hardware.

    It’s still very admirable but it is far less “ground breaking”.

    1. If you cannot use modern hardware(including for analysis, emulation, building and debugging), modern algorithms, modern compilers or modern optimizations, then all you can do is doing a few hand optimizations on the original source code or machine code. That would be too restrictive in my opinion. The entire point is to revisit an old problem with new ideas and see if we could have done it better given today’s knowledge on the same hardware.

    2. Why not go the whole way and stay within the constraints of what the original coders thought was good and what they wrote into their program? Probably because you’d just arrive at the same result! If you incorporate nothing new, nothing new happens.

      The cool thing about putting new knowledge into old hardware is that it’ll do things that were never originally thought possible.

  2. “It should be pure computation, with no pre-computed data tables in RAM, not even a simple multiplication table.”
    What is meant by pre-computed? Put in ROM and then transferred to ROM at boot (in C that would be the CRT init)? Or would computing it during boot and then storing it also not be allowed? The latter would be an unfair restriction in my opinion. He allows caching of results, so why not speed up multiplication?

    1. Just an typo. RAM instead of ROM. Fixed. Computing and storing multiplication table in RAM is OK (and I was doing that), but computing multiplication table via PC and storing it in RAM – don’t.

    1. That’s there to correct RPN calculations. Reversing a reverse, is like three rights to make a left. By reversing the resistor it’s done in one step. Great speed up techie approach. I do it all the time when needed.

    1. Implement a gate level replicas in a cpld which would be able to run at several 10s mhz

      And fit more than one 4004, with 8bit support and added floating point and Multiplication and division, and native memory operations

      How fast can a cold or fpga calculate pi?

      Them shits fast enough to calculate hashes for bitcoin and monero, and ravencoin

      1. For a task like this, an FPGA or CPLD doesn’t need to emulate the entire instruction set – just the instructions your algorithm needs.
        You might even create replacement instructions, to increase efficiency.

    2. I have read somewhere that Intel was very disappointed in the clock speed and 740 kHz is about the fastest safe speed from all chips from the factory. MAYBE a selected 4004 or 4040 could go faster, but then there is the rest of the chipset (4001,4002,4003) to consider, which is very timing sensitive given the nature of the design. For the above hardware where these are simulated, that part of the overclocking question won’t matter.

      Also, the 4004 and 4040 get pretty darned toasty in operation. Maybe a water cooled system?

      1. > Also, the 4004 and 4040 get pretty darned toasty in operation

        4040 has been run for 70 hours several times and was not that host. For safety I used couple very small radiators (for DIP-8 ICs).

Leave a Reply

Please be kind and respectful to help make the comments section excellent. (Comment Policy)

This site uses Akismet to reduce spam. Learn how your comment data is processed.