Startup Claims It Can Boost CPU Performance By 2-100X

Although Moore’s Law has slowed at bit as chip makers reach the physical limits of transistor size, researchers are having to look to other things other than cramming more transistors on a chip to increase CPU performance. ARM is having a bit of a moment by improving the performance-per-watt of many computing platforms, but some other ideas need to come to the forefront to make any big pushes in this area. This startup called Flow Computing claims it can improve modern CPUs by a significant amount with a slight change to their standard architecture.

It hopes to make these improvements by adding a parallel processing unit, which they call the “back end” to a more-or-less standard CPU, the “front end”. These two computing units would be on the same chip, with a shared bus allowing them to communicate extremely quickly with the front end able to rapidly offload tasks to the back end that are more inclined for parallel processing. Since the front end maintains essentially the same components as a modern CPU, the startup hopes to maintain backwards compatibility with existing software while allowing developers to optimize for use of the new parallel computing unit when needed.

While we’ll take a step back and refrain from claiming this is the future of computing until we see some results and maybe a prototype or two, the idea does show some promise and is similar to some ARM computers which have multiple cores optimized for different tasks, or other computers which offload non-graphics tasks to a GPU which is more optimized for processing parallel tasks. Even the Raspberry Pi is starting to take advantage of external GPUs for tasks like these.

A System Board For The 8008

Intel processors, at least for PCs, are ubiquitous and have been for decades. Even beyond the chips specifically built by Intel, other companies have used their instruction set to build chips, including AMD and VIA, for nearly as long. They’re so common the shorthand “x86” is used for most of these processors, after Intel’s convention of naming their processors with an “-86” suffix since the 1970s. Not all of their processors share this convention, though, but you’ll have to go even further back in time to find one. [Mark] has brought one into the modern age and is showing off his system board for this 8008 processor.

The 8008 predates any x86 processor by about six years and was among the first mass-produced 8-bit processors even before the well-known 8080. The expansion from four bits to eight was massive for the time and allowed a much wider range of applications for embedded systems and early personal computers. [Mark] goes into some of the details for programming these antique processors before demonstrating his system board. It gets power from a USB-C connection and uses a set of regulators and level shifters to make sure the voltages all match. Support for all the functions the 8008 needs is courtesy of an STM32. That includes the system memory.

For those looking to develop something like this, [Mark] has also added his development tools to a separate GitHub page. Although it’s always a good idea for those interested in computer science to take a look at old processors like these, it’s not always the easiest path to get original hardware like this, which also carries the risk of letting smoke out of delicate components. A much easier route is to spin up an emulator like an 8086 IBM PC emulator on an ESP32. Want to see inside this old chip? Have a look.

Continue reading “A System Board For The 8008”

Now The V In RISC-V Stands For VRoom

Hundreds of variations of open-source CPUs written in an HDL seem to float around the internet these days (and that’s a great thing). Many are RISC-V, an open-source instruction set (ISA), and are small toy processors useful for learning and small tasks. However, if you’re [Paul Campbell], you go for a high-end super-scalar, out-of-order, speculative, 8 IPC monster of a RISC-V CPU known as VRoom!.

That might seem a bit like word soup to the uninitiated in the processor design world (which is admittedly relatively small) but what makes this different from VexRISC is the scale and complexity. Rather than executing one instruction at a time sequentially, it executes multiple instructions, completing them concurrently in whatever order it can handle. The VexRISC chip is a good 32-bit modular design that can run Linux. It pulls a solid 1.57 DMIPS/MHz with everything turned on. The VRoom already clocks in at mighty 6.5 DMIPS/MHz, with more performance gains. It peaks at 8 instructions every clock cycle with a dual register file and a clever committing system to keep up.

VRoom is written in System Verilog to leverage Verilator (a handy linting and simulation framework), and while there is some C that generates different files, we’d wager it is pretty run-of-the-mill compared to a TypeScript based project. VRoom currently boots Linux thanks to an AWS-FPGA instance (a Xilinx VU9P Ultrascale), though it has to be trimmed to fit. [Paul] has big plans working his way up to a server-class chip with lots of cores and a huge cache.

It’s all on GitHub under a GPLv3 license; go check it out! [Paul] also has a talk with lots of great details. If you’re interested in getting into RISC-V but a server-class isn’t your speed, we heard Espressif is starting to use RISC-V cores in their ever-popular ESP series.

Gaming In Different Languages

One of the perks of using older hardware is its comparative simplicity and extensive documentation. After years or decades of users programming on a platform, the amount of knowledge available for it can become extensive. This is certainly the case with the 6502 microprocessor, used in old Apple computers and some video game systems from the ’80s. The extensive amount of resources available make it a prime candidate in exploring various programming languages, and their advantages and disadvantage.

This project looks into those differences using a robot game, which has been programmed four different ways in three languages. [Joey] created the game in Python first and then began to port it to the 65C02, a CMOS variant of the 6502. The first iteration is its assembly language, and then a second iteration with optimized assembly code. From there, he ports it to C and then finally to Forth. Each version of the game is available to play in a browser using an emulator to run the 6502 hardware.

Since the games run in the browser, other tools are available to examine the way the game runs in each language. Registers can be viewed in real time, as well as the values stored in the memory. It’s an interesting look at an old piece of hardware and of its inner workings. For an even deeper dive into the 6502, it’s possible to build a working computer on breadboards using one.

Where The Work Is Really Done – Casual Profiling

Once a program has been debugged and works properly, it might be time to start optimizing it. A common way of doing this is a method called profiling – watching a program execute and counting the amount of computing time each step in the program takes. This is all well and good for most programs, but gets complicated when processes execute on more than one core. A profiler may count time spent waiting in a program for a process in another core to finish, giving meaningless results. To solve this problem, a method called casual profiling was developed.

In casual profiling, markers are placed in the code and the profiler can measure how fast the program gets to these markers. Since multiple cores are involved, and the profiler can’t speed up the rest of the program, it actually slows everything else down and measures the markers in order to simulate an increase in speed. [Daniel Morsig] took this idea and implemented it in Go, with an example used to demonstrate its effectiveness speeding up a single process by 95%, resulting in a 22% increase in the entire program. Using a regular profiler only counted a 3% increase, which was not as informative as the casual profiler’s 22% measurement.

We got this tip from [Greg Kennedy] who notes that he hasn’t seen much use of casual profiling outside of the academic world, but we agree that there is likely some usefulness to this method of keeping track of a multi-threaded program’s efficiency. If you know of any other ways of solving this problem, or have seen causal profiling in use in the wild, let us know in the comments below.

Header image: Alan Lorenzo [CC BY-SA 3.0].

Hacking The Pocket Operator

The number of easily usable and programmable microcontrollers is small, so when selecting one for a project there are only a handful of very popular, well documented chips that most of us reach for. The same can be said for most small companies selling electronics as well, so if you reach for a consumer device that is powered by a microcontroller it’s likely to have one of these few in it. As a result, a lot of these off-the-shelf devices are easy to hack, reprogram, or otherwise improve, such as the Robot Pocket Operator.

The Pocket Operator is a handheld, fully-featured synthesizer complete with internal speaker. It runs on a Cortex M3, a very popular ARM processor which has been widely used for many different applications, and features everything you would need for a synthesizer in one tiny package, including a built-in speaker. It also supports a robust 24-bit DAC/ADC and all the knobs and buttons you would need. And now, thanks to [Frank Buss] there is a detailed teardown on exactly how this device operates.

Some of the highlights from the teardown include detailed drawings of how the display operates, all of the commands for controlling the device, and even an interesting note about how the system clock operates even when the device has been powered off for a substantial amount of time. For a pocket synthesizer this has a lot to offer, even if you plan on using it as something else entirely thanks to the versatility of the Cortex M3.

Continue reading “Hacking The Pocket Operator”

An Arduino From The Distant Past

Arduinos are a handy tool to have around. They’re versatile, cheap, easy to program, and have a ton of software libraries to build on. They’ve only been around for about a decade and a half though, so if you were living in 1989 and wanted to program a microcontroller you’d probably be stuck with an 8-bit microprocessor with no built-in peripherals to help, reading from a physical book about registers and timing, and probably trying to get a broken ribbon cable to behave so it would actually power up. If you want a less frustrating alternate history to live in, though, check out the latest project from [Marek].

He discovered some 6502 chips (Polish language, Google Translate link) that a Chinese manufacturer was selling, but didn’t really trust that they were legitimate. On a lark he ordered some and upon testing them he found out that they were real 6502s. Building an 8-bit computer is something he’d like to do, but in the meantime he decided to do a project using one of these chips as a general-purpose microcontroller similar to a modern Arduino. The project has similar specs as an Arduino too, including 8kB of RAM memory, 8kB of I/O address space, and various EPROM capabilities. [Marek] went on to build a shield board for it as well, for easy access to some switches and LEDs. It’s a great build that anyone interested in microcontrollers should check out.

Keep in mind that an ATtiny45 has 8 bits like the 6502 but only costs around $1 USD, whereas a 6502 would have cost around $200 in today’s dollars. It’s really only in modern times that we can appreciate the 6502 as a cheap 8-bit microcontroller for that reason alone, but we can also appreciate how it ushered in a computer revolution since competing Intel and Motorola chips cost around six times more before it showed up. They became so popular in fact that people still regularly use them to build retrocomputers of all kinds.