The word supercomputer gets thrown around quite a bit. The original Cray-1, for example, operated at about 150 MIPS and had about eight megabytes of memory. A modern Intel i7 CPU can hit almost 250,000 MIPS and is unlikely to have less than eight gigabytes of memory, and probably has quite a bit more. Sure, MIPS isn’t a great performance number, but clearly, a top-end PC is way more powerful than the old Cray. The problem is, it’s never enough.
Today’s computers have to processes huge numbers of pixels, video data, audio data, neural networks, and long key encryption. Because of this, video cards have become what in the old days would have been called vector processors. That is, they are optimized to do operations on multiple data items in parallel. There are a few standards for using the video card processing for computation and today I’m going to show you how simple it is to use CUDA — the NVIDIA proprietary library for this task. You can also use OpenCL which works with many different kinds of hardware, but I’ll show you that it is a bit more verbose.
Continue reading “CUDA is Like Owning a Supercomputer”
We keep seeing more and more Tensor Flow neural network projects. We also keep seeing more and more things running in the browser. You don’t have to be Mr. Spock to see this one coming. TensorFire runs neural networks in the browser and claims that WebGL allows it to run as quickly as it would on the user’s desktop computer. The main page is a demo that stylizes images, but if you want more detail you’ll probably want to visit the project page, instead. You might also enjoy the video from one of the creators, [Kevin Kwok], below.
TensorFire has two parts: a low-level language for writing massively parallel WebGL shaders that operate on 4D tensors and a high-level library for importing models from Keras or TensorFlow. The authors claim it will work on any GPU and–in some cases–will be actually faster than running native TensorFlow.
Continue reading “Neural Nets in the Browser: Why Not?”
Often, CPUs that work together operate on SIMD (Single Instruction Multiple Data) or MISD (Multiple Instruction Single Data), part of Flynn’s taxonomy. For example, your video card probably has the ability to apply a single operation (an instruction) to lots of pixels simultaneously (multiple data). Researchers at the University of California–Davis recently constructed a single chip with 1,000 independently programmable processors onboard. The device is energy efficient and can compute up to 1.78 trillion instructions per second.
The KiloCore chip (not to be confused with the 2006 Rapport chip of the same name) has 621 million transistors and uses special techniques to be energy efficient, an important design feature when dealing with so many CPUs. Each processor operates at 1.78 GHz or less and can shut itself down when not needed. The team reports that even when computing 115 billion instructions per second, the device only consumes about 700 milliwatts.
Unlike some multicore designs that use a shared memory area to communicate between processors, the KiloCore allows processors to directly communicate. If you are just a diehard Arduino user, maybe you could scale up this design. Or, if you want to make use of the unused power in your video card under Linux, you can always try to bring KGPU up to date.
Horse racing has been around since the time of the ancient Greeks. Often called the sport of kings, it was an early platform for making friendly wagers. Over time, private bets among friends gave way to bookmaking, and the odds of winning skewed in favor of a new concept called the “house”.
During the late 1860s, an entrepreneur in Paris named Joseph Oller invented a new form of betting he called pari-mutuel. In this method, bettors wager among themselves instead of against the house. Bets are pooled together and the winnings divided among the bettors. Pari-mutuel betting creates more organic odds than ones given by a profit-driven bookmaker.
Oller’s method caught on quite well. It brought fairness and transparency to betting, which made it even more attractive. It takes a lot of quick calculations to show real-time bet totals and changing odds, and human adding machines presented a bottleneck. In the early 1900s, a man named George Julius would change pari-mutuel technology forever by making an automatic vote-counting machine in his garage.
Continue reading “Tote Boards: the Impressive Engineering of Horse Gambling”
The 1980s were a heyday for strange computer architectures; instead of the von Neumann architecture you’d find in one of today’s desktop computers or the Harvard architecture of a microcontroller, a lot of companies experimented with strange parallel designs. While not used much today, at the time these were some of the most powerful computers of their day and were used as the main research tools of the AI renaissance of the 1980s.
Over at the Norwegian University of Science and Technology a huge group of students (13 members!) designed a modern take on the massively parallel computer. It’s called 256 Shades of Gray, and it processes 320×240 pixel 8-bit grayscale graphics like no microcontroller could.
The idea for the project was to create an array-based parallel image processor with an architecture similar to the Goodyear MPP formerly used by NASA or the Connection Machine found in the control room of Jurassic Park. Unlike these earlier computers, the team implemented their array processor in an FPGA, giving rise to their Lena processor this processor is in turn controlled by a 32-bit AVR microcontroller with a custom-build VGA output.
The entire machine can process 10 frames per second of 320×240 resolution grayscale video. There’s a presentation video available (in Norwegian), but the highlight might be their demo of The Game of Life rendered in real-time on their computer. An awesome build, and a very cool experience for all the members of the class.
It’s the end of the semester for [Bruce Land]’s microcontroller design class at Cornell, and the projects coming off the workbench this semester look as awesome as any before. For their final project, [Alexander Wang] and [Bill Jo] designed an audio frequency spectrum analyzer using two microcontrollers in a parallel setup.
This spectrum analyzer takes an audio signal from an iPod, phone, or CD player through a 3.5 mm jack and displays the level for dozens of frequency bands much like an audio visualizer in iTunes or a nice car stereo display. To display these frequency bands, the spectrum analyzer first needs to perform a Fast Fourier Transform on the incoming audio signal. While FFT is extremely fast, the calculations are rather hardware intensive; calculating the frequencies and displaying them on a TV would be a bit much even for the ATMega1284 used in the project.
To graph the audio signal on their small display, [Alexander] and [Bill] broke the build up into two parts – one to do the math on the audio, and another to generate the NTSC video signal for the display.
As seen in the video after the break, the spectrum analyzer works wonderfully, and even though it only functions up to 4kHz, it’s more than enough to see what’s going on in most music.
Continue reading “Building a spectrum analyzer with parallel processing”