Neural Nets in the Browser: Why Not?

We keep seeing more and more TensorFlow neural network projects. We also keep seeing more and more things running in the browser. You don’t have to be Mr. Spock to see this one coming. TensorFire runs neural networks in the browser and claims that WebGL allows it to run as quickly as it would natively on the user’s desktop computer. The main page is a demo that stylizes images, but if you want more detail you’ll probably want to visit the project page instead. You might also enjoy the video from one of the creators, [Kevin Kwok], below.

TensorFire has two parts: a low-level language for writing massively parallel WebGL shaders that operate on 4D tensors, and a high-level library for importing models from Keras or TensorFlow. The authors claim it will work on any GPU and, in some cases, will actually be faster than running native TensorFlow.


1000 CPUs on a Chip

Often, CPUs that work together operate in SIMD (Single Instruction Multiple Data) or MISD (Multiple Instruction Single Data) fashion, two of the categories in Flynn’s taxonomy. For example, your video card probably has the ability to apply a single operation (an instruction) to lots of pixels simultaneously (multiple data). Researchers at the University of California–Davis recently took a different route and constructed a single chip with 1,000 independently programmable processors onboard, which puts it in the MIMD (Multiple Instruction Multiple Data) category. The device is energy efficient and can compute up to 1.78 trillion instructions per second.

The KiloCore chip (not to be confused with the 2006 Rapport chip of the same name) has 621 million transistors and uses special techniques to be energy efficient, an important design feature when dealing with so many CPUs. Each processor operates at 1.78 GHz or less and can shut itself down when not needed. The team reports that even when computing 115 billion instructions per second, the device only consumes about 700 milliwatts.
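The figures quoted above hang together if you run the numbers. Here is a quick back-of-the-envelope check in Python. It assumes each core retires one instruction per clock cycle, which the article does not state, and the variable names are just for illustration.

```python
# Figures quoted in the article.
CORES = 1000
MAX_CLOCK_HZ = 1.78e9      # per-core clock ceiling (1.78 GHz)
LOW_POWER_RATE = 115e9     # instructions per second at the low-power point
LOW_POWER_WATTS = 0.7      # about 700 milliwatts

# Assumption (not from the article): one instruction per core per clock cycle.
peak_rate = CORES * MAX_CLOCK_HZ

# Energy spent per instruction at the low-power point, in picojoules.
energy_pj = LOW_POWER_WATTS / LOW_POWER_RATE * 1e12

print(f"peak rate: {peak_rate / 1e12:.2f} trillion instructions/s")
print(f"energy per instruction: {energy_pj:.1f} pJ")
```

Under that assumption, 1,000 cores at 1.78 GHz line up with the 1.78 trillion instructions per second headline figure, and the low-power operating point works out to roughly 6 picojoules per instruction.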

Unlike some multicore designs that use a shared memory area to communicate between processors, the KiloCore allows processors to communicate directly with each other. If you are just a diehard Arduino user, maybe you could scale up this design. Or, if you want to make use of the unused power in your video card under Linux, you can always try to bring KGPU up to date.

Tote Boards: the Impressive Engineering of Horse Gambling

Horse racing has been around since the time of the ancient Greeks. Often called the sport of kings, it was an early platform for making friendly wagers. Over time, private bets among friends gave way to bookmaking, and the odds of winning skewed in favor of a new concept called the “house”.

During the late 1860s, an entrepreneur in Paris named Joseph Oller invented a new form of betting he called pari-mutuel. In this method, bettors wager among themselves instead of against the house. Bets are pooled together and the winnings divided among the bettors. Pari-mutuel betting creates more organic odds than ones given by a profit-driven bookmaker.
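The arithmetic behind pari-mutuel odds is simple enough to sketch in a few lines of Python. This is only an illustration of the pooling idea described above: the runner names and stakes are made up, and the `takeout` parameter is a hypothetical operator commission that defaults to zero.

```python
def parimutuel_payouts(pool, takeout=0.0):
    """Return the payout per unit staked for each runner.

    pool maps a runner name to the total amount bet on that runner.
    takeout is a hypothetical operator commission (0.0 means none).
    """
    total = sum(pool.values())
    net = total * (1.0 - takeout)
    # If a runner wins, the whole net pool is shared among its backers
    # in proportion to their stakes.
    return {runner: net / staked for runner, staked in pool.items() if staked > 0}


# Made-up example: 1,000 units wagered across three runners.
pool = {"A": 500.0, "B": 300.0, "C": 200.0}
payouts = parimutuel_payouts(pool)
for runner, per_unit in payouts.items():
    print(f"{runner}: pays {per_unit:.2f} per 1.00 staked")
```

Every new bet changes both the total pool and one runner's share of it, so all of the payouts shift with each wager. That is why keeping the displayed odds current takes so much calculation.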

Oller’s method caught on quite well. It brought fairness and transparency to betting, which made it even more attractive. It takes a lot of quick calculations to show real-time bet totals and changing odds, and human adding machines presented a bottleneck. In the early 1900s, a man named George Julius would change pari-mutuel technology forever by making an automatic vote-counting machine in his garage.


Massively parallel CPU processes 256 shades of gray


The 1980s were a heyday for strange computer architectures; instead of the von Neumann architecture you’d find in one of today’s desktop computers or the Harvard architecture of a microcontroller, a lot of companies experimented with strange parallel designs. While not used much today, these were some of the most powerful computers of their day and were used as the main research tools of the AI renaissance of the 1980s.

Over at the Norwegian University of Science and Technology, a huge group of students (13 members!) designed a modern take on the massively parallel computer. It’s called 256 Shades of Gray, and it processes 320×240 pixel 8-bit grayscale graphics like no microcontroller could.

The idea for the project was to create an array-based parallel image processor with an architecture similar to the Goodyear MPP formerly used by NASA or the Connection Machine found in the control room of Jurassic Park. Unlike these earlier computers, the team implemented their array processor in an FPGA, giving rise to their Lena processor. This processor is in turn controlled by a 32-bit AVR microcontroller with a custom-built VGA output.

The entire machine can process 10 frames per second of 320×240 resolution grayscale video. There’s a presentation video available (in Norwegian), but the highlight might be their demo of The Game of Life rendered in real time on their computer. An awesome build, and a very cool experience for all the members of the class.
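The Game of Life is a natural fit for an array processor because every cell applies the same rule to its own little neighborhood. The Python sketch below is not the team's implementation, just a minimal illustration of that idea. The wrap-around edges and the tiny 5×5 grid are assumptions made for the example, and Python visits the cells one at a time where array hardware would update them in lockstep.

```python
def life_step(grid):
    """Compute one Game of Life generation on a wrap-around grid of 0s and 1s."""
    rows, cols = len(grid), len(grid[0])

    def live_neighbors(r, c):
        # Count the eight surrounding cells, wrapping at the edges.
        return sum(
            grid[(r + dr) % rows][(c + dc) % cols]
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)
        )

    def next_state(r, c):
        n = live_neighbors(r, c)
        # Birth on exactly three neighbors, survival on two or three.
        return 1 if n == 3 or (grid[r][c] == 1 and n == 2) else 0

    # Each cell reads only the old grid, so every update is independent
    # of the others and could run at the same time.
    return [[next_state(r, c) for c in range(cols)] for r in range(rows)]


# A "blinker": three cells in a row that flip between horizontal and vertical.
blinker = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
for row in life_step(blinker):
    print("".join("#" if cell else "." for cell in row))
```

Because no cell depends on another cell's new value, the work divides cleanly across however many processing elements are available.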

Building a spectrum analyzer with parallel processing


It’s the end of the semester for [Bruce Land]’s microcontroller design class at Cornell, and the projects coming off the workbench this semester look as awesome as any before. For their final project, [Alexander Wang] and [Bill Jo] designed an audio frequency spectrum analyzer using two microcontrollers in a parallel setup.

This spectrum analyzer takes an audio signal from an iPod, phone, or CD player through a 3.5 mm jack and displays the level for dozens of frequency bands, much like an audio visualizer in iTunes or a nice car stereo display. To display these frequency bands, the spectrum analyzer first needs to perform a Fast Fourier Transform on the incoming audio signal. While the FFT is extremely fast as algorithms go, the calculations are rather hardware intensive; calculating the frequencies and displaying them on a TV would be a bit much even for the ATmega1284 used in the project.
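To get a feel for what the math half of such a project has to do, here is a minimal Python sketch of turning a frame of audio samples into frequency bands. It is not the students' code, which runs on a microcontroller. The 8 kHz sample rate (chosen because it gives a 4 kHz upper limit), the 64-sample frame, and the 1 kHz test tone are all assumptions for illustration.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


SAMPLE_RATE = 8000  # Hz, assumed; the highest usable frequency is half of this
N = 64              # samples per frame, assumed
TONE_HZ = 1000      # test tone, assumed

signal = [math.sin(2 * math.pi * TONE_HZ * i / SAMPLE_RATE) for i in range(N)]
spectrum = fft(signal)

# For real input only the first N/2 bins are unique; each spans SAMPLE_RATE / N Hz.
magnitudes = [abs(x) for x in spectrum[: N // 2]]
peak_bin = max(range(N // 2), key=lambda k: magnitudes[k])
peak_hz = peak_bin * SAMPLE_RATE / N
print(f"strongest band: bin {peak_bin} ({peak_hz:.0f} Hz)")
```

With these numbers each bin is 125 Hz wide, so the 1 kHz tone lands in bin 8. A display routine would then draw one bar per bin, or per group of bins, from the `magnitudes` list.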

To graph the audio signal on their small display, [Alexander] and [Bill] broke the build up into two parts – one to do the math on the audio, and another to generate the NTSC video signal for the display.

As seen in the video after the break, the spectrum analyzer works wonderfully, and even though it only functions up to 4 kHz, it’s more than enough to see what’s going on in most music.
