Sort Faster with FPGAs

Sorting. It’s a classic problem that’s been studied for decades, and it’s a great first step towards “thinking algorithmically.” Over the years, a handful of sorting algorithms have emerged, each characterizable by it’s asymptotic order, a measure of how much longer an algorithm takes as the problem size gets bigger. While all sorting algorithms take longer to complete the more elements that must be sorted, some are slower than others.

For a sorter like bubble sort, the time grows quadradically longer for a linear increase in the number of inputs; it’s of order O(N²).With a faster sorter like merge-sort, which is O(N*log(N)), the time required grows far less quickly as the problem size gets bigger. Since sorting is a bit old-hat among many folks here, and since O(N*log(N)) seems to be the generally-accepted baseline for top speed with a single core, I thought I’d pop the question: can we go faster?

In short — yes, we can! In fact, I’ll claim that we can sort in linear time, i.e a running time of O(N). There’s a catch, though: to achieve linear time, we’ll need to build some custom hardware to help us out. In this post, I’ll unfold the problem of sorting in parallel, and then I”ll take us through a linear-time solution that we can synthesize at home on an FPGA.

Need to cut to the chase? Check out the full solution implemented in SystemVerilog on GitHub. I’ve wrapped it inside an SPI communication layer so that we can play with it using an everyday microcontroller.

To understand how it works, join us as we embark on an adventure in designing algorithms for hardware. If you’re used to thinking of programming in a stepwise fashion for a CPU, it’s time to get out your thinking cap!

Continue reading “Sort Faster with FPGAs”

Shmoocon 2016: GPUs and FPGAs to Better Detect Malware

One of the big problems in detecting malware is that there are so many different forms of the same malicious code. This problem of polymorphism is what led Rick Wesson to develop icewater, a clustering technique that identifies malware.

Presented at Shmoocon 2016, the icewater project is a new way to process and filter the vast number of samples one finds on the Internet. Processing 300,000 new samples a day to determine if they have polymorphic malware in them is a daunting task. The approach used here is to create a fingerprint from each binary sample by using a space-filling curve. Polymorphism will change a lot of the bits in each sample, but as with human fingerprints, patterns are still present in this binary fingerprints that indicate the sample is a variation on a previously known object.
Continue reading “Shmoocon 2016: GPUs and FPGAs to Better Detect Malware”

Zedboard Multiport Ethernet

The Zedboard uses Xilinx’s Zynq, which is a combination ARM CPU and FPGA. [Jeff Johnson] recently posted an excellent two-part tutorial covering using a Zedboard with multiple Ethernet ports. The lwIP (light-weight Internet Protocol) stack takes care of the software end.

Vivado is Xilinx’s software for configuring the Zynq (among other chips), and the tutorial shows you how to use it. The Ethernet PHY is an FPGA Mezzanine Card (FMC) with four ports that is commercially available. The project uses VHDL, but there is no VHDL coding involved, just the use of canned components.

The real issue when using an FPGA and a CPU is the interface between the processor and the FPGA circuitry. In this case, the ARM standard AXI bus does this task, and the Ethernet component properly interfaces to that bus. The IP application in the second part of the post is an echo server.

We’ve seen the Zynq used in flying machines and also in a music synthesizer. Although this project doesn’t use any Verilog or VHDL that you create, it is still a great example of configuring using Vivado and using common components in a design.

BabyBaby: A 1948 Computer on an FPGA

The Manchester Baby seems simple today. A 32-bit machine with 32 words of storage. It wasn’t meant to be a computer, though, but a test bed for the new Williams tube storage device. However, in 1948, it executed stored programs at about 1,100 instructions per second. The success of the machine led to a series of computers at Manchester University and finally to the first commercially available computer, the Ferranti Mark I.

[Dave] is lucky enough to volunteer to demonstrate the Baby replica at Machester’s Museum of Science Industry. He wanted his own Baby, so he used a Xilinx FPGA board to build a replica Baby named BabyBaby. Although it runs at the same speed as the original, it is–mercifully–much smaller than the real machine.

Continue reading “BabyBaby: A 1948 Computer on an FPGA”

FPGA to Ethernet Direct

When [iliasam] needed an Ethernet connection, he decided to see how much of the network interface he could put in the FPGA logic. Turns out that for 10 Base-T, he managed to get quite a bit inside the FPGA. His original post is in Russian, but automatic translation makes a passable attempt at converting to English.

This is a classic trade off all FPGA designers face: how much external logic do you use for a particular design. For example, do you add memory to the PCB, or use FPGA resources as memory? Each has its advantages and disadvantages (that’s why it is a trade off). However, if you are trying to keep things cheap, slashing external circuitry is often the way to go.

Continue reading “FPGA to Ethernet Direct”

32C3: A Free and Open Source Verilog-to-Bitstream Flow for iCE40 FPGAs

[Clifford] presented a fully open-source toolchain for programming FPGAs. If you don’t think that this is an impressive piece of work, you don’t really understand FPGAs.

The toolchain, or “flow” as the FPGA kids like to call it, consists of three parts: Project IceStorm, a low-level tool that can build the bitstreams that flip individual bits inside the FPGA, Arachne-pnr, a place-and-route tool that turns a symbolic netlist into the physical stuff that IceStorm needs, and Yosys which synthesizes Verilog code into the netlists needed by Arachne. [Clifford] developed both IceStorm and Yosys, so he knows what he’s talking about.

What’s most impressive is that FPGAs aren’t the only target for this flow. Because it’s all open source and modifiable, it has also been used for designing custom ASICs, good to know when you’re in need of your own custom silicon. [Clifford]’s main focus in Yosys is on formal verification — making sure that the FPGA will behave as intended in the Verilog code. A fully open-source toolchain makes working on this task possible.

If you’ve been following along with [Al Williams]’s FPGA posts, either this introduction or his more recent intermediate series that are also based on the relatively cheap Lattice iCEStick development kit, this video is a must-watch. It’s a fantastic introduction to the cutting-edge in free FPGA tools.

PSoC VGA on a $10 Development Board

We’ve always found the Cypress PSoC an interesting beast. It’s a CPU with functional blocks that you can configure to build various I/O devices, including incorporating FPGA logic using Verilog. [MiguelVP] has an excellent multi-part project that produces VGA output from a PSoC. So far it just generates a fixed pattern, but a frame buffer is in the works, and there is plenty of detail about how to configure the PSoC for the task.

Although the PSoC has some analog capability, [MiguelVP] uses a cheap R2R DAC and VGA connector to interface to the VGA monitor. You can get the same PSoC board the project uses for about $10. The software, unfortunately, is Windows-only, so be prepared to fire up a virtual machine if you run Linux or Mac. Our own [Bil Herd] did a video introduction to PSoC that you can watch after the break.

Continue reading “PSoC VGA on a $10 Development Board”