Ask Hackaday: Computing Square Roots on FPGA?

Hackaday reader [nats.fr] wrote in with some code from a project that resizes a video stream on the fly using an FPGA. Doing this right means undoing whatever gamma correction has been applied to the original stream, resizing, and then re-applying the gamma. Making life simpler, [nats.fr] settled on a gamma of two, which means taking a bunch of square roots, which isn’t fast on an FPGA.

[nats]’s algorithm is pretty neat: it uses a first-stage lookup to figure out in which broad range the value lies, and then one step of Hero’s algorithm to refine from there. (We think this is equivalent to saying he does a piecewise linear interpolation, but we’re not 100% sure.) Anyway, it works decently.

Of course, when you start looking into the abyss that is special function calculation, you risk falling in. Wikipedia lists more methods of calculating square roots than we have fingers. One of them, CORDIC, avoids even using multiplication by resorting to clever bitshifts and a lookup table. Our go-to in these type of situations, Chebyshev polynomial approximation, didn’t even make the cut. (Although we suspect it would be a contender in the `gamma=1.8` or `gamma=2.2` cases, especially if combined with range-reduction in a first stage like [nats.fr] does.)

So what’s the best/fastest approximation for `sqrt(x)` for 16-bit integers on an FPGA? [nats.fr] is using a Spartan 6, so you can use a multiplier, but division is probably best avoided. What about arbitrary, possibly fractional, roots?

A Spreadsheet For Guesswork

Ever wish you could guess more precisely? Or maybe just make your guesses look confusingly legitimate? Guesstimate could help.

It uses Monte Carlo simulations to add some legitimacy to the ranges given to it. For example, if you say the cost of lumber for your next project could be between 2 and 8 dollars a piece, you don’t typically mean that it’s equally likely to be any of those numbers. Most people mean that the boards are most likely to be around 3-5 dollars and everything lower or higher is less probable. Using different shaped distributions, Guesstimate can help include this discrepancy of thought into your pseudo-calculations.

It’s a neat bit of code with a nice interface. There is a commercial side to the project for those who want to collaborate openly or pay someone to host it privately. It has a few neat example models for those interested.

Does anyone use anything like this in their daily lives? Is there another similar project out there? This kind of thing is pretty cool!

Computer-Designed Portraits, Knit By Hand!

Artist [Petros Vrellis] has done something that we’ve never seen before: his piece “A New Way to Knit” lives up to its name. What he’s done is to take the traditional circular loom, some black thread, and toss some computing at it. And then he loops the string around and around and around.

The end result of following the computer’s instructions is a greyscale portrait. Where few black strings overlap, it’s light, and where more overlap, it’s darker. That’s the whole gimmick, but the effect is awesome. As you zoom in and out, it goes from a recognizable face to a tangle of wires and back. Check out his video embedded below.

Shor’s Algorithm In Five Atoms

If you want to factor a number, one way to do it is Shor’s algorithm. That’s a quantum algorithm and finds prime factors of integers. That’s interesting because prime factorization is a big deal of creating or breaking most modern encryption techniques.

Back in 2001, a group at IBM factored 15 (the smallest number that the algorithm can factor) using a 7 qubit system that uses nuclear magnetic resonance. Later, other groups duplicated the feat using photonic qubits. Typical implementations take 12 qubits. However, recent work at MIT and the University of Innsbruck can do the same trick with 5 atoms caught in an ion trap. The researchers believe their implementation will easily scale to larger numbers.

Each qubit is an atom and LASER pulses perform the logic operations. By removing an electron to make each atom positively charged, an electric field can exactly hold the positively charged ions in position only microns apart.

We’ve covered quantum computing before. We’ve even talked about the effect of practical quantum computing on encryption. You might also want to read more about the algorithm involved.

Photo credit: Jose-Luis Olivares/MIT

Forty-Year-Old Arcade Game Reveals Secrets of Robot Path Planning

What’s to be gained from reverse engineering a four-decade-old video game? As it turns out, quite a lot, and as you’ll learn from [Norbert]’s recent talk at the ViennaJS meetup, it’s not just about bringing a classic back to life.

The game in question is Kee Game’s Sprint 2, a monochrome 2D car race that allowed two players to compete head to head. The glorious Harvest Gold and Burnt Orange color scheme just screams 1970s, and it might be hard to see why this game was once a popular quarter-eater. But it was quite engaging for the day, and [Norbert] was interested in reverse engineering it. That he did, using JavaScript to build a faithful browser-based emulation of the game. And he took it further, creating a 3D first-person version of the game.

Anti-Cogging Algorithm Brings Out the Best in your Hobby Brushless Motors

Cheap, brushless motors may be the workhorses behind our RC planes and quadcopters these days, but we’ve never seen them  in any application that requires low-speed precision. Why? Sadly, cheap brushless motors simply aren’t mechanically well-constructed enough to offer precise position control because they exhibit cogging torque, an unexpected motor characteristic that causes slight variations in the output torque that depend rotor position. Undaunted, [Matthew Piccoli] and the folks at UPenn’s ModLab have developed two approaches to compensate and minimize torque-ripple, essentially giving a cheap BLDC Motor comparable performance to it’s pricier cousins. What’s more, they’ve proven their algorithm works in hardware by building a doodling direct-drive robotic arm from brushless motors that can trace trajectories.

Cogging torque is a function of position. [Matthew’s] algorithm works by measuring the applied voltage (or current) needed to servo the rotor to each measurable encoder position in a full revolution. Cogging torque is directional, so this “motor fingerprint” needs to be taken in both directions. With these measured voltages (or currents) logged for all measurable positions, compensating for the cogging torque is just a matter of subtracting off that measured value at any given position while driving the motor. [Matthew] has graciously taken the trouble of detailing the subtleties in his paper (PDF), where he’s actually developed an additional acceleration-based method.

Hobby BLDC motors abound these days, and you might even have a few spares tucked away on the shelf. This algorithm, when applied on the motor controller electronics, can give us the chance to revisit those projects that mandate precise motor control with high torque–something we could only dream about if we could afford a few Maxon motors. If you’re new to BLDC Motor Control theory, check out a few projects of the past to get yourself up-and-running.

Sort Faster with FPGAs

Sorting. It’s a classic problem that’s been studied for decades, and it’s a great first step towards “thinking algorithmically.” Over the years, a handful of sorting algorithms have emerged, each characterizable by it’s asymptotic order, a measure of how much longer an algorithm takes as the problem size gets bigger. While all sorting algorithms take longer to complete the more elements that must be sorted, some are slower than others.

For a sorter like bubble sort, the time grows quadradically longer for a linear increase in the number of inputs; it’s of order `O(N²)`.With a faster sorter like merge-sort, which is `O(N*log(N))`, the time required grows far less quickly as the problem size gets bigger. Since sorting is a bit old-hat among many folks here, and since `O(N*log(N))` seems to be the generally-accepted baseline for top speed with a single core, I thought I’d pop the question: can we go faster?

In short — yes, we can! In fact, I’ll claim that we can sort in linear time, i.e a running time of `O(N)`. There’s a catch, though: to achieve linear time, we’ll need to build some custom hardware to help us out. In this post, I’ll unfold the problem of sorting in parallel, and then I”ll take us through a linear-time solution that we can synthesize at home on an FPGA.

Need to cut to the chase? Check out the full solution implemented in SystemVerilog on GitHub. I’ve wrapped it inside an SPI communication layer so that we can play with it using an everyday microcontroller.

To understand how it works, join us as we embark on an adventure in designing algorithms for hardware. If you’re used to thinking of programming in a stepwise fashion for a CPU, it’s time to get out your thinking cap!