Neural networks have gone mainstream with a lot of heavy-duty — and heavy-weight — tools and libraries. What if you want to fit a network into a little computer? There’s tinn — the tiny neural network. If you can compile 200 lines of standard C code with a C or C++ compiler, you are in business. There are no dependencies on other code.
On the other hand, there’s not much documentation, either. However, between the header file and two examples, you should be able to figure it out; after all, it isn’t much code. The example in the repository directs you to download a handwritten digit recognition dataset from the Internet. Once the network trains on that data, the program prints the expected output for the first item in the dataset, then runs that item through the network and prints the actual result.
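For a sense of how little ceremony is involved, here’s a minimal sketch that trains a network on XOR. It assumes the API exposed by Tinn.h still looks the way it does in the repository (xtbuild, xttrain, xtpredict, xtfree); the layer sizes and learning rate are our own picks, not anything from the bundled examples:

```c
#include "Tinn.h"
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    // Seed the PRNG; Tinn draws its initial weights from rand().
    srand((unsigned) time(0));
    // 2 inputs, 8 hidden neurons, 1 output: sized for a toy XOR problem.
    const Tinn tinn = xtbuild(2, 8, 1);
    const float in[4][2] = {{0,0}, {0,1}, {1,0}, {1,1}};
    const float tg[4][1] = {{0}, {1}, {1}, {0}};
    // A few thousand passes over the data with a fixed learning rate.
    for(int e = 0; e < 4000; e++)
        for(int i = 0; i < 4; i++)
            xttrain(tinn, in[i], tg[i], 0.5f);
    // Run one prediction and print the single output neuron.
    const float* const out = xtpredict(tinn, in[1]);
    printf("0 XOR 1 ~ %f\n", (double) out[0]);
    xtfree(tinn);
    return 0;
}
```

Build it together with the library’s single source file and you have a working network in a few kilobytes.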
AI is the new hotness! It’s 1965 or 1985 all over again! We’re in the AI Renaissance Mk. 2, and Google, in an attempt to showcase how AI can allow creators to be more… creative, has released a synthesizer built around neural networks.
The NSynth Super is an experimental physical interface from Magenta, a research group within the Big G that explores how machine learning tools can create art and music in new ways. The NSynth Super does this by mashing together a Kaoss Pad, samples that sound like General MIDI patches, and a neural network.
Here’s how the NSynth works: the NSynth hardware accepts MIDI signals from a keyboard, DAW, or whatever. These MIDI commands are fed into an openFrameworks app that uses pre-compiled (with Machine Learning™!) samples from various instruments. The openFrameworks app combines and mixes these samples according to whatever the user inputs via the NSynth controller. If you’ve ever wanted to hear what the combination of a snare drum and a bassoon sounds like, this does it. Basically, you’re looking at a Kaoss Pad-controlled rompler that takes four samples and combines them, with the power of neural networks. The project comes with a set of pre-compiled, neural-networked samples, but you can use this interface to mix your own samples, provided you have a beefy computer with an expensive GPU.
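The four-sample mixing stage itself is nothing exotic. Here’s a plausible sketch of it in plain C, rather than the actual openFrameworks code: a bilinear blend of the four corner samples, driven by the pad’s X/Y position:

```c
#include <stddef.h>

// Bilinear mix of four pre-rendered sample buffers, indexed by the
// touchpad's (x, y) position in [0, 1]. A hypothetical sketch of the
// mixing stage -- the real app interpolates between NSynth-generated
// samples, and its actual code will differ.
void mix_corners(const float *nw, const float *ne,
                 const float *sw, const float *se,
                 float *out, size_t n, float x, float y)
{
    const float w_nw = (1.0f - x) * y;          // top-left weight
    const float w_ne = x * y;                   // top-right weight
    const float w_sw = (1.0f - x) * (1.0f - y); // bottom-left weight
    const float w_se = x * (1.0f - y);          // bottom-right weight
    for (size_t i = 0; i < n; i++)
        out[i] = w_nw * nw[i] + w_ne * ne[i]
               + w_sw * sw[i] + w_se * se[i];
}
```

The weights always sum to one, so dragging a finger across the pad crossfades smoothly between the four corner instruments.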
Not to undermine the work that went into this project, but thousands of synth heads will be disappointed by this project. The creation of new audio samples requires training with a GPU; the hardest and most computationally expensive part of neural networks is the training, not the performance. Without a nice graphics card, you’re limited to whatever samples Google has provided here.
Since this is open source and all the files are available, and since it’s a project built around a Raspberry Pi with a laser-cut enclosure, there is huge demand for this machine learning Kaoss Pad. The good news is that there’s a group buy on Hackaday.io, and there’s already a seller on Tindie should you want a bare PCB. You can, of course, roll your own, and the Digikey cart for all the SMD parts comes to about $40 USD. That doesn’t include the OLED ($2 from China), the Raspberry Pi, or the laser-cut enclosure, but it’s a start. Of course, for those of you who haven’t passed the 0805 SMD solder test, it looks like a few people will be selling assembled versions (less Pi) for $50-$60.
Is it cool? Yes, but a basement-bound producer who wants to add this to a track will quickly learn that training machine learning algorithms costs far more than playing with trained ones. The hardware is neat, but brace yourself for disappointment, just as the field suffered in the late 60s and the late 80s. We’re in the AI Renaissance Mk. 2, after all.
We’ve seen plenty of examples of neural networks listening to speech, reading characters, or identifying images. KickView had a different idea: they wanted to train a network to recognize radio signals. Not just any radio signals, but Orthogonal Frequency Division Multiplexing (OFDM) waveforms.
OFDM is a modulation method used by WiFi, cable systems, and many other systems. In particular, they look at an 802.11g signal with a bandwidth of 20 MHz. The question is: given a receiver built for 802.11g, how can you reliably detect that an 802.11ac signal — which can be up to 160 MHz wide — is using your channel? To demonstrate the technique, they decided to detect 20 MHz signals using only 5 MHz of receiver bandwidth.
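One way to picture the front end of such a detector: chop the narrowband capture into blocks, turn each block into a log-magnitude spectrum, and feed those vectors to a trained classifier. The sketch below does the feature step with FFTW; it’s our guess at a plausible pipeline, not KickView’s actual code (build with -lfftw3 -lm):

```c
#include <math.h>
#include <fftw3.h>

#define N 256  // FFT length over one block of the 5 MHz capture

// Hypothetical front end: turn one block of I/Q samples from the
// narrowband receiver into a log-magnitude spectrum, the kind of
// feature vector a small classifier could be trained on to flag a
// wider OFDM signal overlapping the channel.
void iq_to_features(fftw_complex *iq, float *features)
{
    fftw_complex out[N];
    fftw_plan p = fftw_plan_dft_1d(N, iq, out,
                                   FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(p);
    for (int k = 0; k < N; k++) {
        const double re = out[k][0], im = out[k][1];
        // Log power per bin; the epsilon avoids log10(0).
        features[k] = (float) log10(re * re + im * im + 1e-12);
    }
    fftw_destroy_plan(p);
}
```

The interesting part, of course, is the classifier that learns what the edge of a wide OFDM signal looks like when only a 5 MHz slice of it is visible.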
Increasingly, drones are being used for urban surveillance, delivery, and examining architectural structures. Doing this autonomously often involves “map-localize-plan” techniques, wherein the drone’s location is first determined on a map using GPS, and control commands are then produced based on that location.
A neural network that does steering and collision prediction can complement the map-localize-plan techniques. However, the network needs to be trained on video taken from actual flying drones, and generating that training video would involve many hours of flying drones at street level, putting vehicles and pedestrians at risk. To train their DroNet, researchers from the University of Zurich and the Universidad Politecnica de Madrid have come up with safer sources for that video: footage recorded from driving cars and riding bicycles.
For the drone steering predictions, they used over 70,000 images and corresponding steering angles from the publicly available car driving data from Udacity’s Open Source Self-Driving project. For the collision predictions, they mounted a GoPro camera on the handlebars of a bicycle and rode around a city. Video recording began when the bicycle was distant from an object and stopped when very close to it. In total, they collected 32,000 images.
To use the trained network, images from the drone’s forward-facing camera were fed into it, and the output was a steering angle and a probability of collision, which was turned into a velocity. The drone remained at a constant height above ground, though it worked well anywhere from 1.5 to 5 meters up. It successfully navigated road lanes and avoided moving pedestrians and bicycles. Intersections did confuse it, though, likely because the open space threw off the collision predictions. But that shouldn’t be a problem when paired with map-localize-plan techniques, since a direction through the intersection would be chosen for it using its location on the map.
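As we read the paper, the two outputs map to flight commands roughly like this: the collision probability throttles the forward speed, and both outputs are low-pass filtered so a single noisy frame can’t yank the drone around. A sketch, with constants of our own choosing rather than the paper’s:

```c
// Hypothetical control mapping in the spirit of DroNet: the network's
// collision probability scales forward speed toward zero, and both
// commands are exponentially smoothed across frames.
typedef struct { float v; float steer; } Command;

Command update_command(Command prev, float steer_pred, float p_collision)
{
    const float V_MAX = 1.0f;   // m/s ceiling (assumed)
    const float ALPHA = 0.7f;   // velocity smoothing factor (assumed)
    const float BETA  = 0.5f;   // steering smoothing factor (assumed)
    Command c;
    // High collision probability -> (1 - p) -> the drone slows to a stop.
    c.v     = (1.0f - ALPHA) * prev.v
            + ALPHA * (1.0f - p_collision) * V_MAX;
    // Blend the new steering prediction into the previous command.
    c.steer = (1.0f - BETA) * prev.steer + BETA * steer_pred;
    return c;
}
```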
As you can see in the video below, not only does it do a decent job of flying down lanes, it also flies well in a parking garage and a hallway, even though it wasn’t trained for either of those environments.
Humans are very good at watching others and imitating what they do. Show someone a video of flipping a switch to turn on a CNC machine and after a single viewing they’ll be able to do it themselves. But can a robot do the same?
Bear in mind that we want the demonstration video to be of a human arm and hand flipping the switch. When the robot does it, the camera that is its eye will be seeing its robot arm and gripper. So somehow it’ll have to know that its robot parts are equivalent to the human parts in the demonstration video. Oh, and the switch in the demonstration video may be a different model and make, and the CNC machine may be a different one, though we’ll at least put the robot within reach of its switch.
Researchers from Google Brain and the University of Southern California have done it. In their paper describing how, they cover a few different experiments, but we’ll focus on just one: getting a robot to imitate pouring a liquid from a container into a cup.
It’s 2018, and while true hoverboards still elude humanity, some future predictions have come true. It’s now possible to talk to computers, and most of the time they might even understand you. Speech recognition is usually achieved through the use of neural networks to process audio, in a way that some suggest mimics the operation of the human brain. However, as it turns out, they can be easily fooled.
The attack begins with an audio sample, generally a simple spoken phrase, though music can also be used. The desired text that the computer should hear instead is then fed into an algorithm along with the audio sample. The algorithm evaluates a loss function that returns a low value when the output of the speech recognition system matches the desired attack phrase, and the input audio is gradually modified using gradient descent to minimize that loss. The result, to a human, sounds like one thing, and to a machine, something else entirely.
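In skeleton form, the optimization loop looks something like this. The loss_gradient() function here is a stand-in for backpropagating through the real recognizer (the published attack computes a CTC loss against Mozilla’s DeepSpeech); everything else is vanilla gradient descent with a clamp to keep the perturbation quiet:

```c
#include <stddef.h>
#include <stdlib.h>

// Stand-in for backpropagation through the speech model: writes
// d(loss)/d(delta) into grad, where the loss is low when the model
// transcribes (audio + delta) as the attacker's target phrase.
// Assumed here, not implemented.
extern void loss_gradient(const float *audio, const float *delta,
                          size_t n, float *grad);

// Iteratively nudge a perturbation so the recognizer hears the target
// phrase while the perturbation stays too small for a human to notice.
void attack(float *audio, size_t n, int iters, float lr, float clip)
{
    float *delta = calloc(n, sizeof *delta); // start from silence
    float *grad  = malloc(n * sizeof *grad);
    for (int it = 0; it < iters; it++) {
        loss_gradient(audio, delta, n, grad);
        for (size_t i = 0; i < n; i++) {
            delta[i] -= lr * grad[i];               // descent step
            if (delta[i] >  clip) delta[i] =  clip; // keep it inaudible
            if (delta[i] < -clip) delta[i] = -clip;
        }
    }
    for (size_t i = 0; i < n; i++)
        audio[i] += delta[i];  // embed the attack in the original audio
    free(grad);
    free(delta);
}
```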
The audio files are available on the site for your own experimental purposes. In a noisy environment with poor audio coupling between speakers and a Google Pixel, results were poor – OK Google only heard the human phrase, not the encoded attack phrase. Given that the sound quality was poor, and the files were generated with a different speech model, this is not entirely surprising. We’d love to hear the results of your experiments in the comments.
It’s all a part of [Nicholas]’s PhD studies around the strengths and pitfalls of neural networks. It highlights the fact that neural networks don’t always work in the way we think they do. Google’s Inception is susceptible to similar attacks with images, as we’ve seen recently.
[Thanks to Wolfgang for the tip!]