Hummingbirds, 3D Printing, and Deep Learning

Setting camera traps in your garden to see what local wildlife is around is quite popular. But [Chris Lam] has just one subject in mind: the hummingbird. He devised a custom setup to capture the footage he wanted using some neat tech.

To attract the hummingbirds, [Chris] used an off-the-shelf feeder — no need to re-invent the wheel there. To obtain the closeup footage required, a 4K action cam was used. This was attached to the feeder with a 3D-printed mount that [Chris] designed.

When it came to detecting the presence of a hummingbird in the video, there were various approaches that could have been considered. On the hardware side, PIR and ultrasonic distance sensors are popular for projects of this kind, but [Chris] wanted a pure software solution. The commonly used motion detection libraries for this type of project might have fallen over here, since the whole feeder was swinging in the air on a string, so [Chris] opted for machine learning.

A ResNet architecture was used to run classification on each frame, determining whether or not it contained a hummingbird. The initial attempt was not hugely successful, but after cropping each frame down to a smaller area around the feeder, classification accuracy increased dramatically. After a bit of FFmpeg magic, the selected snippets were concatenated to make one video containing all the interesting parts; you can see the result in the clip after the break.
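[Chris]’s exact code isn’t reproduced here, but a minimal sketch of the approach might look something like the following, assuming a stock ImageNet-trained ResNet (whose label set happens to include “hummingbird”) and made-up crop coordinates:

```python
# Minimal sketch: flag video frames that contain a hummingbird.
# Assumes a stock ImageNet-trained ResNet; [Chris]'s actual model,
# video filename, and crop coordinates will differ.
import cv2
import torch
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet18(pretrained=True).eval()
HUMMINGBIRD = 94  # "hummingbird" in the standard ImageNet-1k label set

cap = cv2.VideoCapture("feeder.mp4")
frame_idx, hits = 0, []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Crop to the area around the feeder -- this was the accuracy fix.
    crop = frame[200:800, 600:1400]
    rgb = cv2.cvtColor(crop, cv2.COLOR_BGR2RGB)
    with torch.no_grad():
        logits = model(preprocess(rgb).unsqueeze(0))
    if logits.argmax(dim=1).item() == HUMMINGBIRD:
        hits.append(frame_idx)  # remember which frames were interesting
    frame_idx += 1
```

The flagged frame numbers map back to timestamps, and FFmpeg’s concat demuxer (`ffmpeg -f concat -i clips.txt -c copy highlights.mp4`) can then splice the matching snippets into a single highlight reel.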

It seems that machine learning and wildlife cams are a match made in heaven. We’ve already written about a proof-of-concept project which identifies different animals in the footage when motion is detected.

Continue reading “Hummingbirds, 3D Printing, and Deep Learning”

Solar Pi Cluster Scours Internet for Nudes

There seems to be a universal truth on the Internet: if you open up a service to the world, eventually somebody will come in and try to mess it up. If you have a comment section, trolls will come in and fill it with pedantic complaints (so we’ve heard anyway, naturally we have no experience with such matters). If you have a service where people can upload files, then it’s a guarantee that something unsavory is eventually going to take up residence on your server.

Unfortunately, that’s exactly what [Christian Haschek] found while developing his open source image hosting platform, PictShare. He was alerted to some unsavory pictures on PictShare, and after he dealt with them he realized these could be the proverbial tip of the iceberg. But there were far too many pictures on the system to check manually. He decided to build a system that could search for NSFW images using a trained neural network.

The nude-sniffing cluster is made up of a trio of Raspberry Pi computers, each with its own Movidius neural compute stick to perform the heavy lifting. [Christian] explains how he installed the compute stick SDK and Yahoo’s open source learning module for identifying questionable images, the aptly named open_nsfw. The cluster can be scaled up by adding more Pis, and since it’s all ARM processors and compute sticks, it’s energy-efficient enough that the whole system can run off a 10 watt solar panel.
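[Christian]’s exact scripts aren’t shown here, but scoring an image with open_nsfw on a compute stick boils down to something like this sketch using the NCSDK v1 Python API; it assumes the Caffe model has already been converted with the SDK’s mvNCCompile tool, and the filenames are illustrative:

```python
# Hedged sketch: score one image with Yahoo's open_nsfw on a Movidius
# Neural Compute Stick via the NCSDK v1 Python API.
import numpy
from PIL import Image
from mvnc import mvncapi as mvnc

def load_image(path):
    # open_nsfw is a Caffe model: 224x224 BGR input, mean-subtracted.
    img = Image.open(path).convert('RGB').resize((224, 224))
    arr = numpy.array(img, dtype=numpy.float32)[:, :, ::-1]  # RGB -> BGR
    return arr - numpy.array([104.0, 117.0, 123.0], dtype=numpy.float32)

devices = mvnc.EnumerateDevices()
device = mvnc.Device(devices[0])
device.OpenDevice()

# 'open_nsfw.graph' is the assumed output name from mvNCCompile.
with open('open_nsfw.graph', 'rb') as f:
    graph = device.AllocateGraph(f.read())

# The stick works in half precision, hence the float16 cast.
graph.LoadTensor(load_image('upload.jpg').astype(numpy.float16), None)
scores, _ = graph.GetResult()          # [sfw, nsfw] softmax pair
print('NSFW score: %.3f' % scores[1])

graph.DeallocateGraph()
device.CloseDevice()
```

Fanning uploads out across the three Pis is then just a matter of queueing filenames, which is what makes the cluster so easy to scale.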

After opening up the system with a public web interface where users can scan their own images, he offered his system’s services to a large image hosting provider to see what it would find. Shockingly, the system was able to find over 3,000 images that contained suspected child pornography. The appropriate authorities were notified, and [Christian] encourages anyone else looking to search their servers for this kind of content to drop him a line. Truly hacking for good.

This isn’t the first time we’ve seen Intel’s Movidius compute stick in the wild, and of course we’ve seen our fair share of Raspberry Pi clusters, from 750-node monsters down to builds which are far more show than go.

Disney’s New Robot Limbs Trained Using Neural Networks

Disney is working on modular, intelligent robot limbs that snap into place with magnets. The intelligence comes from a reasonably sized neural network that also incorporates some modularity. The robot is their Snapbot, whose base unit can fit up to eight limbs, and so far they’ve trained it with up to three attached.

The modularity further extends to a choice of three types of limb: one with roll and pitch, another with yaw and pitch, and a third with roll, yaw, and pitch. Interestingly, of the three types, the yaw-pitch one seems most effective.

[Image: the learning environment for Disney’s modular robot legs]

In this age of massive, deep neural networks requiring GPUs or even online services for training in a reasonable amount of time, it’s refreshing to see that this one is only two layers deep and can be trained in three hours on a single core of a 3.4 GHz Intel i7 processor. Three hours may still seem long, but remember, this isn’t a simulation in a silicon virtual world. This is real life, where the servo motors have to actually move. Of course, the researchers didn’t want to sit around resetting the robot after each attempt to cross the table, so they built in an automatic mechanism that pulls it back to the starting position before each new run. To further speed up training, they found that once they’d trained for one limb, they could copy the last of the network’s layers to get a head start on the training for two limbs.
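We haven’t seen Disney’s actual code, but the layer-copying trick is easy to illustrate with a quick PyTorch sketch; the layer sizes, activation, and input/output counts here are placeholders, not the paper’s real values:

```python
# Sketch of the layer-copying bootstrap: train a small two-layer policy
# for one limb, then seed the two-limb network's final layer with it.
import torch
import torch.nn as nn

def make_policy(n_inputs, n_outputs, hidden=32):
    # Two layers, matching the "only two layers deep" description.
    return nn.Sequential(
        nn.Linear(n_inputs, hidden),
        nn.Tanh(),
        nn.Linear(hidden, n_outputs),
    )

one_limb = make_policy(n_inputs=4, n_outputs=2)   # assume already trained
two_limb = make_policy(n_inputs=8, n_outputs=4)   # fresh, to be trained

# Copy the trained final layer into the larger network where the shapes
# overlap, giving the two-limb policy a head start on training.
with torch.no_grad():
    src, dst = one_limb[2], two_limb[2]
    dst.weight[:src.out_features, :src.in_features].copy_(src.weight)
    dst.bias[:src.out_features].copy_(src.bias)
```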

Why bother with training at all? After all, we’ve seen pretty awesome multi-limbed robots running manually coded gaits, an example being this hexapod tank based on the one from the movie Ghost in the Shell. The Disney researchers tried that too, then compared the results of the manual approach against those of the trained one, and the trained gait moved further in the same amount of time. At a minimum, we can learn a trick or two from this modular crawler.

Check out their article for the details and watch it in action in its learning environment below.

Continue reading “Disney’s New Robot Limbs Trained Using Neural Networks”

A Cartoon-ifying Camera for Instant Absurdism

We take photographs as a way to freeze moments in time and to capture the details that get blurred by our unreliable memories. There is little room for interpretation, and this is kind of the whole point.

[Dan Macnish]’s latest project, Draw This, turns reality into absurdity. It’s a Raspberry Pi-based instant camera that trades whatever passed in front of the lens for a cartoon version of same. Draw This uses neural networks to ID the objects in the frame, and then draws upon thousands of images from Google’s Quick, Draw! dataset to provide a loose interpretation via thermal printer. Seems to us like the perfect camera to take to DEFCON (or any other part of Las Vegas).
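[Dan]’s repo has the real pipeline, but the core lookup-and-render step can be sketched in a few lines. The per-category filename convention follows the publicly documented simplified Quick, Draw! dataset; the thermal printing step is left out here:

```python
# Rough sketch of the Draw This idea: given an object label from the
# camera's classifier, pull a random doodle of that category from the
# simplified Quick, Draw! dataset (one ndjson file per category) and
# render it as a 1-bit bitmap suitable for a thermal printer.
import json
import random
from PIL import Image, ImageDraw

def render_doodle(category, size=384):
    # Each ndjson line is one drawing; "drawing" is a list of strokes,
    # each stroke a pair of x and y coordinate lists in the 0-255 range.
    with open('%s.ndjson' % category) as f:
        drawing = json.loads(random.choice(f.readlines()))['drawing']
    img = Image.new('1', (size, size), 1)  # white 1-bit canvas
    pen = ImageDraw.Draw(img)
    scale = size / 256.0
    for xs, ys in drawing:
        points = [(x * scale, y * scale) for x, y in zip(xs, ys)]
        pen.line(points, fill=0, width=3)  # draw the stroke in black
    return img

render_doodle('bicycle').save('print_me.png')
```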

If you have a Raspberry Pi 3, a v2 camera, and a thermal printer, you can make your own crowd-sourced, cartoonified memories using the code in [Dan]’s repo. Still into recording reality? You can use Pi cameras to see in the dark or even explore a body of water.

Facebook’s Universal Music Translator

Star Trek has its universal language translator, and now researchers from Facebook Artificial Intelligence Research (FAIR) have developed a universal music translator. Much of it is based on Google’s WaveNet, a version of which was also used in the recently announced Google Duplex AI.

[Image: the universal music translator’s architecture]

The inspiration for it came from the human ability to hear music played by any instrument and then whistle or hum it back, translating it from one instrument to another. This is something computers have had trouble doing well, until now. The researchers fed their translator a string quartet playing Haydn and had it translate the music to a chorus and orchestra singing and playing in the style of Bach. They’ve even fed it someone whistling the theme from Indiana Jones and had it translate the tune to a symphony in the style of Mozart.

Shown here is the architecture of their network. Note that all of the different music is fed into the same encoder network, but each instrument the music can be translated into has its own decoder network. It was implemented in PyTorch and trained using eight Tesla V100 GPUs over a total of six days. Efforts were made during training to ensure that the encoder extracted high-level semantic features from the music fed into it rather than just memorizing it. More details can be found in their paper.
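In PyTorch terms, the shared-encoder, per-instrument-decoder arrangement boils down to something like the sketch below. The real system uses WaveNet-style autoregressive decoders on raw audio; the single convolution layers here are stand-ins just to show the data flow:

```python
# Bare-bones sketch of the shared-encoder / per-decoder arrangement from
# FAIR's architecture diagram. Module internals are placeholders.
import torch
import torch.nn as nn

class Translator(nn.Module):
    def __init__(self, n_instruments, latent=64):
        super().__init__()
        # One encoder shared by all input domains...
        self.encoder = nn.Sequential(
            nn.Conv1d(1, latent, kernel_size=9, padding=4),
            nn.ReLU(),
        )
        # ...and a separate decoder for each target instrument.
        self.decoders = nn.ModuleList(
            nn.Conv1d(latent, 1, kernel_size=9, padding=4)
            for _ in range(n_instruments)
        )

    def forward(self, audio, target_instrument):
        z = self.encoder(audio)               # domain-agnostic features
        return self.decoders[target_instrument](z)

model = Translator(n_instruments=6)
out = model(torch.randn(1, 1, 16000), target_instrument=2)
```

Training all domains through the one encoder is what forces it to learn a representation of the music itself rather than of any particular instrument.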

So if you want to hear how an electric guitar played in the style of Metallica might have been translated to the piano by Beethoven, listen to the samples in the video below.

Continue reading “Facebook’s Universal Music Translator”

360 Live VR Teleportation Uses Drones, Neural Networks, and Perseverance

This past semester I added research to my already full schedule of math and engineering classes, as any masochistic student eagerly would. Packed schedule aside, how do you pass up the chance to work on implementing 360° virtual teleportation to anywhere in the world, in real time? Yes, it is indeed the same concept as the cult-worshipped Star Trek transporter, minus the ability to physically be at the location. Perhaps we can add a “beam me up, Scotty” command when shutting down.

The research lab I was working with is the Laboratory for Immersive CommunicatiON (LION). It’s funded by NSF, Microsoft, and Adobe, and has been in pursuit of VR teleportation for some time now. There are a lot of cool technologies at work here, like the drones used as location-collection devices. A network of drones surveys a landscape anywhere in the world and builds the collection of assets needed for recreating it in VR. Okay, so a swarm of drones might seem a little intimidating at first, but when has emerging technology not?

Continue reading “360 Live VR Teleportation Uses Drones, Neural Networks, and Perseverance”

Modern Wizard Summons Familiar Spirit

In European medieval folklore, a practitioner of magic may call for assistance from a familiar spirit who takes on an animal form as a disguise. [Alex Glow] is our modern-day Merlin who invoked the magical incantations of 3D printing, Arduino, and Raspberry Pi to summon her familiar Archimedes: The AI Robot Owl.

The key attraction in this build is Google’s AIY Vision kit, specifically the vision processing unit that tremendously accelerates image classification tasks running on an attached Raspberry Pi Zero W. Instead of taking several seconds to analyze each image, classification can now run several times per second, all performed locally with no connection to Google’s cloud required. (See our earlier coverage for more technical details.) The default demo application of a Google AIY Vision kit is a “joy detector” that looks for faces and attempts to determine if a face is happy or sad. We’ve previously seen this functionality mounted on a robot dog.
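The heart of that demo, as we understand the AIY Python API, boils down to a loop like the one below; the 0.8 threshold and the sticker reaction are our own embellishments, and Google’s aiyprojects repo has the canonical version:

```python
# Hedged sketch of the AIY Vision kit's joy detector loop: the on-board
# VPU runs face detection on the camera feed and reports a per-face joy
# score several times a second, with no cloud round-trip.
from picamera import PiCamera
from aiy.vision.inference import CameraInference
from aiy.vision.models import face_detection

# The camera must be open for the Vision Bonnet to tap its frames.
with PiCamera(sensor_mode=4, framerate=30) as camera, \
        CameraInference(face_detection.model()) as inference:
    for result in inference.run():
        for face in face_detection.get_faces(result):
            if face.joy_score > 0.8:  # happy enough? reward with a sticker
                print('Hoot! Joy score: %.2f' % face.joy_score)
```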

[Alex] aimed to go beyond the default app (and default box) to create Archimedes, who was to reward happy people with a sticker. As a moving robotic owl, Archimedes had far more crowd appeal than the vision kit’s default cardboard box. All the kit components have been integrated into Archimedes’ head. One eye is the expected Pi camera; the other is actually the kit’s piezo buzzer. The vision kit’s LED-illuminated button now tops the dapper owl’s hat.

Archimedes was created to join in Google’s promotional efforts. Their presence at this Maker Faire consisted of two tents: an introductory “Learn to Solder” tent where people could create a blinky LED badge, and a second tent focused on their line of AIY kits like this vision kit, filled with demos of what the kits can do aside from really cool robot owls.

Hopefully these promotional efforts helped many AIY kits find new homes in the hands of creative makers. It’s pretty exciting that such a powerful and inexpensive neural net processor is now widely available, and we look forward to many more AI-powered hacks to come.

Continue reading “Modern Wizard Summons Familiar Spirit”