This Raspberry Pi 2 with computer vision and two solenoid “fingers” was getting absurdly high scores on a mobile game as of late 2015, but only recently has [Kristian] finished fleshing the project out with detailed documentation.
Developed for a course in image analysis and computer vision, this project wasn’t really about cheating at a mobile game. It wasn’t even about a robotic interface to a smartphone screen; it was a platform for developing and demonstrating the image analysis theory he was learning, and the computer vision portion is no hack job. OpenCV was used as a foundation for accessing the camera, but none of the built-in filters are used. All of the image analysis is implemented from scratch.
The game is simple. Humans and zombies move downward in two columns. Zombies (green) should get a screen tap; humans should not. The Raspberry Pi camera takes pictures of the smartphone’s screen, to which an HSV filter is applied to filter out everything except green objects (zombies). That alone would be enough to get you some basic results, but not nearly good enough to be truly reliable and repeatable. Therefore, after picking out the green objects comes a whole chain of additional filtering. The details of that are covered on [Kristian]’s blog post, but the final report for the project (PDF) is where the real detail is.
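The first step described above, keeping only green pixels in HSV space, can be sketched in a few lines. This is a minimal illustration using only Python's standard library, not [Kristian]'s implementation: the function name `green_mask` and every threshold value are assumptions chosen for the example.

```python
import colorsys

def green_mask(image, hue_range=(0.22, 0.45), min_sat=0.4, min_val=0.3):
    """Return a binary mask (1 = green-ish pixel) for an RGB image.

    `image` is a list of rows, each row a list of (r, g, b) tuples in 0..255.
    The thresholds are illustrative guesses, not values from the project.
    """
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            # colorsys works on floats in 0..1 and returns hue in 0..1
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            is_green = (hue_range[0] <= h <= hue_range[1]
                        and s >= min_sat and v >= min_val)
            mask_row.append(1 if is_green else 0)
        mask.append(mask_row)
    return mask

# Tiny 2x2 test frame: green, red / near-white, dark green
frame = [
    [(30, 200, 40), (200, 30, 30)],
    [(240, 240, 240), (20, 120, 20)],
]
mask = green_mask(frame)
print(mask)
```

Filtering on hue rather than on raw RGB values is what makes this kind of mask tolerant of brightness changes, which is presumably why a camera pointed at a glowing screen benefits from it.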
If you’re interested mainly in seeing a machine pound out flawless victories, the video below shows everything running smoothly. The pounding sounds make it seem like the screen is taking a lot of abuse, but [Kristian] mentions that’s actually noise from the solenoids and not a product of them battling the touchscreen. This setup can be easily adapted to test out apps on different models of phones — something that has historically cost quite a bit of dough.
If you’re interested in the nitty-gritty details of the reasons and methods used for the computer vision portions, be sure to go through [Kristian]’s GitHub repository, where everything about the project lives (including the aforementioned final report).
Continue reading “Abusing a Cellphone Screen with Solenoids Posts High Score”
Lightning photography is a fine art. It requires a lot of patience, and until recently required some fancy gear. [Saulius Lukse] has always been fascinated by lightning storms. When he was a kid he used to shoot lightning with his dad’s old Zenit camera, which was rather challenging. Now he’s figured out a way to do it using a GoPro.
He films at 1080p and 60 frames per second, which, we admit, isn’t the greatest resolution, but we’re sure the next GoPro will be filming 4K60. This means you can just set up your GoPro outside during the storm and let it do what it does best: film video. Normally, you’d then have to edit the footage and extract each lightning frame. That could be a lot of work.
[Saulius] wrote a Python script using OpenCV instead. Basically, the OpenCV script spots the lightning and saves motion data to a CSV file by detecting fast changes in the image.
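The idea of spotting lightning as a fast change between frames and logging it to CSV can be sketched as follows. This is not [Saulius]’s script: it uses plain nested lists as stand-ins for grayscale frames (in practice they would come from something like OpenCV's `cv2.VideoCapture`), and the function names and the `jump_threshold` value are assumptions made for the example.

```python
import csv
import io

def mean_brightness(frame):
    """Average pixel value of a grayscale frame given as a list of rows."""
    total = sum(sum(row) for row in frame)
    count = sum(len(row) for row in frame)
    return total / count

def detect_flashes(frames, jump_threshold=20.0):
    """Return (frame_index, brightness_jump) for sudden brightening.

    A frame counts as a flash if its mean brightness exceeds the previous
    frame's by more than `jump_threshold` (an assumed, tunable value).
    """
    hits = []
    previous = None
    for index, frame in enumerate(frames):
        current = mean_brightness(frame)
        if previous is not None and current - previous > jump_threshold:
            hits.append((index, current - previous))
        previous = current
    return hits

def write_csv(hits, stream):
    """Write the detected flashes as CSV rows with a header."""
    writer = csv.writer(stream)
    writer.writerow(["frame", "brightness_jump"])
    writer.writerows(hits)

# Synthetic footage: two dark frames, one flash, then dark again
dark = [[10, 12], [11, 9]]
flash = [[200, 180], [220, 210]]
frames = [dark, dark, flash, dark]

hits = detect_flashes(frames)
out = io.StringIO()
write_csv(hits, out)
print(out.getvalue())
```

Comparing each frame only with its predecessor keeps memory use constant, so the same loop would work on arbitrarily long footage.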
The result? All the lightning frames plucked out from the footage — and it only took an i7 processor about 8 minutes to analyze 15 minutes of HD footage. Not bad.
Now if you feel like this is still cheating, you could build a fancy automatic trigger for your DSLR instead…
A group of MIT, Microsoft, and Adobe researchers have managed to reproduce sound using video alone. The sounds we make bounce off every object in the room, causing microscopic vibrations. The Visual Microphone utilizes a high-speed video camera and some clever signal processing to extract an audio signal from these vibrations. Using video of everyday objects such as snack bags, plants, Styrofoam cups, and water, the team was able to reproduce tones, music and speech. Capturing audio from light isn’t exactly new. Laser microphones have been around for years. The difference here is the fact that the visual microphone is a completely passive device. No laser or special illumination is required.
The secret is in the signal processing, which the team explains in their SIGGRAPH paper (pdf link). They used a complex steerable pyramid along with wavelet filters to obtain local pixel motion values. These local values are averaged into a global motion value. From this global motion value the team is able to measure movement down to 1/1000 of a pixel. Plenty of resolution to decode audio data.
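Why would averaging local values give such fine resolution? The following toy simulation is not the paper's pipeline (there is no steerable pyramid here); it only illustrates the statistical point that averaging many noisy local measurements yields a far more precise global value. The noise level and sample count are assumptions picked for the example.

```python
import random

def global_motion(local_motions):
    """Collapse many per-location motion estimates into one value by averaging."""
    return sum(local_motions) / len(local_motions)

random.seed(0)        # fixed seed so the run is repeatable
true_shift = 0.001    # pixels; the scale mentioned in the text
noise_sigma = 0.05    # assumed per-location measurement noise, in pixels
samples = 100_000     # assumed number of local measurements in one frame

# Each local estimate is the true shift buried in much larger noise
local = [true_shift + random.gauss(0.0, noise_sigma) for _ in range(samples)]
estimate = global_motion(local)

print(f"per-location noise: {noise_sigma} px")
print(f"averaged estimate:  {estimate:.5f} px (true value {true_shift} px)")
```

With independent noise the error of the average shrinks roughly with the square root of the number of measurements, so a frame with many textured pixels can resolve motion far smaller than any single pixel could.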
Most of the research is performed with high-speed video cameras, which are well outside the budget of the average hacker. Don’t despair though: the team did prove that the same magic can be performed with consumer cameras, albeit with lower-quality results. The team took advantage of the rolling shutter found in most of today’s CMOS-based consumer cameras. Rolling-shutter CMOS sensors capture images one row at a time, so each row can be processed in a similar fashion to the frames of the high-speed camera. There are some inter-frame gaps when the camera isn’t recording anything, though. Even with the reduced resolution, it’s easy to pick out “Mary had a little lamb” in the video below.
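The rolling-shutter trick is easier to picture with timestamps. The sketch below computes when each row of each frame is captured and how long the sensor is idle between frames. The numbers (10 frames per second, 4 rows, 20 ms per row) are deliberately tiny and hypothetical, and the function names are made up for the example.

```python
def row_sample_times(num_frames, rows_per_frame, frame_rate, line_time):
    """Timestamps (in seconds) at which each row of each frame is captured.

    `line_time` is the assumed delay between consecutive rows.
    """
    frame_period = 1.0 / frame_rate
    times = []
    for f in range(num_frames):
        for r in range(rows_per_frame):
            times.append(f * frame_period + r * line_time)
    return times

def idle_time_per_frame(rows_per_frame, frame_rate, line_time):
    """Time in each frame period during which no row is being read out."""
    return 1.0 / frame_rate - rows_per_frame * line_time

times = row_sample_times(num_frames=2, rows_per_frame=4,
                         frame_rate=10.0, line_time=0.02)
gap = idle_time_per_frame(rows_per_frame=4, frame_rate=10.0, line_time=0.02)

print([round(t, 2) for t in times])
print(f"idle time per frame: {gap:.2f} s")
```

Within a frame the rows act like samples taken at `1 / line_time` per second, far faster than the frame rate, while the idle time at the end of each frame period is where the gaps in the recovered signal come from.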
We’re blown away by this research, and we’re sure certain organizations will be looking into it for their own use. Don’t pull out your tin foil hats yet though. Foil containers proved to be one of the best sound reflectors.
Continue reading “Focus Your Ears with The Visual Microphone”
As 3D printing continues to grow, people are developing more and more ways to get 3D models. From hardware-based scanners like the Microsoft Kinect to software-based tools like 123D Catch, there are a lot of ways to create a 3D model from a series of images. But what if you could make a 3D model out of a single image? Sound crazy? Maybe not. A team of researchers has created 3-Sweep, an interactive technique for turning objects in 2D images into 3D models that can be manipulated.
To be clear, the recognition of 3D components within a single image is a bit out of reach for computer algorithms alone. But by combining the cognitive abilities of a person with the computational accuracy of a computer they have been able to create a very simple tool for extracting 3D models. This is done by outlining the shape similar to how one might model in a CAD package — once the outline is complete, the algorithm takes over and creates a model.
The software debuted at SIGGRAPH Asia 2013 and has caused quite a stir on the internet. Watch the fascinating video that demonstrates the software process after the break!
Continue reading “3-Sweep: Turning 2D images into 3D models”
[Jeremy Blum], [Jason Wright], and [Sam Sinensky] combined forces for twenty-four hours to automate how the entertainment and lighting works at their hackerspace. They commandeered the whiteboard and used an already present webcam as part of their project. You can see the black tokens which can be moved around the blue tape outline to actuate the controls.
MATLAB is fed an image from the webcam which monitors the space. Frames are received once every second and parsed for changes in the tokens. There are small black squares which either skip to the next track of music or toggle pause/play. Simply move them off of their designated spots and the image processing does the rest. This goes for the volume slider as well. We think the huge token for the lights is to ensure that the camera can sense a change in a darkened room.
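The token check can be illustrated with a simple region test: look at the pixels inside a token's designated spot and decide whether they are dark enough to be a black token. The team used MATLAB; this Python sketch is only an illustration of the idea, and the region layout, the `dark_threshold` value, and the function names are all assumptions.

```python
def region_mean(image, top, left, height, width):
    """Mean gray level of a rectangular region of a grayscale image."""
    total = 0
    for row in image[top:top + height]:
        total += sum(row[left:left + width])
    return total / (height * width)

def token_present(image, region, dark_threshold=80):
    """True if the region is dark enough to contain a black token.

    `region` is (top, left, height, width); the threshold is an assumed value.
    """
    return region_mean(image, *region) < dark_threshold

WHITE, BLACK = 230, 20
spot = (0, 0, 2, 2)  # hypothetical designated spot for the "next track" token

# Frame 1: token sits on its spot. Frame 2: token has been moved away.
with_token = [
    [BLACK, BLACK, WHITE, WHITE],
    [BLACK, BLACK, WHITE, WHITE],
    [WHITE, WHITE, WHITE, WHITE],
]
without_token = [[WHITE] * 4 for _ in range(3)]

was_present = token_present(with_token, spot)
is_present = token_present(without_token, spot)
if was_present and not is_present:
    print("token removed: skip to next track")
```

Checking only for a present-to-absent transition means an action fires once when the token is moved, rather than on every frame while it is away.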
If image processing isn’t your thing you can still control your audio entertainment with a frickin’ laser.
Continue reading “24-hour hackathon project adds object-based automation to hackerspace”
The 1980s were a heyday for strange computer architectures; instead of the von Neumann architecture you’d find in one of today’s desktop computers or the Harvard architecture of a microcontroller, a lot of companies experimented with strange parallel designs. While not used much today, these were some of the most powerful computers of their day and were used as the main research tools of the AI renaissance of the 1980s.
Over at the Norwegian University of Science and Technology a huge group of students (13 members!) designed a modern take on the massively parallel computer. It’s called 256 Shades of Gray, and it processes 320×240 pixel 8-bit grayscale graphics like no microcontroller could.
The idea for the project was to create an array-based parallel image processor with an architecture similar to the Goodyear MPP formerly used by NASA or the Connection Machine found in the control room of Jurassic Park. Unlike these earlier computers, the team implemented their array processor in an FPGA, giving rise to their Lena processor. This processor is in turn controlled by a 32-bit AVR microcontroller with a custom-built VGA output.
The entire machine can process 10 frames per second of 320×240 resolution grayscale video. There’s a presentation video available (in Norwegian), but the highlight might be their demo of The Game of Life rendered in real-time on their computer. An awesome build, and a very cool experience for all the members of the class.
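For a sense of scale, 320×240 is 76,800 pixels, so 10 frames per second means 768,000 pixel updates every second. The Game of Life is a natural demo for an array processor because every cell applies the same rule to its eight neighbors. The sequential Python sketch below shows that per-cell rule; it says nothing about how the team's FPGA implements it, and the wrap-around edges are an assumption made to keep the example short.

```python
def life_step(grid):
    """Compute one Game of Life generation on a grid of 0/1 cells.

    Edges wrap around (toroidal grid), which is an assumption of this sketch.
    """
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count the eight neighbors; this is the work a parallel
            # array would do for every cell at the same time.
            neighbors = sum(
                grid[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            alive = grid[r][c] == 1
            new[r][c] = 1 if neighbors == 3 or (alive and neighbors == 2) else 0
    return new

# A horizontal "blinker" flips to vertical after one step
blinker = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
next_gen = life_step(blinker)
for row in next_gen:
    print(row)

pixels_per_second = 320 * 240 * 10
print(pixels_per_second)
```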
The Shard is the tallest building in Western Europe, and has a great view of London. The condos in the building are very expensive, and a tourist ride to the top of the building costs £24.95.
Since the value of the view is so high, [Willem] wanted to quantify the quality of the view at any given time. His solution is the Shard Rain Cam. This device combines a Logitech webcam with a Raspberry Pi to capture a time-lapse set of images. These images are fed to a Python script using OpenCV which quantifies the cloudiness.
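The write-up does not say how the script measures cloudiness, so the following is purely a guess at one simple proxy: fog and low cloud wash out detail, so the spread of gray levels in the image drops. The sketch uses plain lists instead of OpenCV images, and the function names and the `min_contrast` threshold are assumptions.

```python
import statistics

def contrast_score(gray_image):
    """Population standard deviation of all gray levels in the image."""
    pixels = [p for row in gray_image for p in row]
    return statistics.pstdev(pixels)

def looks_cloudy(gray_image, min_contrast=25.0):
    """Flag an image as cloudy when its contrast falls below an assumed threshold."""
    return contrast_score(gray_image) < min_contrast

# Hypothetical samples: a clear view has dark and bright detail,
# a fogged-in view is nearly uniform gray.
clear_view = [[20, 200], [90, 250]]
foggy_view = [[180, 185], [190, 182]]

print(f"clear: {contrast_score(clear_view):.1f}, cloudy={looks_cloudy(clear_view)}")
print(f"foggy: {contrast_score(foggy_view):.1f}, cloudy={looks_cloudy(foggy_view)}")
```

A real deployment would need the threshold calibrated against labeled time-lapse images, and would have to handle night-time frames separately, since darkness lowers contrast too.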
[Willem] also had to build a weatherproof enclosure with a transparent window for the camera and RPi. ‘Clingfilm’, which is British for saran wrap, and mineral oil are used to improve the waterproofing of an IP54 rated enclosure.
The resulting data is displayed on www.whatcaniseefromtheshard.com, which provides an indication of whether or not the view is worth £24.95. All of the Python code is available, and is a good starting point for learning about image processing with OpenCV.