As the cost of high-resolution images sensors gets lower, and the availability of small and cheap single board computers skyrockets, we are starting to see more astrophotography projects than ever before. When you can put a $5 Raspberry Pi Zero and a decent webcam outside in a box to take autonomous pictures of the sky all night, why not give it a shot? But in doing so, many hackers are recognizing a fact well-known to traditional telescope jockeys: seeing a few stars is easy, seeing a lot of stars is another story entirely.
The problem is that stars are fairly dim; a problem compounded by the light pollution you get unless you’re out in a rural area. You can’t just brighten up the images either, as that only increases the noise in the image. A programmer always in search of a challenge, [Benedikt Bitterli] decided to take a shot at using software to improve astrophotography images. He documented the entire process, failures and all, on his blog for anyone else who might be curious about what it really takes to create the incredible images of the night sky we see in textbooks.
In principle it’s simple: just take a lot of pictures of the sky, stack them on top of each other, and identify which points of light are stars and which ones are noise artifacts. But of course the execution is considerably more difficult. For one thing, unless the camera was on a mount that was automatically tracking the sky, the stars will have slightly moved in each image. To help with this process, [Benedikt] used a navigational trick that humanity has relied on for millennia: mapping constellations. By comparing groupings of stars in each image, his software is able to accurately overlay each image.
But that’s only one part of the equation. In his post, [Benedikt] goes over the incredible amount of math that goes into identifying individual stars in the sea of noise you get when a digital image sensor looks into the black. You certainly don’t need to understand all the math to appreciate the final results, but it’s a fascinating read for those with an interest in computer vision concepts.
People with dementia have trouble with some of the things we take for granted, including dressing themselves. It can be a remarkably difficult task involving skills like balance, pattern recognition inside of other patterns, ordering, gross motor skill, and dexterity to name a few. Just because something is common, doesn’t mean it is easy. The good folks at NYU Rory Meyers College of Nursing, Arizona State University, and MGH Institute of Health Professions talked with a caregiver focus group to find a way for patients to regain their privacy and replace frustration with independence.
Although this is in the context of medical assistance, this represents one of the ways we can offload cognition or judgment to computers. The system works by detecting movement when someone approaches the dresser with five drawers. Vocal directions and green lights on the top drawer light up when it is time to open the drawer and don the clothing inside. Once the system detects the article is being worn appropriately, the next drawer’s light comes one. A camera seeks a matrix code on each piece of clothing, and if it times out, a caregiver is notified. There is no need for an internet connection, nor should one be given.
Currently, the system has a good track record with identifying the clothing, but it is not proficient at detecting when it is worn correctly, which could lead to frustrating false alarms. Matrix codes seemed like a logical choice since they could adhere to any article of clothing and get washed repeatedly but there has to be a more reliable way. Perhaps IR reflective threads could be sewn into clothing with varying stitch lengths, so the inside and outside patterns are inverted to detect when clothing is inside-out. Perhaps a combination of IR reflective and absorbing material could make large codes without being visible to the human eye. How would you make a machine-washable, machine-readable visual code?
There are quick hacks, there are weekend projects and then there are years long journeys towards completion. [Boris Vitazek]’s grafofon falls into the latter category. His creation can best be described as electromechanical sequencer synthesizer with a multiplayer mode.
The storage medium and interface for this sequencer is a thirteen-meter loop of paper that is mounted like a conveyor belt. Music is composed by drawing on the paper or placing objects on it. This is usually done by the audience and the fact that the marker isn’t erased make the result collaborative and incremental.
These ‘scores’ are read by a camera and interpreted by software.This is a very vague description of this device, for a reason: the build went on over six years and both hard- and software went through several revisions in that time. It started as a trigger for MIDI notes and evolved from there.
In his write up [Boris] explains the technical aspects of each iteration. He also tells the stories of the people he met while working on the grafofon and how they influenced the build. If this look into the art world reminds you of your local hackerspace, it is because these worlds aren’t that far apart.
Hallucination is the erroneous perception of something that’s actually absent – or in other words: A possible interpretation of training data. Researchers from the MIT and the UMBC have developed and trained a generative-machine learning model that learns to generate tiny videos at random. The hallucination-like, 64×64 pixels small clips are somewhat plausible, but also a bit spooky.
The machine-learning model behind these artificial clips is capable of learning from unlabeled “in-the-wild” training videos and relies mostly on the temporal coherence of subsequent frames as well as the presence of a static background. It learns to disentangle foreground objects from the background and extracts the overall dynamics from the scenes. The trained model can then be used to generate new clips at random (as shown above), or from a static input image (as shown in pairs below).
Currently, the team limits the clips to a resolution of 64×64 pixels and 32 frames in duration in order to decrease the amount of required training data, which is still at 7 TB. Despite obvious deficiencies in terms of photorealism, the little clips have been judged “more realistic” than real clips by about 20 percent of the participants in a psychophysical study the team conducted. The code for the project (Torch7/LuaJIT) can already be found on GitHub, together with a pre-trained model. The project will also be shown in December at the 2016 NIPS conference.
Developed for a course in image analysis and computer vision, this project wasn’t really about cheating at a mobile game. It wasn’t even about a robotic interface to a smartphone screen; it was a platform for developing and demonstrating the image analysis theory he was learning, and the computer vision portion is no hack job. OpenCV was used as a foundation for accessing the camera, but none of the built-in filters are used. All of the image analysis is implemented from scratch.
The game is a simple. Humans and zombies move downward in two columns. Zombies (green) should get a screen tap but not humans. The Raspberry Pi camera takes pictures of the smartphone’s screen, to which a HSV filter is applied to filter out everything except green objects (zombies). That alone would be enough to get you some basic results, but not nearly good enough to be truly reliable and repeatable. Therefore, after picking out the green objects comes a whole chain of additional filtering. The details of that are covered on [Kristian]’s blog post, but the final report for the project (PDF) is where the real detail is.
If you’re interested mainly in seeing a machine pound out flawless victories, the video below shows everything running smoothly. The pounding sounds make it seem like the screen is taking a lot of abuse, but [Kristian] mentions that’s actually noise from the solenoids and not a product of them battling the touchscreen. This setup can be easily adapted to test out apps on different models of phones — something that has historically cost quite a bit of dough.
If you’re interested in the nitty-gritty details of the reasons and methods used for the computer vision portions, be sure to go through [Kristian]’s github repository where everything about the project lives (including the aforementioned final report.)
Lightning photography is a fine art. It requires a lot of patience, and until recently required some fancy gear. [Saulius Lukse] has always been fascinated by lightning storms. When he was a kid he used to shoot lightning with his dad’s old Zenit camera — It was rather challenging. Now he’s figured out a way to do it using a GoPro.
He films at 1080@60, which we admit, isn’t the greatest resolution, but we’re sure the next GoPro will be filming 4K60 next. This means you can just set up your GoPro outside during the storm, and let it do it what it does best — film video. Normally, you’d then have to edit the footage and extract each lightning frame. That could be a lot of work.
[Saulius] wrote a Python script using OpenCV instead. Basically, the OpenCV script spots the lightning and saves motion data to a CSV file by detecting fast changes in the image.
The result? All the lightning frames plucked out from the footage — and it only took an i7 processor about 8 minutes to analyze 15 minutes of HD footage. Not bad.
A Group of MIT, Microsoft, and Adobe researchers have managed to reproduce sound using video alone. The sounds we make bounce off every object in the room, causing microscopic vibrations. The Visual Microphoneutilizes a high-speed video camera and some clever signal processing to extract an audio signal from these vibrations. Using video of everyday objects such as snack bags, plants, Styrofoam cups, and water, the team was able to reproduce tones, music and speech. Capturing audio from light isn’t exactly new. Laser microphones have been around for years. The difference here is the fact that the visual microphone is a completely passive device. No laser or special illumination is required.
The secret is in the signal processing, which the team explains in their SIGGRAPH paper (pdf link). They used a complex steerable pyramid along with wavelet filters to obtain local pixel motion values. These local values are averaged into a global motion value. From this global motion value the team is able to measure movement down to 1/1000 of a pixel. Plenty of resolution to decode audio data.
Most of the research is performed with high-speed video cameras, which are well outside the budget of the average hacker. Don’t despair though, the team did prove out that the same magic can be performed with consumer cameras, albeit with lower quality results. The team took advantage of the rolling shutter found in most of today’s CMOS imager based consumer cameras. Rolling shutter CMOS sensors capture images one row at a time. Each row can be processed in a similar fashion to the frames of the high-speed camera. There are some inter-frame gaps when the camera isn’t recording anything though. Even with the reduced resolution, it’s easy to pick out “Mary had a little lamb” in the video below.
We’re blown away by this research, and we’re sure certain organizations will be looking into it for their own use. Don’t pull out your tin foil hats yet though. Foil containers proved to be one of the best sound reflectors.