It’s always fun to look over aerial and satellite maps of places we know, seeing a perspective different from our usual ground level view. We lose that context when it’s a place we don’t know by heart. Such as, say, Mars. So [Matthew Earl] sought to give Perseverance rover’s landing video some context by projecting onto orbital imagery from ESA’s Mars Express. The resulting video (embedded below the break) is a fun watch alongside the technical writeup Reprojecting the Perseverance landing footage onto satellite imagery.
Some telemetry of rover position and orientation were transmitted live during the landing process, with the rest recorded and downloaded later. Surprisingly, none of that information was used for this project, which was based entirely on video pixels. This makes the results even more impressive and the techniques more widely applicable to other projects. The foundational piece is SIFT (Scale Invariant Feature Transform), which is one of many tools in the OpenCV toolbox. SIFT found correlations between Perseverance’s video frames and Mars Express orbital image, feeding into a processing pipeline written in Python for results rendered in Blender.
While many elements of this project sound enticing for applications in robot vision, there are a few challenges touched upon in the “Final Touches” section of the writeup. The falling heatshield interfered with automated tracking, implying this process will need help to properly understand dynamically changing environments. Furthermore, it does not seem to run fast enough for a robot’s real-time needs. But at first glance, these problems are not fundamental. They merely await some motivated people to tackle in the future.
This process bears some superficial similarities to projection mapping, which is a category of projects we’ve featured on these pages. Except everything is reversed (camera instead of video projector, etc.) making the math an entirely different can of worms. But if projection mapping sounds more to your interest, here is a starting point.
[via Dr. Tanya Harrison @TanyaOfMars]
Continue reading “Putting Perseverance Rover’s View Into Satellite View Context”
A research paper from Dalian University of Technology in China and City University of Hong Kong (direct PDF link) outlines a system that automatically generates comic books from videos. But how can an algorithm boil down video scenes to appropriately reflect the gravity of the scene in a still image? This impressive feat is accomplished by saving two still images per second, then segments the frames into scenes through analysis of region-of-interest and importance ranking.
For its next trick, speech for each scene is processed by combining subtitle information with the audio track of the video. The audio is analyzed for emotion to determine the appropriate speech bubble type and size of the subtitle text. Frames are even analyzed to establish which person is speaking for proper placement of the bubbles. It can then create layouts of the keyframes, determining panel sizes for each page based on the region-of-interest analysis.
The process is completed by stylizing the keyframes with flat color through quantization, for that classic cel shading look, and then populating the layouts with each frame and word balloon.
The team conducted a study with 40 users, pitting their results against previous techniques which require more human intervention and still besting them in every measure. Like any great superhero, the team still sees room for improvement. In the future, they would like to improve the accuracy of keyframe selection and propose using a neural network to do so.
Thanks to [Qes] for the tip!
The world was never black and white – we simply lacked the technology to capture it in full color. Many have experimented with techniques to take black and white images, and colorize them. [Adrian Rosebrock] decided to put an AI on the job, with impressive results.
The method involves training a Convolutional Neural Network (CNN) on a large batch of photos, which have been converted to the Lab colorspace. In this colorspace, images are made up of 3 channels – lightness, a (red-green), and b (blue-yellow). This colorspace is used as it better corresponds to the nature of the human visual system than RGB. The model is then trained such that when given a lightness channel as an input, it can predict the likely a & b channels. These can then be recombined into a colorized image, and converted back to RGB for human consumption.
It’s a technique capable of doing a decent job on a wide variety of material. Things such as grass, countryside, and ocean are particularly well dealt with, however more complicated scenes can suffer from some aberration. Regardless, it’s a useful technique, and far less tedious than manual methods.
CNNs are doing other great things too, from naming tomatoes to helping out with home automation. Video after the break.
Continue reading “Colorizing Images With The Help Of AI”
One of the core lessons any physics student will come to realize is that the more you know about physics, the less intuitive it seems. Take the nature of light, for example. Is it a wave? A particle? Both? Neither? Whatever the answer to the question, scientists are at least able to exploit some of its characteristics, like its ability to bend and bounce off of obstacles. This camera, for example, is able to image a room without a direct light-of-sight as a result.
The process works by pointing a camera through an opening in the room and then strobing a laser at the exposed wall. The laser light bounces off of the wall, into the room, off of the objects on the hidden side of the room, and then back to the camera. This concept isn’t new, but the interesting thing that this group has done is lift the curtain on the image processing underpinnings. Before, the process required a research team and often the backing of the university, but this project shows off the technique using just a few lines of code.
This project’s page documents everything extensively, including all of the algorithms used for reconstructing an image of the room. And by the way, it’s not a simple 2D image, but a 3D model that the camera can capture. So there should be some good information for anyone working in the 3D modeling world as well.
Thanks to [Chris] for the tip!
As the cost of high-resolution images sensors gets lower, and the availability of small and cheap single board computers skyrockets, we are starting to see more astrophotography projects than ever before. When you can put a $5 Raspberry Pi Zero and a decent webcam outside in a box to take autonomous pictures of the sky all night, why not give it a shot? But in doing so, many hackers are recognizing a fact well-known to traditional telescope jockeys: seeing a few stars is easy, seeing a lot of stars is another story entirely.
The problem is that stars are fairly dim; a problem compounded by the light pollution you get unless you’re out in a rural area. You can’t just brighten up the images either, as that only increases the noise in the image. A programmer always in search of a challenge, [Benedikt Bitterli] decided to take a shot at using software to improve astrophotography images. He documented the entire process, failures and all, on his blog for anyone else who might be curious about what it really takes to create the incredible images of the night sky we see in textbooks.
In principle it’s simple: just take a lot of pictures of the sky, stack them on top of each other, and identify which points of light are stars and which ones are noise artifacts. But of course the execution is considerably more difficult. For one thing, unless the camera was on a mount that was automatically tracking the sky, the stars will have slightly moved in each image. To help with this process, [Benedikt] used a navigational trick that humanity has relied on for millennia: mapping constellations. By comparing groupings of stars in each image, his software is able to accurately overlay each image.
But that’s only one part of the equation. In his post, [Benedikt] goes over the incredible amount of math that goes into identifying individual stars in the sea of noise you get when a digital image sensor looks into the black. You certainly don’t need to understand all the math to appreciate the final results, but it’s a fascinating read for those with an interest in computer vision concepts.
This kind of software is precisely what you want to pair with your 3D printed star tracker, or even better a Raspberry Pi sky monitoring station.
[Thanks to Helio Machado for the tip.]
People with dementia have trouble with some of the things we take for granted, including dressing themselves. It can be a remarkably difficult task involving skills like balance, pattern recognition inside of other patterns, ordering, gross motor skill, and dexterity to name a few. Just because something is common, doesn’t mean it is easy. The good folks at NYU Rory Meyers College of Nursing, Arizona State University, and MGH Institute of Health Professions talked with a caregiver focus group to find a way for patients to regain their privacy and replace frustration with independence.
Although this is in the context of medical assistance, this represents one of the ways we can offload cognition or judgment to computers. The system works by detecting movement when someone approaches the dresser with five drawers. Vocal directions and green lights on the top drawer light up when it is time to open the drawer and don the clothing inside. Once the system detects the article is being worn appropriately, the next drawer’s light comes one. A camera seeks a matrix code on each piece of clothing, and if it times out, a caregiver is notified. There is no need for an internet connection, nor should one be given.
Currently, the system has a good track record with identifying the clothing, but it is not proficient at detecting when it is worn correctly, which could lead to frustrating false alarms. Matrix codes seemed like a logical choice since they could adhere to any article of clothing and get washed repeatedly but there has to be a more reliable way. Perhaps IR reflective threads could be sewn into clothing with varying stitch lengths, so the inside and outside patterns are inverted to detect when clothing is inside-out. Perhaps a combination of IR reflective and absorbing material could make large codes without being visible to the human eye. How would you make a machine-washable, machine-readable visual code?
Helping people with dementia is not easy but we are not afraid to start, like this music player. If matrix codes and barcodes get you moving, check out this hacked scrap-store barcode scanner.
Thank you, [Qes] for the tip.
There are quick hacks, there are weekend projects and then there are years long journeys towards completion. [Boris Vitazek]’s grafofon
falls into the latter category. His creation can best be described as electromechanical sequencer synthesizer with a multiplayer mode.
The storage medium and interface for this sequencer is a thirteen-meter loop of paper that is mounted like a conveyor belt. Music is composed by drawing on the paper or placing objects on it. This is usually done by the audience and the fact that the marker isn’t erased make the result collaborative and incremental.
These ‘scores’ are read by a camera and interpreted by software.This is a very vague description of this device, for a reason: the build went on over six years and both hard- and software went through several revisions in that time. It started as a trigger for MIDI notes and evolved from there.
In his write up [Boris] explains the technical aspects of each iteration. He also tells the stories of the people he met while working on the grafofon and how they influenced the build. If this look into the art world reminds you of your local hackerspace, it is because these worlds aren’t that far apart.
Continue reading “The Grafofon: An Optomechanical Sequencer”