Calculating three-dimensional position from two-dimensional projections are literal textbook examples in geometry, but those examples are the “assume a spherical cow” type of simplifications. Applicable only in an ideal world where the projections are made with mathematically perfect cameras at precisely known locations with infinite resolution. Making things work in the real world is a lot harder. But not only have [Jingtong Li, Jesse Murray et al.] worked through the math of tracking a drone’s 3D flight from 2D video, they’ve released their MultiViewUnsynch software on GitHub so we can all play with it.
Instead of laboratory grade optical instruments, the cameras used in these experiments are available at our local consumer electronics store. A table in their paper Reconstruction of 3D Flight Trajectories from Ad-Hoc Camera Networks (arXiv:2003.04784) listed several Huawei cell phone cameras, a few Sony digital cameras, and a GoPro 3. Video cameras don’t need to be placed in any particular arrangement, because positions are calculated from their video footage. Correlating overlapping footage from dissimilar cameras is a challenge all in itself, since these cameras record at varying framerates ranging from 25 to 59.94 frames per second. Furthermore, these cameras all have rolling shutters, which adds an extra variable as scanlines in a frame are taken at slightly different times. This is not an easy problem.
There is a lot of interest in tracking drone flights, especially those flying where they are not welcome. And not everyone have the budget for high-end equipment or the permission to emit electromagnetic signals. MultiViewUnsynch is not quite there yet, as it tracks a single target and video files were processed afterwards. The eventual goal is to evolve this capability to track multiple targets on live video, and hopefully help reduce frustrating public embarrassments.
Hallucination is the erroneous perception of something that’s actually absent – or in other words: A possible interpretation of training data. Researchers from the MIT and the UMBC have developed and trained a generative-machine learning model that learns to generate tiny videos at random. The hallucination-like, 64×64 pixels small clips are somewhat plausible, but also a bit spooky.
The machine-learning model behind these artificial clips is capable of learning from unlabeled “in-the-wild” training videos and relies mostly on the temporal coherence of subsequent frames as well as the presence of a static background. It learns to disentangle foreground objects from the background and extracts the overall dynamics from the scenes. The trained model can then be used to generate new clips at random (as shown above), or from a static input image (as shown in pairs below).
Currently, the team limits the clips to a resolution of 64×64 pixels and 32 frames in duration in order to decrease the amount of required training data, which is still at 7 TB. Despite obvious deficiencies in terms of photorealism, the little clips have been judged “more realistic” than real clips by about 20 percent of the participants in a psychophysical study the team conducted. The code for the project (Torch7/LuaJIT) can already be found on GitHub, together with a pre-trained model. The project will also be shown in December at the 2016 NIPS conference.
The system works by processing a live NTSC feed of a ping pong game. The ball is painted a particular color to aid in detection, and the FPGAs that process the video can keep track of where the net is, how many times the ball bounces, and if the ball has been hit by a player. With all of this information, the system can keep track of the score of the game, which is displayed on a monitor near the table. Now, the players are free to concentrate on their game and don’t have to worry about keeping score!
This is a pretty impressive demonstration of FPGAs and video processing that has applications beyond just ping pong. What would you use it for? It’s always interesting to see what students are working on; core concepts from these experiments tend to make their way into their professional lives later on. Maybe they’ll even take this project to the next level and build an actual real, working ping pong robot to work with their scoring system!
Perforated rolls of paper, called piano rolls, are used to input songs into player pianos. The image above was taken from a YouTube video showing a player piano playing a Gershwin tune called Limehouse Nights. There’s no published sheet music for the song, so [Zulko] decided to use Python to transcribe it.
First off the video was downloaded from YouTube. This video was processed with MoviePy library to create a single image plotting the notes. Using a Fourier Transform, the horizontal spacing between notes was found. This allowed the image to be reduced so that one pixel corresponded with one key.
With that done, each column could be assigned to a specific note on the piano. That takes care of the pitches, but the note duration requires more processing. The Fourier Transform is applied again to determine the length of a quarter note. With this known, the notes can be quantized, and a note duration can be applied to each.
Once the duration and notes are known, it’s time to export sheet music. LilyPond, an open source language for music notation, was used. This converts ASCII text into a sheet music PDF. The final result is a playable score of the piece, which you can watch after the break.