Hallucination is the erroneous perception of something that’s actually absent, or in other words, a possible interpretation of training data. Researchers from MIT and UMBC have developed and trained a generative machine-learning model that learns to generate tiny videos at random. The hallucination-like clips, just 64×64 pixels in size, are somewhat plausible, but also a bit spooky.
The machine-learning model behind these artificial clips is capable of learning from unlabeled “in-the-wild” training videos and relies mostly on the temporal coherence of subsequent frames as well as the presence of a static background. It learns to disentangle foreground objects from the background and extracts the overall dynamics from the scenes. The trained model can then be used to generate new clips at random (as shown above), or from a static input image (as shown in pairs below).
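To get a feel for what separating foreground from a static background means in practice, here is a minimal NumPy sketch. It assumes the two are blended through a per-pixel mask, which the description above does not spell out, and it uses random arrays in place of anything a trained network would produce. The function name `compose_video` and the square mask region are made up for illustration.

```python
import numpy as np

def compose_video(foreground, mask, background):
    """Blend a moving foreground over a single still background.

    foreground: (T, H, W, 3) array, one RGB frame per time step
    mask:       (T, H, W, 1) array in [0, 1]; 1 selects the foreground
    background: (H, W, 3) array, one still image shared by all frames
    """
    # Broadcasting repeats the still background across all T frames.
    return mask * foreground + (1.0 - mask) * background[None, ...]

rng = np.random.default_rng(0)
T, H, W = 32, 64, 64              # clip size quoted in the article
fg = rng.random((T, H, W, 3))     # stand-in for a generated foreground
bg = rng.random((H, W, 3))        # stand-in for a generated background
m = np.zeros((T, H, W, 1))
m[:, 16:48, 16:48] = 1.0          # hypothetical square "object" region

clip = compose_video(fg, m, bg)
print(clip.shape)
```

Outside the masked square every frame shows the same background pixels, which is exactly the static-background assumption the model leans on.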
Currently, the team limits the clips to a resolution of 64×64 pixels and a length of 32 frames in order to reduce the amount of required training data, which still stands at 7 TB. Despite obvious deficiencies in terms of photorealism, the little clips were judged “more realistic” than real clips by about 20 percent of the participants in a psychophysical study the team conducted. The code for the project (Torch7/LuaJIT) can already be found on GitHub, together with a pre-trained model. The project will also be shown in December at the 2016 NIPS conference.
The system works by processing a live NTSC feed of a ping pong game. The ball is painted a particular color to aid in detection, and the FPGAs that process the video can keep track of where the net is, how many times the ball bounces, and whether the ball has been hit by a player. With all of this information, the system can keep track of the score of the game, which is displayed on a monitor near the table. Now, the players are free to concentrate on their game and don’t have to worry about keeping score!
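The color trick is easy to try in software. The following is a rough Python analogue of the detection step, not the students' FPGA logic: it assumes the ball is the only thing in the frame close to a known color and simply averages the matching pixel positions. The function name `find_ball`, the tolerance value, and the orange test color are all assumptions for the sake of the example.

```python
import numpy as np

def find_ball(frame, target_rgb, tolerance=40):
    """Return the (row, col) centroid of pixels close to target_rgb, or None."""
    # Per-pixel distance to the target color, summed over the three channels.
    diff = np.abs(frame.astype(int) - np.array(target_rgb)).sum(axis=2)
    rows, cols = np.nonzero(diff < tolerance)
    if rows.size == 0:
        return None  # no pixel matched, so the ball is not visible
    return float(rows.mean()), float(cols.mean())

# Synthetic test frame: black table with a small orange "ball"
frame = np.zeros((48, 64, 3), dtype=np.uint8)
frame[10:14, 20:24] = (255, 120, 0)
print(find_ball(frame, (255, 120, 0)))
```

Tracking that centroid from frame to frame is what would let a scoring system notice bounces (the vertical motion reverses) and which side of the net the ball is on.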
This is a pretty impressive demonstration of FPGAs and video processing that has applications beyond just ping pong. What would you use it for? It’s always interesting to see what students are working on; core concepts from these experiments tend to make their way into their professional lives later on. Maybe they’ll even take this project to the next level and build a real, working ping pong robot to go with their scoring system!
Perforated rolls of paper, called piano rolls, are used to input songs into player pianos. The image above was taken from a YouTube video showing a player piano playing a Gershwin tune called Limehouse Nights. There’s no published sheet music for the song, so [Zulko] decided to use Python to transcribe it.
First, the video was downloaded from YouTube. It was then processed with the MoviePy library to create a single image plotting the notes. Using a Fourier Transform, the horizontal spacing between notes was found. This allowed the image to be reduced so that one pixel corresponded to one key.
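The Fourier step can be illustrated with a few lines of NumPy. This is a sketch under assumptions rather than [Zulko]'s actual code: it takes a one-dimensional brightness profile across the keyboard axis and reports the period of the strongest frequency component. The function name `dominant_period` and the synthetic test row are made up for the example.

```python
import numpy as np

def dominant_period(profile):
    """Estimate the strongest repeating period (in pixels) of a 1-D signal."""
    profile = np.asarray(profile, dtype=float)
    # Remove the mean so the zero-frequency bin does not dominate.
    spectrum = np.abs(np.fft.rfft(profile - profile.mean()))
    freqs = np.fft.rfftfreq(profile.size)   # in cycles per pixel
    peak = spectrum[1:].argmax() + 1        # skip the zero-frequency bin
    return 1.0 / freqs[peak]

# Synthetic row: a 3-pixel-wide hole every 8 pixels across 512 pixels
row = np.zeros(512)
for start in range(0, 512, 8):
    row[start:start + 3] = 1.0

print(dominant_period(row))
```

On a real piano roll image the profile is noisier and a harmonic can outweigh the fundamental, so the peak is worth sanity-checking against the expected number of keys.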
With that done, each column could be assigned to a specific note on the piano. That took care of the pitches, but the note durations required more processing. The Fourier Transform was applied again to determine the length of a quarter note. With this known, the notes could be quantized and a duration assigned to each.
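Quantizing is just rounding each measured length to the nearest allowed fraction of a quarter note. A minimal sketch, assuming note lengths measured in pixels and a sixteenth-note grid (both choices are illustrative, as is the function name `quantize`):

```python
def quantize(durations_px, quarter_px, grid=4):
    """Snap raw lengths to multiples of 1/grid of a quarter note.

    Returns durations in quarter-note units. Very short notes are
    rounded up to the smallest grid step instead of vanishing.
    """
    step = quarter_px / grid
    return [max(1, round(d / step)) / grid for d in durations_px]

# Raw lengths in pixels, with a quarter note measured at 40 pixels
print(quantize([19, 41, 83, 9], quarter_px=40))
# -> [0.5, 1.0, 2.0, 0.25]
```

A finer grid keeps more rhythmic detail but also lets more measurement noise through, so the grid size is a trade-off.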
Once the duration and notes are known, it’s time to export sheet music. LilyPond, an open source language for music notation, was used. This converts ASCII text into a sheet music PDF. The final result is a playable score of the piece, which you can watch after the break.
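Since LilyPond input is plain text, generating it from Python is mostly string formatting. The sketch below is an assumption-laden illustration, not the script used for the transcription: it handles a single melody line, uses sharps only, supports undotted durations only, and represents pitches as MIDI note numbers (middle C is 60). The helper names and the version string are choices made for the example.

```python
NAMES = ["c", "cis", "d", "dis", "e", "f", "fis", "g", "gis", "a", "ais", "b"]
# Lengths in quarter notes mapped to LilyPond duration numbers (no dotted values)
DURATIONS = {4.0: "1", 2.0: "2", 1.0: "4", 0.5: "8", 0.25: "16"}

def to_lilypond_note(midi_pitch, quarters):
    """Format one note, e.g. MIDI 60 lasting one quarter note becomes c'4."""
    octave = midi_pitch // 12 - 4   # 0 is the octave just below middle C
    marks = "'" * octave if octave >= 0 else "," * -octave
    return NAMES[midi_pitch % 12] + marks + DURATIONS[quarters]

def to_lilypond(notes):
    """Build a minimal LilyPond source string from (midi_pitch, quarters) pairs."""
    body = " ".join(to_lilypond_note(p, q) for p, q in notes)
    return '\\version "2.18.2"\n{ ' + body + " }\n"

print(to_lilypond([(60, 1.0), (62, 1.0), (64, 2.0)]))
```

Saving the printed text to a `.ly` file and running it through the `lilypond` command should produce a PDF of the three notes. A real piano score would also need chords, two staves, and a key signature.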