Neural Style Transfer

Ever since we first saw the nightmarish artwork produced by Google DeepDream and the ridiculous faux paintings produced from neural style transfer, we’ve been aware of the ways machine learning can be applied to visual art. With commercially available trained models and automated pipelines for generating images from relatively small training sets, it’s now possible for developers without theoretical knowledge of machine learning to easily generate images, provided they have sufficient access to GPUs. Filmmaker [Kira Bursky] took this a step further, creating a surreal short film that features characters and textures produced from image sets.

She began with about 150 photos of her face, 200 photos of film locations, 4600 photos of past film productions, and 100 drawings as the main datasets.

Using GAN models for nebulas, faces, and skyscrapers in RunwayML, she found that the results of training on her face set were by turns disintegrated, realistic, and painterly. Many of the images continue to evoke aspects of her original face, albeit with distortions; whether that is the model identifying a feature common to skyscrapers and faces or our own bias towards facial recognition is up to the viewer.

On the other hand, training the film set photos on models of faces and bedrooms produced abstract textures and “surreal and eerie faces like a fever dream”. Perhaps it’s the lack of recognizable characteristics in the transformed images, in contrast to the familiar anchors of facial features, that gives them such a surreal feel.

[Kira] certainly used these results to her advantage, brainstorming a concept for a short film that revolves around her main character experiencing nightmares. Although her objective was to convey a series of emotionally striking scenes, the models she used to produce them are also quite interesting.

She started off by using the MiDaS model, created by a team of researchers from ETH Zurich and Intel, to generate monocular depth maps. The resulting maps assign each region of an image a depth relative to the others. She also used Mask R-CNN to mask out the backgrounds in generated faces, and combined her generated images in Photoshop to create the main character for her short film.
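To make the two steps above a little more concrete, here is a minimal sketch in plain Python. It does not call MiDaS or Mask R-CNN; it only shows what is typically done with their outputs. The function names, the nested-list image format, and the transparent fill value are all assumptions made for illustration, not details from [Kira]'s workflow.

```python
def normalize_depth(depth):
    """Rescale raw relative depth values to 0..255 for a grayscale depth map.

    `depth` is a 2D list of numbers (hypothetical stand-in for model output).
    """
    flat = [v for row in depth for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a completely flat map
    return [[round(255 * (v - lo) / span) for v in row] for row in depth]


def mask_background(image, mask, fill=(0, 0, 0, 0)):
    """Keep pixels where `mask` is truthy and replace the rest with `fill`.

    `image` is a 2D list of RGBA tuples, `mask` a 2D list of 0/1 values of the
    same shape. The default fill is fully transparent (an assumption).
    """
    return [
        [px if keep else fill for px, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]
```

A real pipeline would work on arrays rather than nested lists, but the idea is the same: the depth model gives relative values that need rescaling before they are useful as an image, and the segmentation mask decides which pixels survive into the composite.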

In order to simulate the character walking, she used the Liquid Warping GAN, a framework for human motion imitation and appearance transfer created by a team from ShanghaiTech University and Tencent AI Lab. Using its 3D body mesh recovery module, she could take her original images and synthesize new ones from reference poses of herself going through the motions of walking. Later on, she applied similar techniques for motion tracking to her generated faces, running them through the First Order Motion Model to simulate different emotions. She then joined the facial movements to her character in After Effects.

Bringing the results together, she used the depth map videos to animate a 3D camera blur, which gives viewers anchor points and makes the result less disorienting, and to create a displacement map that heightens the sense of depth and movement within the scenes. In After Effects, she also overlaid dust and film grain effects to give the final result a crisper look. The result is a surprisingly cinematic film made entirely of images and videos generated from machine learning models. With the help of the depth adjustments, it almost looks like something that you might see in a nightmare.
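For readers curious how a depth map can drive a displacement effect, the following is a rough sketch of the general idea, not a reconstruction of the After Effects setup used in the film. The function name, the `max_shift` parameter, and the convention that 1.0 means "near" are assumptions for illustration.

```python
def displace_horizontally(frame, depth, max_shift=2):
    """Shift each pixel sideways in proportion to its depth value.

    `frame` is a 2D list of pixel values and `depth` a 2D list of floats in
    0.0..1.0 with the same shape (assumed here: 0.0 = far, 1.0 = near).
    Each output pixel samples the input at an offset scaled by its depth,
    clamped to the frame edges.
    """
    height, width = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    for y in range(height):
        for x in range(width):
            shift = round(depth[y][x] * max_shift)
            src = min(max(x + shift, 0), width - 1)  # clamp to valid columns
            out[y][x] = frame[y][src]
    return out
```

Animating `max_shift` over time makes near regions slide further than far ones, which is what produces the parallax-like sense of depth.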


With just two weeks to go before his friends’ wedding, [gistnoesis] built a well-featured robotic photo booth. Using a Bluetooth PS3 controller, guests could move the camera around, take a picture, style it in one of several ways (or not), and print it out with a single button press.

The camera is mounted on a DIY 2-axis gimbal made from extruded aluminium and 3D-printed parts. It can be moved left/right with one joystick, and up/down with the other. [gistnoesis] set up a four-panel split-screen display that shows the live feed from the camera and a diagram for the controls. The third panel shows the styled picture. Guests could explore the camera roll on the fourth panel.
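The joystick-to-gimbal mapping described above might look something like the sketch below. This is not [gistnoesis]'s code: the deadzone, step size, and angle limits are made-up values, and the function names are hypothetical.

```python
def axis_to_step(axis, deadzone=0.1, max_step_deg=2.0):
    """Convert a joystick axis reading in [-1, 1] to an angle step in degrees.

    Readings inside the deadzone are ignored so a resting stick does not
    make the camera drift. All default values are assumptions.
    """
    if abs(axis) < deadzone:
        return 0.0
    return max(-1.0, min(1.0, axis)) * max_step_deg


def update_gimbal(pan, tilt, pan_axis, tilt_axis,
                  pan_limits=(-90.0, 90.0), tilt_limits=(-30.0, 30.0)):
    """Apply one control update: one stick drives pan, the other drives tilt."""
    pan = min(max(pan + axis_to_step(pan_axis), pan_limits[0]), pan_limits[1])
    tilt = min(max(tilt + axis_to_step(tilt_axis), tilt_limits[0]), tilt_limits[1])
    return pan, tilt
```

Clamping to fixed limits is a sensible precaution on a DIY gimbal, since guests at a wedding are unlikely to be gentle with the sticks.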

LINN, as the booth is called, uses two PCs running Lubuntu, one of which is dedicated to running an open-source neural style transfer program. After someone takes a picture, they can change the style to make it look like a Van Gogh or Picasso before printing it out. A handful of wedding attendees knew about some of the extra features, like manual exposure control and the five-second timer option, and the information spread gradually. Not only was LINN a great conversation piece, it inspired multi-generational collaboration.
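The post doesn't say which style transfer program LINN runs, so the following is only a sketch of the idea behind the classic approach: a painting's "style" is summarized by the Gram matrix of feature maps from a convolutional network, and the generated image is pushed to match it. The feature maps here are plain lists standing in for real network activations, and dividing by the number of positions is just one of several normalization conventions.

```python
def gram_matrix(features):
    """Compute channel-by-channel correlations of a feature map.

    `features` is a list of C channels, each a flat list of N activations
    (a hypothetical stand-in for a CNN layer's output). Returns a C x C list.
    """
    n = len(features[0])
    return [
        [sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
        for fi in features
    ]


def style_loss(generated, style):
    """Mean squared difference between the two Gram matrices."""
    g, s = gram_matrix(generated), gram_matrix(style)
    c = len(g)
    return sum((g[i][j] - s[i][j]) ** 2 for i in range(c) for j in range(c)) / (c * c)
```

Because the Gram matrix discards where in the image each activation occurred, it captures texture and brushwork rather than layout, which is why a wedding photo can pick up Van Gogh's swirls without turning into a copy of one of his paintings.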

Despite the assembled size, LINN packs up nicely into a couple of reusable shopping bags for transport (minus the TV, of course). This vintage photo booth we saw a few years ago is more of a one-piece solution, although it isn’t as feature-rich.
