Over the past decade, we’ve seen great strides made in the area of AI and neural networks. When trained appropriately, they can be coaxed into generating impressive output, whether it be in text, images, or simply in classifying objects. There’s also much fun to be had in pushing them outside their prescribed operating region, as [Jon Warlick] attempted recently.
[Jon]’s work began using NVIDIA’s GauGAN tool. It’s capable of generating pseudo-photorealistic images of landscapes from segmentation maps, where different colors of a 2D image represent things such as trees, dirt, mountains, or water. After spending much time toying with the software, [Jon] decided to see if it could be pressed into service to generate video instead.
The GauGAN tool is only capable of taking in a single segmentation map, and outputting a single image, so [Jon] had to get creative. Experiments were undertaken wherein a video was generated and exported as individual frames, with these frames fed to GauGAN as individual segmentation maps. The output frames from GauGAN were then reassembled into a video again.
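The frame-by-frame approach can be sketched in a few lines of Python. This is only an illustration of the idea, not [Jon]’s actual tooling: `fake_generator` is a hypothetical stand-in for GauGAN, and the tiny nested lists stand in for real segmentation maps and output images.

```python
from typing import Callable, List

# Hypothetical frame type: a 2D grid of segmentation label IDs (or pixel values).
Frame = List[List[int]]


def frames_to_video(seg_frames: List[Frame],
                    generate: Callable[[Frame], Frame]) -> List[Frame]:
    """Run a single-image generator over every frame independently.

    No state is carried from one frame to the next, which is why
    consecutive output frames are only coincidentally related.
    """
    return [generate(frame) for frame in seg_frames]


def fake_generator(seg_map: Frame) -> Frame:
    """Stand-in for GauGAN (an assumption): map each label ID to a grey level."""
    palette = {0: 30, 1: 120, 2: 220}
    return [[palette.get(label, 0) for label in row] for row in seg_map]


# Two tiny "segmentation map" frames exported from a source video.
seg_frames = [
    [[0, 0], [1, 2]],
    [[0, 1], [1, 2]],
]
output_frames = frames_to_video(seg_frames, fake_generator)
```

In a real pipeline, the list of frames would come from a video split into images, and `output_frames` would be written back out and reassembled into a video.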
The results are somewhat psychedelic, as one would expect. GauGAN’s single image workflow means there is only coincidental relevance between consecutive frames, creating a wild, shifting visage. While it’s not a technique we expect to see used for serious purposes anytime soon, it’s a great experiment in seeing how far the technology can be pushed. It’s not the first time we’ve seen such technology used to create full motion video, either. Video after the break.
Ever since we first saw the nightmarish artwork produced by Google DeepDream and the ridiculous faux paintings produced from neural style transfer, we’ve been aware of the ways machine learning can be applied to visual art. With commercially available trained models and automated pipelines for generating images from relatively small training sets, it’s now possible for developers without theoretical knowledge of machine learning to easily generate images, provided they have sufficient access to GPUs. Filmmaker [Kira Bursky] took this a step further, creating a surreal short film that features characters and textures produced from image sets.
She began with about 150 photos of her face, 200 photos of film locations, 4600 photos of past film productions, and 100 drawings as the main datasets.
Using GAN models for nebulas, faces, and skyscrapers in RunwayML, she found that training on her face set produced results that were by turns disintegrated, realistic, and painterly. Many of the images continue to evoke aspects of her original face with distortions, although whether that is the model identifying a feature common to skyscrapers and faces or our own bias towards facial recognition is up to the viewer.
On the other hand, the results of training the film set photos on models of faces and bedrooms produced abstract textures and “surreal and eerie faces like a fever dream”. Perhaps, unlike the familiar anchors of facial features, it’s the lack of recognizable characteristics in the transformed images that gives them such a surreal feel.
[Kira] certainly uses these results to her advantage, brainstorming a concept for a short film that revolves around her main character experiencing nightmares. Although her objective was to use her results to convey a series of emotionally striking scenes, the models she uses to produce these scenes are also quite interesting.
She started off by using the MiDaS model, created by a team of researchers from ETH Zurich and Intel, for generating monocular depth maps. The resulting maps assign each region of an image a depth relative to the others. She also used Mask R-CNN for masking out the backgrounds in generated faces and combined her generated images in Photoshop to create the main character for her short film.
In order to simulate the character walking, she used the Liquid Warping GAN, a framework for human motion imitation and appearance transfer, created by a team from ShanghaiTech University and Tencent AI Lab. This allowed her to take her original images and synthesize results from reference poses of herself going through the motions of walking by using a 3D body mesh recovery module. Later on, she applied similar techniques for motion tracking on her faces, running them through the First Order Motion Model to simulate different emotions. She went on to join her facial movements with her character using After Effects.
Bringing the results together, she used the depth map videos to animate a 3D camera blur, which gives viewers anchor points and makes the result less disorienting, and to create a displacement map that heightens the sense of depth and movement within the scenes. In After Effects, she also overlaid dust and film grain effects to give the final result a crisper look. The result is a surprisingly cinematic film entirely made of images and videos generated from machine learning models. With the help of the depth adjustments, it almost looks like something that you might see in a nightmare.
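To give a rough feel for what a depth-driven displacement map does, here is a minimal sketch in plain Python. It is not how After Effects implements the effect, and the function name, the 0-to-1 depth range, and the `max_shift` parameter are all assumptions made for illustration: each output pixel is sampled from a position offset in proportion to the depth value at that pixel.

```python
from typing import List


def displace_row(row: List[int], depth_row: List[float],
                 max_shift: int) -> List[int]:
    """Horizontally displace one row of pixels using a depth map.

    depth_row holds values in [0, 1] (assumed convention). Each output
    pixel is sampled from x + depth * max_shift, clamped to the row.
    """
    width = len(row)
    out = []
    for x in range(width):
        shift = round(depth_row[x] * max_shift)
        src = min(max(x + shift, 0), width - 1)  # clamp to valid indices
        out.append(row[src])
    return out


pixels = [10, 20, 30, 40]
depth = [0.0, 0.0, 1.0, 1.0]  # the right half gets displaced, the left stays put
shifted = displace_row(pixels, depth, max_shift=1)
```

Animating `max_shift` over time makes regions with different depth values move by different amounts, which is what produces the parallax-like sense of depth.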
Ever since [Ian Goodfellow] and his colleagues invented the generative adversarial network (GAN) in 2014, hundreds of projects, from style transfers to poetry generators, have been produced using the concept of contesting neural networks. Unlike traditional neural networks, GANs can generate new data that fits statistically within the same set as the training set.
[Bernat Cuni], the one-man design team behind [cunicode], came up with the idea to generate beetles using this technique. Inspired by material published on Machine Learning for Artists, he decided to run some visual experiments with zoological illustrations. The training data came from a public domain book hosted at archive.org, found through the Biodiversity Heritage Library. A combination of OpenCV and ImageMagick helped with extracting the individual illustrations as square images.
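The squaring step is easy to picture with a small sketch. The following is not [Cuni]’s script, which used OpenCV and ImageMagick. It is a pure-Python illustration, and it assumes the crops were padded with a background color rather than stretched or trimmed: the extracted illustration is centered on a square canvas whose side matches its longer dimension.

```python
from typing import List


def pad_to_square(image: List[List[int]], fill: int = 255) -> List[List[int]]:
    """Center a rectangular greyscale image on a square canvas.

    `fill` is the background value used for the padding (white by default,
    an assumption that suits scans of book illustrations).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    side = max(height, width)
    top = (side - height) // 2
    left = (side - width) // 2
    square = [[fill] * side for _ in range(side)]
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            square[top + y][left + x] = value
    return square


# A 2x4 crop becomes a 4x4 image with white rows above and below.
crop = [[0, 0, 0, 0],
        [0, 9, 9, 0]]
squared = pad_to_square(crop)
```

Square inputs matter because GAN training pipelines such as DCGAN and StyleGAN typically expect every image in the data set to share one fixed resolution.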
[Cuni] then ran a DCGAN with the data set, generating the first set of quasi-beetles after some tinkering with epochs and settings. After the failed first experiment, he went with StyleGAN, setting up a machine at PaperSpace with 1 GPU and running the training for >3 days on 128 px images. The results were much better but fairly small, and running the machine was quite expensive (>€125).
Given the success of the previous experiment, he decided to transfer over to Google Colab, using the free 12 hours of K80 GPU time per run to generate some more beetles. With the intent of producing more HD beetles, he used Runway trained on 1024 px beetles, discovering much better results after 3000 steps. The model was moved over to Google Colab to produce HD outputs.
The work is centered around the use of Generative Adversarial Networks, or GANs. [Helena] describes using a GAN to create artworks as a sort of game. An apprentice attempts to create new works in the style of their established master, while a critic attempts to determine whether the artworks are created by the master or the apprentice. As the apprentice improves, the critic must become more discerning; as the critic becomes more discerning, the apprentice must improve further. It is through this mechanism that the model improves itself.
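For readers who want the math behind the apprentice-and-critic game, it corresponds to the minimax objective from the original 2014 GAN formulation, shown here as a reference point rather than as the exact loss used in [Helena]’s work. In this notation, the generator G plays the apprentice and the discriminator D plays the critic:

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

Here D(x) is the critic’s estimate of the probability that a work x came from the master, and G(z) is the apprentice’s work produced from random noise z. The critic tries to maximize V by telling the two apart, while the apprentice tries to minimize it by fooling the critic. Variants such as CycleGAN build on this adversarial idea and add further terms, notably a cycle-consistency loss.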
[Helena] has spent time experimenting with CycleGAN in the artistic realm after first using it in a work project, and has primarily trained it on her own original artworks to create new pieces with wild and exciting results. She shares several tips on how best to work with the technology, around the necessary computing and storage requirements, as well as ways to step out of the box to create more diverse outputs.