Creating A Twisted Grid Image Illusion With A Diffusion Model

Images that can be interpreted in a variety of ways have existed for many decades, with the classical example being Rubin’s vase — which some viewers see as a vase, and others a pair of human faces.

When the duck becomes a bunny, if you ignore the graphical glitches that used to be part of the duck. (Credit: Steve Mould, YouTube)

Things get trickier if you want an image that turns into something else, and still looks realistic, when each section of it is rotated within a 3×3 grid. In a video, [Steve Mould] explains how this can be done by using a diffusion model to find characteristics that two images share and to generate an output image that effectively contains the essential features of both.
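The "rotate each section" transform itself is easy to pin down in code. Below is a minimal NumPy sketch, assuming a square image whose side is divisible by 3 and a 90-degree counterclockwise turn of every tile; the angle and direction are assumptions, and the video's grids may rotate tiles differently. Because the transform only moves pixels around, it is exactly invertible, and i.i.d. Gaussian noise remains i.i.d. Gaussian after it, which is the property diffusion-based multi-view illusions rely on.

```python
import numpy as np

GRID = 3  # tiles per side, matching the 3x3 grid in the video


def rotate_tiles(img: np.ndarray, k: int = 1) -> np.ndarray:
    """Rotate every tile of a GRID x GRID grid by k * 90 degrees counterclockwise.

    Works for (H, W) or (H, W, C) arrays; np.rot90 only turns the first two axes,
    so colour channels are left alone. Negative k rotates clockwise.
    """
    h, w = img.shape[:2]
    if h != w or h % GRID != 0:
        raise ValueError("expected a square image whose side is divisible by GRID")
    t = h // GRID  # tile side length in pixels
    out = np.empty_like(img)
    for r in range(GRID):
        for c in range(GRID):
            tile = (slice(r * t, (r + 1) * t), slice(c * t, (c + 1) * t))
            out[tile] = np.rot90(img[tile], k)
    return out


def unrotate_tiles(img: np.ndarray, k: int = 1) -> np.ndarray:
    """Inverse view: undo rotate_tiles(img, k)."""
    return rotate_tiles(img, -k)
```

For example, on `np.arange(36).reshape(6, 6)` the top-left 2×2 tile `[[0, 1], [6, 7]]` becomes `[[1, 7], [0, 6]]`, and `unrotate_tiles` restores the original array exactly.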

Naturally, this can also be done by hand, the goal always being a plausible image in either orientation with enough detail to trick the brain into filling in the rest, and into interpreting what the eye sees as a duck, a bunny, a vase, or the outline of faces.

Using a diffusion model to create such illusions is a natural fit, as it works by progressively removing noise until a plausible image emerges. Of course, whether the result is a viable image is ultimately decided not by the model but by the viewer: humans are susceptible to such illusions, while machine vision still struggles to tell a cat from a loaf and a raisin bun from a spotted dog. The imperfections of diffusion models would seem to be a benefit here, as they will happily churn through abstractions and iterations with no understanding or interpretive bias, while the human steers them towards a viable interpretation.
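The post does not give code, so here is a minimal sketch of one published way to make a denoiser serve two views at once, the noise-averaging idea from the "Visual Anagrams" work; it may or may not match the exact pipeline in the video. At each deterministic DDIM-style step, noise is predicted for the image as-is and for its tile-rotated version, the second estimate is rotated back, and the two are averaged. `toy_predict_noise` is a hypothetical stand-in for a prompt-conditioned model that simply pulls toward a fixed target image, and the linear schedule is an assumption. Because this toy is linear, the output collapses to a plain pixel average of the two targets; a real model's learned prior is what pushes the result toward an image that looks plausible in both views.

```python
import numpy as np

GRID = 3  # 3x3 grid of tiles


def rotate_tiles(img, k=1):
    """Rotate each tile of a GRID x GRID grid by k * 90 degrees (square image assumed)."""
    t = img.shape[0] // GRID
    out = np.empty_like(img)
    for r in range(GRID):
        for c in range(GRID):
            sl = (slice(r * t, (r + 1) * t), slice(c * t, (c + 1) * t))
            out[sl] = np.rot90(img[sl], k)
    return out


def toy_predict_noise(x_t, abar, target):
    """Hypothetical stand-in for a noise predictor conditioned on a prompt.

    Returns the exact noise present in x_t if the clean image were `target`.
    """
    return (x_t - np.sqrt(abar) * target) / np.sqrt(1.0 - abar)


def two_view_sample(target_a, target_b, steps=50, seed=0):
    """Deterministic DDIM-style sampling that averages noise estimates over two views.

    View A is the image as-is; view B is the image with every tile rotated 90 degrees.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(target_a.shape)    # start from pure Gaussian noise
    abars = np.linspace(0.001, 0.999, steps)   # assumed schedule, noisy -> clean
    for i, abar in enumerate(abars):
        abar_prev = abars[i + 1] if i + 1 < steps else 1.0
        eps_a = toy_predict_noise(x, abar, target_a)
        # Predict in the rotated view, then map the estimate back to the base frame.
        eps_b = rotate_tiles(toy_predict_noise(rotate_tiles(x), abar, target_b), -1)
        eps = 0.5 * (eps_a + eps_b)
        x0_hat = (x - np.sqrt(1.0 - abar) * eps) / np.sqrt(abar)
        x = np.sqrt(abar_prev) * x0_hat + np.sqrt(1.0 - abar_prev) * eps
    return x
```

Swapping `toy_predict_noise` for a real text-conditioned denoiser, with one prompt per view, is where the interesting behaviour would come from; the averaging and the rotate/unrotate bookkeeping stay the same.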

4 thoughts on “Creating A Twisted Grid Image Illusion With A Diffusion Model”

  1. with the classical example being Rubin’s vase — which some viewers see as a vase, and others a pair of human faces.

    Not quite.

    https://en.wikipedia.org/wiki/Rubin_vase

    Any individual can see both figures (vase or faces) but never both at the same time. If you see the vase, the faces disappear. If you change your focus to see the faces, the vase disappears.

    For me, such images tend to flip-flop – first one interpretation is visible, then the other. Such images are slightly irritating because they are never still – they keep morphing between the possible interpretations.
