Images that can be interpreted in a variety of ways have existed for many decades, with the classic example being Rubin’s vase, which some viewers see as a vase and others as a pair of human faces.
Things get trickier if you want an image that turns into a different, equally realistic-looking image when you rotate each section of it within a 3×3 grid. In a video, [Steve Mould] explains how this can be accomplished by using a diffusion model to identify similar characteristics of two images and to create an output image that effectively contains the essential features of both.
Naturally, this process can be done by hand too, with the goal always being to create an image that is plausible in either orientation and has enough detail to trick the brain into filling in the rest, sending it down the path of interpreting what the eye sees as a duck, a bunny, a vase or the outline of faces.
Using a diffusion model to create such illusions is quite a natural fit, as it works by progressively refining noise until a plausible enough image begins to appear. Of course, whether the result is a viable image is ultimately not determined by the model, but by the viewer, as humans are susceptible to such illusions while machine vision still struggles to distinguish a cat from a loaf and a raisin bun from a spotted dog. The imperfections of diffusion models would seem to be a benefit here, as the model will happily churn through abstractions and iterations with no understanding or interpretive bias, while the human can steer it towards a viable interpretation.
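To make the idea of one image serving two orientations a little more concrete, here is a minimal sketch of how a single denoising step could be made to satisfy both views at once. It assumes that the noise estimates for the two views are simply averaged, which is one plausible approach and not necessarily what the video describes. The names `rotate_tiles`, `combined_noise_estimate`, `predict_a` and `predict_b` are hypothetical, and the two predictors are stand-ins for a real diffusion model conditioned on two different prompts.

```python
import numpy as np

GRID = 3  # 3x3 grid of sections, as described in the article


def rotate_tiles(image, k=1, grid=GRID):
    """Rotate every tile of a square image by k * 90 degrees (counter-clockwise).

    Each tile stays at its position in the grid; only its contents turn.
    """
    h, w = image.shape[:2]
    if h != w or h % grid != 0:
        raise ValueError("expected a square image whose side is divisible by grid")
    t = h // grid  # tile side length in pixels
    out = np.empty_like(image)
    for r in range(grid):
        for c in range(grid):
            tile = image[r * t:(r + 1) * t, c * t:(c + 1) * t]
            out[r * t:(r + 1) * t, c * t:(c + 1) * t] = np.rot90(tile, k)
    return out


def combined_noise_estimate(noisy, predict_a, predict_b):
    """Average the noise estimates for the plain view and the tile-rotated view.

    predict_a and predict_b are hypothetical callables that take a noisy image
    and return a noise estimate of the same shape (assumption: in practice they
    would be one denoiser with two different prompts).
    """
    eps_a = predict_a(noisy)
    # Show the second predictor the rotated view, then map its estimate back
    # into the original orientation before averaging.
    eps_b = rotate_tiles(predict_b(rotate_tiles(noisy, 1)), -1)
    return 0.5 * (eps_a + eps_b)
```

In a full pipeline, the combined estimate would be handed to the sampler's usual update step at every iteration, so that the image drifts towards something both views can accept. Whether the result actually reads as two different pictures is still up to the viewer.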
Is this (i.e. using the downside of something to make a new thing) not how many scientific discoveries were made? (e.g. the pacemaker, which was accidentally invented?)
Downside? What the hell are you talking about?
By downside I mean the fact that “The imperfections of diffusion models would seem to be a benefit here”.
Not quite.
https://en.wikipedia.org/wiki/Rubin_vase
Any individual can see both figures (vase or faces) but never both at the same time. If you see the vase, the faces disappear. If you change your focus to see the faces, the vase disappears.
For me, such images tend to flip-flop – first one interpretation is visible, then the other. Such images are slightly irritating because they are never still – they keep morphing between the possible interpretations.