Real Or Fake? Robot Uses AI To Find Waldo

The last few weeks have seen a number of tech sites reporting on a robot which can find and point out Waldo in those “Where’s Waldo” books. It was designed and built by Redpepper, an ad agency. The robot arm is a UARM Metal, with a Raspberry Pi running the show.

A Logitech c525 webcam captures images, which are processed by the Pi with OpenCV, then sent to Google’s cloud-based AutoML Vision service. AutoML is trained with numerous images of Waldo, which it uses to attempt a pattern match. If a match is found, the coordinates are fed to PYUARM, and the UARM will literally point Waldo out.
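For the curious, the loop probably looks something like the sketch below. This is a guess at the plumbing, not Redpepper’s code: is_waldo() stands in for the AutoML Vision request, pixels_to_arm_coords() for the camera-to-arm calibration, and the pyuarm call is our best guess at that library’s API.

```python
import cv2
import pyuarm  # pyuarm method names below are assumptions


def is_waldo(crop):
    """Hypothetical stand-in for the AutoML Vision request;
    returns a confidence score between 0 and 1."""
    raise NotImplementedError


def pixels_to_arm_coords(px, py):
    """Hypothetical pixel-to-arm calibration."""
    raise NotImplementedError


def find_and_point():
    # Grab a frame from the webcam
    cam = cv2.VideoCapture(0)
    ok, frame = cam.read()
    cam.release()
    if not ok:
        raise RuntimeError("camera capture failed")

    # Use OpenCV to crop candidate faces before the cloud round-trip
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)

    # Ask the trained model about each candidate, then point at the winner
    for (x, y, w, h) in faces:
        if is_waldo(frame[y:y + h, x:x + w]) > 0.95:
            ax, ay = pixels_to_arm_coords(x + w / 2, y + h / 2)
            pyuarm.UArm().set_position(ax, ay, 10)  # assumed pyuarm API
            return True
    return False
```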

While this is a totally plausible project, we have to admit a few things caught our jaundiced eye. The Logitech c525 has a field of view (FOV) of 69°. While we don’t have dimensions of the UARM Metal, it looks like the camera is less than a foot in the air. Amazon lists “Where’s Waldo Deluxe Edition” at 10″ x 0.2″ x 12.5″, which means the open book will cover roughly 10″ x 25″. The robot is going to have a hard time imaging a surface that large in a single shot. What’s more, the c525 is a 720p camera, so there isn’t a whole lot of pixel density to pattern match against. Finally, there’s the rubber hand the robot uses to point out Waldo. Wouldn’t that hand block at least some of the camera’s view to the left?
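Running the numbers makes the FOV complaint concrete. Assuming the camera sits 12″ above the page, and generously treating the 69° figure as spanning the full width of the frame (Logitech actually quotes it as the diagonal, which makes things worse), the visible span works out to:

```python
import math

fov_deg = 69.0     # Logitech c525 field of view (diagonal, per spec)
height_in = 12.0   # assumed camera height above the page

# Width of page visible from straight above: 2 * h * tan(FOV / 2)
visible = 2 * height_in * math.tan(math.radians(fov_deg / 2))
print(f"{visible:.1f} in")  # ~16.5 in -- well short of a 25 in spread
```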

We’re not going to jump out and call this one fake just yet — it is entirely possible that the robot took a mosaic of images and used that to pattern match. Redpepper may have used a bit of movie magic to make the process more interesting. What do you think? Let us know down in the comments!

26 thoughts on “Real Or Fake? Robot Uses AI To Find Waldo”

  1. Love the over-engineering. They could have used a Haar cascade or a similar workflow, no? Plus I totally agree the camera is going to struggle to provide enough feature information.

  2. I think you are totally right!
    The hand totally blocks the left view of the camera, the resolution of the camera is way too low, and the field of view is too narrow.
    It looks like they did have their neural network working, though; that part is super easy with Google’s AutoML Vision beta.

  3. A few random thoughts:

    (1) The resolution at 0:23 is quite bad. But the resolution at 0:27 (which is presumably a zoomed-in version of 0:23) should be enough for feature detection. Perhaps the view at 0:23 is a poorly compressed image (blame YouTube?).

    (2) It seems weird for someone to make the effort of setting up the machine learning, buying the arm, etc. and then ruin the project through fraud.

    (3) I could only find the 58-second video. The project should really have better documentation! Perhaps there’s something weird, like another camera that takes the photo at 0:23 before the arm moves above the page?

    1. “(2) It seems weird for someone to make the effort of setting up the machine learning, buying the arm, etc. and then ruin the project through fraud.”

      If it is faked (I don’t have an opinion either way on that front) I would expect the reasoning to be: “We thought we could do this. We spent all this time and money trying to do this. It just doesn’t quite work out. So we’ll add this little bit of code that gives it a hint….”

  4. I would guess that it’s based on Tadej Magajna’s FasterRCNN model trained for Wally-finding. You can install TensorFlow and find his GitHub repo easily enough (rough loading sketch below).

    You make a very valid point – half the challenge with ML computer vision systems has nothing to do with the neural network – it’s all in the image sensors, lenses, optics and lighting.
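    If it is that repo, loading the exported model would look roughly like this. This assumes a standard TF1 object-detection frozen graph; the file and tensor names below are the usual defaults, not verified against his code.

    ```python
    import numpy as np
    import tensorflow as tf  # TF 1.x style API

    # Load the frozen Faster R-CNN graph from disk
    graph = tf.Graph()
    with graph.as_default():
        graph_def = tf.GraphDef()
        with tf.gfile.GFile("frozen_inference_graph.pb", "rb") as f:
            graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name="")

    page = np.zeros((720, 1280, 3), dtype=np.uint8)  # stand-in for a page scan
    with tf.Session(graph=graph) as sess:
        # Tensor names are the object-detection API's usual exports
        boxes, scores = sess.run(
            ["detection_boxes:0", "detection_scores:0"],
            feed_dict={"image_tensor:0": np.expand_dims(page, 0)})
    ```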

  5. It might be at proof-of-concept level. All the elements are there, just maybe not working together 100%.

    E.g. taking a high-resolution picture of the book’s pages by hand, running that through OpenCV by hand, uploading the tens to hundreds of faces to the Google algorithm semi-automatically, showing the result, and then just making a nice video of a robot arm pointing at some coordinates.

    It’s enough to show that the technology works. The steps in between could be automated, but the additional effort isn’t really justified for something that doesn’t have any real purpose.

  6. With these tools (plus “some” customization ;-) ), it should be possible.
    But I would expect the Pi to divide the image (by moving the camera) into parts, to get good resolution on each part; see the sketch after the links.

    http://blog.dlib.net/2017/02/high-quality-face-recognition-with-deep.html

    https://medium.com/@ageitgey/machine-learning-is-fun-part-4-modern-face-recognition-with-deep-learning-c3cffc121d78

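    A minimal sketch of that divide-into-parts idea: whether the parts come from moving the camera or from slicing one big scan, the bookkeeping is the same. Tile size and overlap are made-up numbers.

    ```python
    def tiles(image, tile=800, overlap=100):
        """Yield overlapping crops of a numpy image so Waldo
        can't hide on a tile boundary."""
        h, w = image.shape[:2]
        step = tile - overlap
        for y in range(0, max(h - overlap, 1), step):
            for x in range(0, max(w - overlap, 1), step):
                yield x, y, image[y:y + tile, x:x + tile]
    ```

    Each crop keeps full pixel density, and the (x, y) offsets let you map a hit back to page coordinates.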

  7. I doubt it’s a total fraud. Being exposed as a fraud would completely negate the positives of having such a viral video.
    But… heavily edited to show only the most positive aspects? Most definitely.
    As others have mentioned, the camera and its placement cannot capture high-enough-resolution images of the book pages. The obvious solution is multiple pictures stitched together: some type of page-scanning routine. Clearly that’s within the reach of the hardware described, though it’s not shown or mentioned.
    Then there’s time. Much of this video is shown in “montage” which obviously breaks the linear timeline. I can make no assumptions or judgments as to how long any part of this process takes. It could easily be hours.
    Calibration of the arm is another aspect that gets hand-waved: a hobby-servo-driven arm with three-dimensional kinematics pointing to coordinates on a two-dimensional picture. Solvable problems, but definitely non-trivial, and completely glossed over in this video (a sketch of the pixel-to-arm mapping follows below).
    What’s the success rate of the whole process? Image capture, processing, identification, locating and pointing – In the video it’s 100%, but how many failures did they edit out?

    Not “fake”, but much like every Kickstarter video I’ve ever seen – much effort has been spent in the editing room to make this look as good as possible.
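    For what it’s worth, the 2D half of that calibration problem is the more tractable part: jog the arm to the four corners of the page once, then a homography maps any pixel to arm coordinates. A sketch with invented corner numbers:

    ```python
    import cv2
    import numpy as np

    # Four correspondences: page corners in the image vs. in arm space.
    # These numbers are made up -- you'd measure them by jogging the arm.
    pixel_pts = np.float32([[0, 0], [1280, 0], [1280, 720], [0, 720]])
    arm_pts = np.float32([[150, -120], [150, 120], [320, 120], [320, -120]])

    H = cv2.getPerspectiveTransform(pixel_pts, arm_pts)

    def pixel_to_arm(px, py):
        """Map an image pixel to arm (x, y) via the homography."""
        ax, ay = cv2.perspectiveTransform(np.float32([[[px, py]]]), H)[0, 0]
        return float(ax), float(ay)
    ```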

  8. Why do people keep assuming it performs the lookup on the entire page at once? No “stitching” required: take a shot. Is it there? No? Move right, try again. Make it look good by memorizing the position and jumping straight there on command. That’s how humans do it, no?
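    A minimal sketch of that scan-and-retry loop; move_camera_to(), capture(), and detect_waldo() are hypothetical helpers, and the grid size is arbitrary:

    ```python
    def scan_for_waldo(grid_cols=3, grid_rows=2):
        """Sweep the camera over the page one view at a time,
        remembering where the hit was so the arm can jump back."""
        for row in range(grid_rows):
            for col in range(grid_cols):
                move_camera_to(col, row)       # reposition the camera/arm
                hit = detect_waldo(capture())  # classify just this view
                if hit is not None:
                    return col, row, hit       # grid cell plus detection
        return None
    ```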

  9. This Waldo game is rather poorly thought out. The characters are literally flat and there’s zero replayability. Once you’ve found Waldo on each page, that’s it.

    1. You’re looking for the Amazon comments section. Don’t worry! It’s a common mistake. Just exit this tab, go to amazon.com, search for the Waldo game, and click on “Write a customer review”.
