Real Or Fake? Robot Uses AI To Find Waldo

August 30, 2018

The last few weeks have seen a number of tech sites reporting on a robot that can find and point out Waldo in those “Where’s Waldo” books. It was designed and built by Redpepper, an ad agency. The robot arm is a UARM Metal, with a Raspberry Pi controlling the show.

A Logitech c525 webcam captures images, which are processed by the Pi with OpenCV, then sent to Google’s cloud-based AutoML Vision service. AutoML is trained with numerous images of Waldo, which are used to attempt a pattern match. If a pattern is found, the coordinates are fed to PYUARM, and the UARM will literally point Waldo out.

While this is a totally plausible project, we have to admit a few things caught our jaundiced eye. The Logitech c525 has a field of view (FOV) of 69°. While we don’t have dimensions of the UARM Metal, it looks like the camera is less than a foot in the air. Amazon states that “Where’s Waldo Deluxe Edition” is 10″ x 0.2″ x 12.5″. That means the open book will be 10″ x 25″. The robot is going to have a hard time imaging a surface that large in a single image. What’s more, the c525 is a 720p camera, so there isn’t a whole lot of pixel density to pattern match. Finally, there’s the rubber hand the robot uses to point out Waldo. Wouldn’t that hand block at least some of the camera’s view to the left?
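To put rough numbers on that, here is a back-of-the-envelope sketch of how much page the camera could see. It rests on assumptions that are ours, not Redpepper’s: the 69° figure is treated as a diagonal FOV, the camera looks straight down from exactly 12 inches (the most generous reading of “less than a foot”), the image is 1280 x 720 at 16:9, and lens distortion is ignored. The function name `footprint` is made up for this example.

```python
import math

def footprint(height_in, dfov_deg, aspect=(16, 9)):
    """Return (width, depth) in inches of the area seen by a camera
    looking straight down from height_in, given a diagonal FOV."""
    # Diagonal of the visible rectangle on the page.
    diag = 2 * height_in * math.tan(math.radians(dfov_deg) / 2)
    aw, ah = aspect
    norm = math.hypot(aw, ah)
    # Split the diagonal into width and depth using the aspect ratio.
    return diag * aw / norm, diag * ah / norm

# Assumed setup: 12 inches up, 69 degree diagonal FOV, 720p sensor.
width_in, depth_in = footprint(height_in=12.0, dfov_deg=69.0)
pixels_per_inch = 1280 / width_in

# Open book size as quoted in the article: 25 x 10 inches.
fits = width_in >= 25.0 and depth_in >= 10.0

print(f"visible area: {width_in:.1f} x {depth_in:.1f} in")
print(f"pixel density: {pixels_per_inch:.0f} px/in")
print(f"whole book in one frame: {fits}")
```

Under those assumptions the visible patch comes out at roughly 14 x 8 inches, well short of the open book, and a lower camera only makes the patch smaller.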

We’re not going to jump out and call this one fake just yet — it is entirely possible that the robot took a mosaic of images and used that to pattern match. Redpepper may have used a bit of movie magic to make the process more interesting. What do you think? Let us know down in the comments!
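For the mosaic idea, a quick sketch shows how many shots that might take. The tile size (7 x 4 inches per frame) and the 20% overlap between neighbouring frames are hypothetical values picked for illustration, and `tiles_needed` is our own helper, not anything from Redpepper’s code.

```python
import math

def tiles_needed(page_w, page_h, tile_w, tile_h, overlap=0.2):
    """Return (cols, rows) of camera positions needed to cover the page
    when each frame overlaps its neighbour by the given fraction."""
    # Distance the camera advances between neighbouring frames.
    step_w = tile_w * (1 - overlap)
    step_h = tile_h * (1 - overlap)
    # The first frame covers one full tile; each further frame adds one step.
    cols = 1 if page_w <= tile_w else 1 + math.ceil((page_w - tile_w) / step_w)
    rows = 1 if page_h <= tile_h else 1 + math.ceil((page_h - tile_h) / step_h)
    return cols, rows

# Open book from the article (25 x 10 in), hypothetical 7 x 4 in frames.
cols, rows = tiles_needed(25, 10, tile_w=7.0, tile_h=4.0)
print(f"{cols} x {rows} grid, {cols * rows} shots")
```

With those made-up numbers the arm would need a 5 x 3 grid, so 15 separate captures before any pattern matching starts.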

26 thoughts on “Real Or Fake? Robot Uses AI To Find Waldo”

theonewhoshouldnotbenamed says:

August 30, 2018 at 4:12 am

an ad agency got a post on Hackaday.. the technicalities are the least of their concerns, their final objective is already accomplished!

Mungojerry says:

August 30, 2018 at 4:35 am

Love over engineering. They could have used a haar cascade or something like workflow no? Plus I totally agree the camera is going to struggle to provide enough feature information.

1. svofski says:
  
  August 30, 2018 at 7:03 am
  
  Have you tried designing a haar cascade from scratch?
  
  1. Pez says:
    
    August 30, 2018 at 5:06 pm
    
    Seriously ..
    
Arjan Wiegel says:

August 30, 2018 at 4:53 am

I think you are totally right!
The hand totally blocks the left view of the camera, the resolution of the camera is way too low, and the field of view is also too narrow.
It looks like they did have their neural network in working order, though that is super easy with Google’s AutoML:Vision beta.

Danny says:

August 30, 2018 at 5:17 am

A few random thoughts:

(1) The resolution at 0:23 is quite bad. But the resolution at 0:27 (which is presumably a zoomed-in version of 0:23) should be enough for feature detection. Perhaps the view at 0:23 is a poorly compressed image (blame YouTube?).

(2) It seems weird for someone to make the effort of setting up the machine learning, buying the arm, etc. and then ruin the project through fraud.

(3) I could only find the 58-second-long video. The project should really have better documentation! Perhaps there’s something weird, like another camera that takes the photo at 0:23 before the arm moves above the page?

1. Zengar says:
  
  August 30, 2018 at 5:52 am
  
  “(2) It seems weird for someone to make the effort of setting up the machine learning, buying the arm, etc. and then ruin the project through fraud.”
  
  If it is faked (I don’t have an opinion either way on that front) I would expect the reasoning to be: “We thought we could do this. We spent all this time and money trying to do this. It just doesn’t quite work out. So we’ll add this little bit of code that gives it a hint….”
  
  1. Danny says:
    
    August 30, 2018 at 8:05 am
    
    That reasoning seems very plausible! If it is faked (I’m also not sure), I would believe that it happened that way.
    
Luke Weston says:

August 30, 2018 at 5:48 am

I would guess that it’s based on Tadej Magajna’s FasterRCNN model trained for Wally-finding. You can install TensorFlow and google up his GitHub easily enough.

You make a very valid point – half the challenge with ML computer vision systems has nothing to do with the neural network – it’s all in the image sensors, lenses, optics and lighting.

1. blazingeclipse says:
  
  August 30, 2018 at 6:21 am
  
  Only half? Garbage in is garbage out.
  
  1. jon says:
    
    August 30, 2018 at 8:10 am
    
    Ironically “garbage in,” is half of the old saying “garbage in, garbage out.”
    
2. Ostracus says:
  
  August 30, 2018 at 7:22 am
  
  Some. Covered in “Feature engineering for machine learning”, part of the current HB bundle.
  
Izzy Mizziz Tata says:

August 30, 2018 at 5:55 am

Well it certainly seems artificial.. The intelligence part seems lacking though…

doc_brown says:

August 30, 2018 at 6:20 am

It might be a proof-of-concept level. All the elements are there, just maybe not working together 100%.

E.g. taking a high-resolution picture of the book’s pages by hand, running that through OpenCV by hand, uploading the tens to hundreds of faces to the Google algorithm semi-automatically, showing the result and then just making a nice video of a robot arm pointing at some coordinates.

It’s enough to show that the technology works. The steps in between could be automated, but the additional effort isn’t really justified for something that doesn’t have any real purpose.

James says:

August 30, 2018 at 6:23 am

I think they add agency, giving the robot its ability to act within the environment? :)

(Sorry, hit report comment. Seriously, swap the reply and report buttons around!)

1. STR Alorman says:
  
  August 30, 2018 at 7:14 am
  
  I like your version
  
Megol says:

August 30, 2018 at 6:44 am

Shouldn’t that be “… an ad agency”?

MrJBSwe says:

August 30, 2018 at 7:10 am

with these tools ( + “some” customization ;-), it should be possible
But I would expect the pi to divide the image ( by moving the camera ) in parts to have good resolution on each part

http://blog.dlib.net/2017/02/high-quality-face-recognition-with-deep.html

https://medium.com/@ageitgey/machine-learning-is-fun-part-4-modern-face-recognition-with-deep-learning-c3cffc121d78

AndreN says:

August 30, 2018 at 7:10 am

The article misses what seems like an intentional joke – the term “waldo” is used to describe a manipulator arm.

1. CircuitGizmos says:
  
  August 30, 2018 at 12:13 pm
  
  I thought I was the only one…
  
DKE says:

August 30, 2018 at 8:32 am

I doubt it’s a total fraud. Being shown as a fraud would completely negate the positives from having such a viral video.
But… heavily edited to show only the most positive aspects? Most definitely.
As others have mentioned, the camera and its placement cannot achieve high enough resolution images of the book pages. The obvious solution is multiple pictures stitched together – some type of page scanning routine. Clearly that’s within the reach of the hardware described, though not shown or mentioned.
Then there’s time. Much of this video is shown in “montage” which obviously breaks the linear timeline. I can make no assumptions or judgments as to how long any part of this process takes. It could easily be hours.
Calibration of the arm is another glossed over aspect. A hobby-servo driven arm with three-dimensional kinematics pointing to coordinates on a two dimensional picture. Solvable problems, but definitely non-trivial and completely glossed over in this video.
What’s the success rate of the whole process? Image capture, processing, identification, locating and pointing – In the video it’s 100%, but how many failures did they edit out?

Not “fake”, but much like every kickstarter video I’ve ever seen – much effort has been spent in the editing room to make this look as good as possible.

duh says:

August 30, 2018 at 12:11 pm

Why do people keep assuming it performs the lookup on the entire page at once? No “stitching” required. Take a shot: is it there? No? Move right, try again. Make it look good by memorizing the position and jumping straight there on command. That’s how humans do it, no?

Gregg Eshelman says:

August 30, 2018 at 1:24 pm

This Waldo game is rather poorly thought out. The characters are literally flat and there’s zero replayability. Once you’ve found Waldo on each page, that’s it.

1. Fred says:
  
  August 30, 2018 at 6:44 pm
  
  You’re looking for the Amazon comments section. Don’t worry! It is a common mistake. Just exit this tab, go to amazon.com, search the Waldo game, and click on “Write a customer review”.
  
BillThePlatypus says:

August 30, 2018 at 2:27 pm

If it was real, they edited out it finding the stamp with Waldo’s face repeatedly.

ameyring says:

August 31, 2018 at 3:43 pm

Invite the inventor to a Hackaday conference (especially near his home) to show/discuss his technology, and if he declines, that may mean something fishy is up.
