OpenCV Brings Pinch To Zoom Into The Real World

Gesture controls arrived in the public consciousness a little over a decade ago as touchpads and touchscreens became more popular. The main limitation of gesture controls, at least as far as [Norbert] is concerned, is that they can only manipulate objects in a virtual space. He wanted to use gestures to control a real-world object instead, and created this device which uses them to control an actual, physical picture.

In this unique augmented reality device, not only is the object being controlled in the real world, but the gestures are monitored there as well, thanks to an OpenCV-based computer vision system watching his hand. The position data is fed into an algorithm which controls a physical picture mounted on a slender robotic arm. Now, when [Norbert] “pinches to zoom”, the servo attached to the picture physically moves it closer to or farther away from him. He can also use other gestures to move the picture around.
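The write-up doesn't spell out the exact tracking pipeline, but a minimal sketch of the idea might pair OpenCV's camera capture with a hand-landmark model (MediaPipe is assumed here) to measure the thumb-to-index “pinch” distance and map it to a servo angle sent over serial. The landmark indices, serial port, distance range, and one-angle-per-line protocol below are assumptions for illustration, not details from [Norbert]'s build.

```python
import math

import cv2
import mediapipe as mp
import serial  # pyserial; assumes the servo hangs off a serial-attached microcontroller

# Assumed port, baud rate, and protocol for the servo controller (e.g. an Arduino).
ser = serial.Serial("/dev/ttyUSB0", 115200, timeout=0.1)

hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.7)
cap = cv2.VideoCapture(0)

try:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV captures BGR.
        result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if result.multi_hand_landmarks:
            lm = result.multi_hand_landmarks[0].landmark
            thumb, index = lm[4], lm[8]  # thumb tip and index fingertip (normalized coords)
            pinch = math.hypot(thumb.x - index.x, thumb.y - index.y)
            # Map an assumed pinch range (~0.02..0.30) onto a 0..180 degree servo angle.
            angle = int(max(0.0, min(1.0, (pinch - 0.02) / 0.28)) * 180)
            ser.write(f"{angle}\n".encode())  # assumed one-angle-per-line protocol
        cv2.imshow("pinch", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
finally:
    cap.release()
    cv2.destroyAllWindows()
    ser.close()
```

The microcontroller end would then just read a number per line and write it to the servo; smoothing or rate-limiting the angle would keep the arm from twitching on noisy frames.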

While this gesture-controlled machine is certainly a proof-of-concept, there are plenty of other uses for gesture control of real-world objects. Any robotics platform could benefit from an interface like this, as could something slightly more mundane like an office PowerPoint presentation. Opportunity abounds, but if you need a primer for OpenCV, take a look at this build which tracks a hand in minute detail.

11 thoughts on “OpenCV Brings Pinch To Zoom Into The Real World”

  1. The problem I have with these kinds of interfaces is that you really need the machine to know that you are intending to talk to it. If the businessman running his PowerPoint presentation starts gesturing with his hands to his audience, what is to keep PowerPoint from suddenly closing, or exhibiting some other undesired behavior? We already have interfaces that behave this way. Take speech-to-text assistants: how many times have you heard someone’s Google Assistant or Siri start talking to them unexpectedly? Gestures could be even worse, especially in sensitive applications where mistakes are hazardous.

    1. Maybe eye tracking or UWB could provide additional context clues for the control algorithms? It’s not a given that voice or gesture control has to be inaccurate; it’s just an engineering challenge.

  2. One of the selling points of “cobots” – small robots for use in assembly/manufacturing and distribution working alongside people – is the ability to learn by manually moving the manipulator head through the desired operation and having the device remember (and often optimize) that operation. Using a vision system to simply demonstrate the operation (“pick this up, put it there, but absolutely don’t hit that thing…”) would be an interesting embellishment.

    1. This is a hard problem for machine vision and an easy problem for a 3D position sensor, so do it the easy way. Machine vision technology is mature and stable for 2D applications, not so much for 3D.
