Re-imagining Telepresence With Humanoid Robots And VR Headsets

Don’t let the name of the Open-TeleVision project fool you; it’s a framework for improving telepresence and making robotic teleoperation far more intuitive than it otherwise would be. It accomplishes this in part by taking advantage of the remarkable technology packed into modern VR headsets like the Apple Vision Pro and Meta Quest. There are loads of videos on the project page, many of which demonstrate successful teleoperation across vast distances.

Teleoperation of robotic effectors typically takes some getting used to. The camera views are unusual, the limbs don’t move the same way arms do, and intuitive human things like looking around to get a sense of where everything is don’t translate well.

A stereo camera with gimbal streaming to a VR headset complete with head tracking seems like a very hackable design.

To address this, researchers provided a user with a robot-mounted, real-time stereo video stream (through which the user can turn their head and look around normally) as well as mapping arm and hand movements to humanoid robotic counterparts. This provides the feedback to manipulate objects and perform tasks in a much more intuitive way. In short, when our eyes, bodies, and hands look and work more or less the way we expect, it turns out it’s far easier to perform tasks.

The research paper goes into detail about the different systems, but in essence, a stereo depth and RGB camera is perched with a 3D printed gimbal atop a humanoid robot frame like the Unitree H1 equipped with high dexterity hands. A VR headset takes care of displaying a real-time stereoscopic video stream and letting the user look around. Hand tracking for the user is mapped to the dexterous hands and fingers. This lets a person look at, manipulate, and handle things without in-depth training. Perhaps slower and more clumsily than they would like, but in an intuitive way all the same.
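To get a feel for the head-tracking half of that design, here is a minimal Python sketch of turning a headset orientation into targets for a two-motor (yaw and pitch) gimbal. It is not taken from the Open-TeleVision code. The function name `head_to_gimbal`, the axis convention (x forward, y left, z up, quaternion given as w, x, y, z), and the angle limits are all assumptions made for illustration, and a real headset SDK may use different axes.

```python
import math

# Hypothetical mechanical limits for a two-motor gimbal, in degrees.
# These are illustrative values, not figures from the project.
YAW_LIMIT_DEG = 90.0
PITCH_LIMIT_DEG = 45.0


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def head_to_gimbal(w, x, y, z):
    """Convert a head orientation quaternion to (yaw, pitch) in degrees.

    Assumes a Z-Y-X (yaw, pitch, roll) Euler convention with x forward,
    y left and z up. Roll is discarded because a two-motor gimbal
    cannot reproduce it.
    """
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("zero-length quaternion")
    w, x, y, z = w / norm, x / norm, y / norm, z / norm

    # Standard quaternion to Euler-angle extraction for yaw and pitch.
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    # Clamp guards against values slightly outside [-1, 1] from rounding.
    sin_pitch = clamp(2.0 * (w * y - z * x), -1.0, 1.0)
    pitch = math.asin(sin_pitch)

    # Keep the commanded angles inside the gimbal's assumed travel.
    yaw_deg = clamp(math.degrees(yaw), -YAW_LIMIT_DEG, YAW_LIMIT_DEG)
    pitch_deg = clamp(math.degrees(pitch), -PITCH_LIMIT_DEG, PITCH_LIMIT_DEG)
    return yaw_deg, pitch_deg


if __name__ == "__main__":
    # Head turned 30 degrees about the vertical axis.
    half = math.radians(30.0) / 2.0
    print(head_to_gimbal(math.cos(half), 0.0, 0.0, math.sin(half)))
```

Dropping roll and clamping to the motor travel are the two simplifications a pan-and-tilt rig forces; in practice the resulting angles would be sent to whatever servo or motor driver the gimbal uses.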

Interested in taking a closer look? The GitHub repository has the necessary code, and while most of us will never be mashing ADD TO CART on something like the Unitree H1, the reference design for a stereo camera streaming to a VR headset and mirroring head tracking with a two-motor gimbal looks like the sort of thing that would be useful for a telepresence project or two.

20 thoughts on “Re-imagining Telepresence With Humanoid Robots And VR Headsets”

    1. Stolen from Google who probably generated it from text stolen from elsewhere:

      “Radio waves propagate in vacuum at the speed of light c, exactly 299,792,458 m/s. Propagation time to the Moon and back ranges from 2.4 to 2.7 seconds, with an average of 2.56 seconds (the average distance from Earth to the Moon is 384,400 km).”

      So…. not entirely unusable but it’s probably not going to be as easy to use one that is on the moon.

  1. Yeah, I need one of these but mobile, to crawl under desks to replace/lay network cables, to carry computers to and from users, to replace printer toner, to be in the server room to do maintenance, to carry parcels. It would be a nice thing for it to do all that while I’m taking a nap.
    Boss, can I use it to do my shopping? Pretty pleaseeee?

    Not to mention that you can strap a gun to it, and all those movies with Arnold have a chance of coming to pass.

  2. Giving as an example a supermarket cashier, come on ! What great added value.
    It is pictured here just to show the possibilities of the project, that’s all.

    It was obviously developed for risk situations or dangerous environments, where the operator has great skills and long training and cannot be replaced easily… for example a specialized military.

    Also perfect to get rid of my mother-in-law remotely without leaving any clues. /s

    1. Oh now there would be a video… Coupon Karen vs Android Cashier…” What do you mean this coupon is expired ? Everyone else takes it !….I want to see your programmer ! “

  3. VSI….Life But Only Better. (Virtual Self Industries)
    In the near future, people live their lives free of pain, danger and complications through robotic representations of themselves…..Surrogates (Bruce Willis)

  4. Why have it pick up a barcode scanner when one could be integrated into the vision software or mounted on the robot (such as a wrist or chest)?

  5. I’m not sure where the ‘reimagining’ comes in, this is the same sort of telepresence and teleoperation that’s been experimented with for many decades (e.g. the old LEEP Telehead from the early 90s, the many waldo systems from the mid 20th century for radioactive material handling, etc).

    Typically the hardest problem to solve with teleoperation is the bidirectional feedback loop: you need to feed pose and haptic sensing both ways across the link without inducing hysteresis and ‘haptic hammering’, and without adding so much smoothing that it impacts latency.
    This work sidesteps that problem by just not attempting to solve it at all: there is no feedback at all, only unidirectional pose replication. Pose errors and latency are handled by ignoring them and just operating the system really really slowly, then speeding up the video for the demo.
