Gemini 2.0 + Robotics = Slam Dunk?

A humanoid robot packs a lunch bag in the kitchen

Over on the Google blog [Joel Meares] explains how Google built the new family of Gemini Robotics models.

The bi-arm ALOHA robot, equipped with Gemini 2.0 software, can take general instructions and then respond dynamically to its environment as it carries out its tasks. This family of robots aims to be highly dexterous, interactive, and general-purpose by taking the non-task-specific training methods that have worked so well for LLMs and applying them to robot tasks.

There are two things we here at Hackaday are wondering. Is there anything a robot will never do? And just how cherry-picked are these examples in the slick video? Let us know what you think in the comments!

37 thoughts on “Gemini 2.0 + Robotics = Slam Dunk?”

  1. I feel like, almost by definition, robots will never engage in sexual reproduction (otherwise they’d be artificial life forms). For that matter I feel like robots probably won’t do anything for fun / just for the hell of it (because if they’ve been programmed or trained to do something “spontaneous” or illogical they’ll just be following their programming/minimizing their cost metric rather than truly doing something for fun).

    1. I think that a Philip K. Dick-style autofac where robots produce themselves would still be considered a collection of robots, not artificial life… Even though it might technically meet the requirements, depending on your definition (and definitions of that word are infamously tricky).

      I doubt the design would be a single or pair of robots which can reproduce themselves independently; that makes sense for (some) biological organisms but not for robots. They would come off factory lines as usual, built mostly by machines as usual, just without human oversight.

      We could probably have that within several years if we tried: a factory which could build all its constituent parts without much human intervention.

  2. Interesting concept, but you really can’t determine anything from the video. We’re just taking their word that these tasks are not tailored to the robot. The lack of discussion of “reasoning” capability is also suspicious. That’s what’s currently lacking from LLMs. I don’t doubt that it can understand speech or move an arm. I do doubt that it can see a physical problem, understand it, and solve it. Example: you’re trying to tie a shoe, and one of the laces is stuck inside and not visible.

    It would be more impressive if they did a livestream with some engineering youtuber or something and have them give prompts live, and let us watch what it does.

    1. Have you used Gemini Live, in the video stream mode from Project Astra? It has quite impressive apparent understanding of its environment. Integrating that capability with a motor sequencing system is tricky no doubt, but it seems they are doing it.

  3. I’m still wary of anything that comes out of Google. After the multi-billion-dollar robotics failure of “Project Replicant,” which yielded nothing but wasted IP, and the fact that the Gemini team has admitted to faking demos with no repercussions aside from angry press articles, I’d view anything they show with a skeptical eye.

  4. These (and the preceding ones from Boston Dynamics) have always been extremely staged. Note that their robots are never demo’d outside of video. No in-person demos where you only get one take. Always in a completely controlled environment where the items involved, their initial positions, and even the lighting are under the lab’s control. We’re seeing the single miracle takes, which may well be completely scripted. There’s simply no evidence in any of the videos you could point to that can be used to prove otherwise.

    If you want to see what the real capabilities are for sure, look no further than competitions like https://www.youtube.com/watch?v=g0TaYhjpOfo. It’s an open-air public competition where you only get a limited number of attempts and everyone can see the whole thing. There are flops. There are software crashes. Fckups. Fall-overs. These are *real*. But of course these public showings deflate the hype bubble, and have quietly disappeared.

    1. Obviously they’re going to use their best takes for their ads, and probably stretch the truth as much as legally possible. But I have interacted with Boston Dynamics machines. They are pretty impressive.

      1. I was thinking more like picking and planting crops that haven’t been successfully automated otherwise. We see a lot of focus on factory and warehouse labor replacement being demonstrated already; agriculture seems woefully underrepresented so far.

          1. yeah yeah, there’s a ton of this-and-that projects that poorly perform single tasks. There’s a TON of professional systems being developed in the AG industry as well. None of that applies to what I’m interested in seeing: Humanoid AGBots being applied to real-world tasks that we currently use humans to perform.

          1. Apples, pears, peaches, strawberries, blueberries, artichokes, asparagus, lettuce, tomatoes, saffron, vanilla, palm oil, and cacao are all known for their labor-intensive hand harvesting processes.

            Many other food crops that are mechanically processed have significant portions of their yield diverted to further processing due to damage caused by the labor-saving equipment currently employed. Many more crops are harvested prematurely, when their immature state leaves them firmer and more durable, only to be artificially ripened with ethylene gas exposure, resulting in a lower nutritive value than if the crop had been allowed to ripen naturally before being delicately harvested by human(oid) hands.

            The Unitree G1 is $16k. Optimus is supposedly going to cost $20-30K. The average annual salary for migrant farm workers is around $38,955. The difference between horticulture and agriculture is merely scale, a gap that an endless, tireless pool of labor costing only maintenance and utilities after acquisition could close.
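The break-even arithmetic implied above is easy to sketch. This is a toy calculation using the figures from the comment; the $5,000/year upkeep number is a placeholder assumption for illustration, not a vendor figure:

```python
# Rough break-even sketch: humanoid robot vs. hired farm labor.
# Robot prices and the ~$38,955 salary come from the comment above;
# the $5,000/year upkeep figure is an assumption for illustration.

def breakeven_years(robot_cost: float, annual_upkeep: float,
                    worker_salary: float) -> float:
    """Years until the robot's cost is recovered by wage savings."""
    savings_per_year = worker_salary - annual_upkeep
    if savings_per_year <= 0:
        return float("inf")  # the robot never pays for itself
    return robot_cost / savings_per_year

for name, cost in [("Unitree G1", 16_000), ("Optimus (claimed)", 25_000)]:
    print(f"{name}: {breakeven_years(cost, 5_000, 38_955):.2f} years")
```

Under these toy numbers either robot pays for itself in under a year, which is why the real argument hinges on the upkeep estimate rather than the sticker price.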

  5. There are some skills that seem so everyday but are probably almost impossible for robots. Like reaching into a bag of groceries and pulling out the cheese without being able to look in the bag, or reaching into a pocket full of change and other doodads and pulling out a quarter.

    But on the other hand every skill the robot learns is infinitely and instantly transferable to all other robots now and forever.

    So really the only thing keeping robots from skills is the quality of the sensors.

    1. I’d argue robots could potentially do those tasks even better. Humans rely on a hazy sense of proprioception; the robot hand could easily be equipped with small finger cameras to actually see what they are doing.

    2. There are some skills that seem so everyday but are probably almost impossible for robots. Like reaching into a bag of groceries and pulling out the cheese without being able to look in the bag, or reaching into a pocket full of change and other doodads and pulling out a quarter.

      What a ridiculous assertion. You need only equip a robotic hand with sensor feedback in order to accomplish tasks like this. It doesn’t even have to be particularly accurate; it just needs to be able to determine the general shape of an object and how much resistance it has to being compressed (how firm it is). The issue of holding an object with the correct amount of force has seen a significant amount of development. A humanoid robot would be worthless if it didn’t have a resistive sense, which means the hardware needed to accomplish the tasks you describe is already standard.
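The “correct amount of force” loop described above is, at its simplest, a controller that closes the fingers until a force sensor reads a target value. Here is a minimal sketch with a toy linear-spring object standing in for the real sensor and actuator; every name and constant is hypothetical, not any shipping robot API:

```python
# Toy proportional grip controller: close the finger until the measured
# force reaches a target. A linear spring stands in for the real object.

def grip(target_force: float, stiffness: float = 200.0, gain: float = 0.002,
         steps: int = 200, tol: float = 0.05):
    """Servo on force error; return final (position, force)."""
    position = 0.0     # finger closure in meters (toy units)
    contact_at = 0.02  # finger touches the object at 2 cm
    force = 0.0
    for _ in range(steps):
        compression = max(0.0, position - contact_at)
        force = stiffness * compression   # simulated force-sensor reading
        error = target_force - force
        if abs(error) < tol:
            break                         # close enough to the target
        position += gain * error          # proportional update
    return position, force

pos, force = grip(target_force=5.0)       # settles within 0.05 N of 5 N
```

The usual caveat for a discrete proportional loop applies: keep `gain * stiffness` below 1 so the error decays smoothly instead of ringing, which is exactly the “correct amount of force” tuning problem on real hardware.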

      1. Yes, but the pocket is cloth, so it’s moving; the doodads are interacting and getting in the way. And you have no vision, so as they move you need to interpret what position each item is in and where it’s likely to go when you interact with it and the items around it. It’s not so simple as you imply.

        I was watching the video and noted that the demos involved moving a single item to a single place, most of which involved dropping the items, not placing them. And all were video based.

  6. So here’s a question for HaD. All of these (looking at Boston Dynamics and China now) humanoid robots appear amazing. Ability to function without direction seems … obfuscated in many presentations, but whatever, it will get better.

    But isn’t the fundamental barrier for broad adoption outside of a controlled factory setting the energy density issue? How long can any of these function on any of the current battery technologies? I’ve tried to check and the answer is always a vague “a couple of hours, depending on what it does, but then you just swap out the batteries.”

    I get that, but having a device in a farm field (or hospital or hotel or office complex or mine or construction site) that needs to swap out battery packs every two hours, requiring regularly stopping work and walking back to the charging station seems very self limiting to me.

    Yeah, humans need breaks too, but the theory is that these things are “better” than humans – they don’t rest, they don’t take coffee breaks blah blah blah. I just think some $100,000 robot that can, for example, only make up 3 or 4 hotel rooms (make bed, clean up, vacuum, clean bathroom, change sheets and towels … 15 to 20 min per room) before needing to swap out or charge is just not going to replace someone making 1/2 that who works as fast or faster without the same breaks.

    As I understand it, there are no viable electric tractors or similar farm equipment except for very small scale farms because when you need to get the crops planted or harvested, you can’t take 40 minute breaks every few hours. I assume manual farm labor is an even bigger issue.

    Really I’m just curious on thoughts (outside of situations where continuous power is guaranteed).

    Mostly these humanoid robots feel to me like the “any day now” of GPTs replacing broad swaths of professionals (teachers, doctors, etc.). My reaction is “yeah, maybe, but there are a couple of big hurdles you’re not talking about …”
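The hotel-room arithmetic a few paragraphs up can be made concrete. This sketch uses the figures from the comment (roughly 18 min per room, about 2 hours of runtime, an 8-hour shift); the 10-minute swap/walk-back overhead is an assumption for illustration:

```python
# Rooms cleaned in an 8-hour shift: battery-swapping robot vs. a human.
# Per-room and runtime figures come from the comment above; the
# 10-minute swap/walk-back overhead is an assumption for illustration.

SHIFT_MIN = 8 * 60

def rooms_per_shift(room_min: int, runtime_min: int, swap_min: int) -> int:
    rooms, clock, budget = 0, 0, runtime_min
    while clock + room_min <= SHIFT_MIN:
        if budget < room_min:      # not enough charge for the next room
            clock += swap_min      # walk back and swap packs
            budget = runtime_min
            continue
        clock += room_min
        budget -= room_min
        rooms += 1
    return rooms

robot = rooms_per_shift(room_min=18, runtime_min=120, swap_min=10)
human = SHIFT_MIN // 18            # ignoring breaks, as the comment notes
print(robot, human)                # robot: 24 rooms, human: 26
```

Under these toy numbers the swaps cost surprisingly little; the gap only widens if the swap overhead grows, or if the charging station is a long walk from the rooms.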

    1. - Not needing to pay it makes up for a huge number of inconveniences.
      - A lot of situations they’d just be able to remain plugged in, or run off umbilicals to a generator at the worksite. Not all, but many.
      - If we continue using mass migration and poverty wages to patch up labor shortages, you are going to encounter political reaction and war. You just are. There is no avoiding it. People will eventually start slaughtering millions 20th-century-style again if you try to repair the modern global system by shuffling hundreds of millions of laborers across oceans, and entire industries back across oceans in the opposite direction. And we tried scolding and lecturing this problem away; that doesn’t work.

      1. …Or the robot could just be a vehicle with an engine, like an automatic tractor with advanced manipulators. It doesn’t need to be humanoid or human-sized in most applications.
