[Dave Niewinski] clearly knows a thing or two about robots, judging from his YouTube channel. Usually the projects involve robot arms mounted on some sort of wheeled platform, but this time it’s the tune of some pretty famous yellow robot legs, in the shape of spot from Boston Dynamics. The premise is simple — tell the robot what snacks you want, entirely by voice command, and off he goes to fetch. But, we’re not talking about navigating to the fridge in the same room. We’re talking about trotting out the front door, down the street and crossing roads to visit favorite restaurant. Spot will order the snacks and bring them back, fully autonomously.
There are multiple things going here, all of which are pretty big computational tasks. Firstly, there is no cloud-based voice control, ala Google voice or Alexa. The robot works on the premise of full autonomy, which means no internet connectivity for any aspect. All voice recognition, voice-to-text, and speech synthesis are performed locally using the NVIDIA Riva GPU-based AI speech SDK, running on the local NVIDIA Jetson AGX Orin carried on Spot’s back. A front-facing webcam supplies the audio feed for this. The voice recognition application listens for the wake phrase, then turns the snack order into text, for later replay when it gets to the destination. Navigation is taken care of with a Microstrain RTK GNSS module, which has all the needed robustness, such as dual antennas, and inertial fallback for those regions with a spotty signal. Navigation is no use out in the real world on its own, which is where Spot’s depth sensor cameras come in. These enable local obstacle avoidance, as per the usual spot behavior we’ve all seen before. But what about crossing the road without getting tens of thousands of dollars of someone else’s hardware crushed by a passing truck? Spot’s onboard streaming cameras are fed into the NVIDIA dash cam net AI platform which enables real-time recognition of moving obstacles such as cars, humans and anything else that might be wandering around and get in the way. All in all a cool project showing the future potential of AI in robotics for important tasks, like fetching me a beer when I most need it, even if it comes from the local corner shop.
Today it is pretty easy to build a robot with an onboard camera and have fun manually driving through that first-person view. But builders with dreams of autonomy quickly learn there is a lot of work between camera installation and autonomously executing a “go to chair” command. Fortunately we can draw upon work such as View Parsing Network by [Bowen Pan, Jiankai Sun, et al]
When a camera image comes into a computer, it is merely a large array of numbers representing red, green, and blue color values and our robot has no idea what that image represents. Over the past years, computer vision researchers have found pretty good solutions for problems of image classification (“is there a chair?”) and segmentation (“which pixels correspond to the chair?”) While useful for building an online image search engine, this is not quite enough for robot navigation.
A robot needs to translate those pixel coordinates into real-world layout, and this is the problem View Parsing Network offers to solve. Detailed in Cross-view Semantic Segmentation for Sensing Surroundings (DOI 10.1109/LRA.2020.3004325) the system takes in multiple camera views looking all around the robot. Results of image segmentation are then synthesized into a 2D top-down segmented map of the robot’s surroundings. (“Where is the chair located?”)
The authors documented how to train a view parsing network in a virtual environment, and described the procedure to transfer a trained network to run on a physical robot. Today this process demands a significantly higher skill level than “download Arduino sketch” but we hope such modules will become more plug-and-play in the future for better and smarter robots.
The build is based around OpenPlotter, which uses a battery of marine-ready software to handle routing charts, autopiloting, and providing a compass heading for navigation. Naturally, it all runs on a Raspberry Pi. In combination with PyPilot, it can be used to let the vessel drive itself around a series of waypoints, allowing you to soak up the atmosphere on the water without having to constantly steer the craft.
[Timo] ran into some issues, however, with the hardware side of things. Existing implementations for motor control to drive the rudder weren’t quite cutting it, so the system was reworked to run with a robust H-bridge and some fresh Arduino code. This was combined with a custom rudder sensor built with a potentiometer and some 3D printed gears. Future work aims to double up the rudder sensors for redundancy, something we should all consider at times.
How does one go about programming a drone to fly itself through the real world to a location without crashing into something? This is a tough problem, made even tougher if you’re pushing speeds higher and high. But any article with “MIT” implies the problems being engineered are not trivial.
The folks over at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have put their considerable skill set to work in tackling this problem. And what they’ve come up with is (not surprisingly) quite clever: they’re embracing uncertainty.
Why Is Autonomous Navigation So Hard?
Suppose we task ourselves with building a robot that can insert a key into the ignition switch of a motor vehicle and start the engine, and could do so in roughly the same time-frame that a human could do — let’s say 10 seconds. It may not be an easy robot to create, but we can all agree that it is very doable. With foreknowledge of the coordinate information of the vehicle’s ignition switch relative to our robotic arm, we can place the key in the switch with 100% accuracy. But what if we wanted our robot to succeed in any car with a standard ignition switch?
Now the location of the ignition switch will vary slightly (and not so slightly) for each model of car. That means we’re going to have to deal with this in real time and develop our coordinate system on the fly. This would not be too much of an issue if we could slow down a little. But keeping the process limited to 10 seconds is extremely difficult, perhaps impossible. At some point, the amount of environment information and computation becomes so large that the task becomes digitally unwieldy.
This problem is analogous to autonomous navigation. The environment is always changing, so we need sensors to constantly monitor the state of the drone and its immediate surroundings. If the obstacles become too great, it creates another problem that lies in computational abilities… there is just too much information to process. The only solution is to slow the drone down. NanoMap is a new modeling method that breaks the artificial speed limit normally imposed with on-the-fly environment mapping.