The review embargo is finally over and we can share what we found in the Nvidia Jetson TX2. It’s fast. It’s very fast. While the intended use for the TX2 may be a bit niche for someone building one-off prototypes, there’s a lot of promise here for some very interesting applications.
Last week, Nvidia announced the Jetson TX2, a high-performance single-board computer designed to be the brains of self-driving cars, selfie-snapping drones, Alexa-like bots for the privacy-minded, and other applications that need a lot of processing power on a tight power budget.
This is the follow-up to the Nvidia Jetson TX1. Since the release of the TX1, Nvidia has made some great strides. Now we have Pascal GPUs, and there’s never been a better time to buy a graphics card. Deep learning is a hot topic that every new CS grad wants to get into, and that means racks filled with GPUs and CUDA cores. The Jetson TX1 and TX2 are Nvidia’s strike at embedded deep learning, or devices that need a lot of processing power without sucking batteries dry.
The last year has been great for Nvidia hardware. Nvidia released a graphics card using the Pascal architecture, 1080s are heating up server rooms the world over, and now Nvidia is making yet another move at high-performance, low-power computing. Today, Nvidia announced the Jetson TX2, a credit-card sized module that brings deep learning to the embedded world.
The Jetson TX2 is the follow-up to the Jetson TX1. We took a look at it when it was released at the end of 2015, and the feelings were positive with a few caveats. The TX1 is still a very fast, very capable, very low-power ARM device that runs Linux. The case Nvidia was trying to make for the TX1 wasn’t well communicated, though: this is ultimately a device you attach several cameras to and run OpenCV on. This is a machine learning module. Now it appears Nvidia has the sales pitch for their embedded platform down.
After finding the infamous Heartbleed vulnerability along with a variety of other zero-days, Google decided to form a full-time team dedicated to finding similar vulnerabilities. That team, dubbed Project Zero, has just published a new set of findings, and this one’s particularly graphic: a group of flaws in the Windows Nvidia driver.
Most of the vulnerabilities found were due to poor programming practices. From blindly writing to user-provided pointers to incorrect bounds checking, most were simple mistakes that were quickly fixed by Nvidia. As the author put it, Nvidia’s “drivers contained a lot of code which probably shouldn’t be in the kernel, and most of the bugs discovered were very basic mistakes.”
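To picture those bug classes in concrete terms, here is a minimal, hypothetical sketch in C of the pattern Project Zero describes, and what the fix looks like. This is illustrative code only, not anything from the actual Nvidia driver; a real Windows driver would validate the buffer with ProbeForWrite(), and a Linux driver would use copy_to_user().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical ioctl-style request, NOT actual Nvidia driver code.
 * Everything in 'req' arrives straight from user space. */
struct request {
    void   *user_ptr;   /* destination pointer supplied by the caller */
    size_t  length;     /* length supplied by the caller              */
};

static char kernel_buffer[256];

/* The buggy pattern: the kernel writes through a user-controlled
 * pointer with a user-controlled length and no validation at all,
 * handing an attacker an arbitrary kernel-memory write. */
int handle_request_buggy(struct request *req)
{
    memcpy(req->user_ptr, kernel_buffer, req->length);
    return 0;
}

/* The fix: bound the length and use a copy routine that validates
 * the destination address before touching it. */
int handle_request_fixed(struct request *req)
{
    if (req->length > sizeof(kernel_buffer))
        return -1;      /* reject oversized requests */
    /* copy_to_user(req->user_ptr, kernel_buffer, req->length); */
    return 0;
}
```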
When even our mice aren’t safe it may seem that a secure system is unattainable. However, there is light at the end of the tunnel. While the bugs found showed that Nvidia has a lot of work to do, their response to Google was “quick and positive.” Most bugs were fixed well under the deadline, and google reports that Nvidia has been finding some bugs on their own. It also appears that Nvidia is working on re-architecturing their kernel drivers for security. This isn’t the first time we’ve heard from Google’s Project Zero, and in all honesty, it probably won’t be last.
[Larry] has done this sort of thing before with Amazon’s EC2, but recently Microsoft has been offering beta access to some of Nvidia’s Tesla M60 graphics cards. As long as you have a fairly beefy connection that can support 30 Mbps of streaming data, you can play just about any imaginable game at 60fps on maximum settings.
It takes a bit of configuration magic and quite a few different utilities to get it all going, but in the end [Larry] is able to play Overwatch on max settings at a smooth 60fps for $1.56 an hour. Considering that buying the graphics card alone would set you back the equivalent of 2500 hours of play time, this is a great deal for the casual gamer.
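The arithmetic behind that comparison is worth spelling out. Here’s a quick back-of-the-envelope sketch, taking the $1.56-per-hour rate from [Larry]’s setup and assuming an up-front card price of around $3,900; the card price is our own rough figure for illustration, not a number from [Larry].

```c
#include <stdio.h>

/* Back-of-the-envelope break-even for renting a cloud GPU versus
 * buying the card. The hourly rate is the figure from the article;
 * the card price is an assumption used only for illustration. */
int main(void)
{
    const double rate_per_hour = 1.56;    /* cloud GPU rental, USD/hour */
    const double card_price    = 3900.0;  /* assumed up-front cost, USD */

    double breakeven_hours = card_price / rate_per_hour;

    printf("Break-even after about %.0f hours of play\n", breakeven_hours);
    /* At a couple of hours a night, that's more than three years of
     * gaming before owning the card outright comes out ahead. */
    return 0;
}
```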
It’s interesting to see computers start to become a rentable resource. People have been experimenting with streaming computers for a while now, but this one is seriously impressive. With such a powerful graphics card you could use this for anything intensive. Need a super high-powered video editing station for a day or two? A CAD station to make anyone jealous? Just pay a few dollars of cloud time and get to it!
The pedagogical model of the integrated circuit goes something like this: take a silicon wafer, etch out a few wells, dope some of the silicon with phosphorus, mask some of the chip off, dope some more silicon with boron, and lay down some metal in between everything. That’s an extraordinarily basic model of how the modern semiconductor plant works, but it’s not terribly inaccurate. The conclusion anyone would draw after learning this is that chips are inherently three-dimensional devices. But the layers are exceedingly small, and the active layers of a chip add up to something thinner than a human hair. A bit of study and thought and you’ll realize the structure of an integrated circuit isn’t really three-dimensional.
Recently, rumors and educated guesses coming from silicon insiders have pointed towards true three-dimensional chips as the future of the industry. These chips won’t be just a few layers thick like the example above; instead, 100 or more layers of transistors will be crammed into a single piece of silicon. The reasons for this transition range from shortening the distance signals must travel to reducing resistance (and therefore heat) to optimizing performance and power in a single design.
The ideas that are influencing the current generation of three-dimensional chips aren’t new; these concepts have been around since the beginnings of the semiconductor industry. What is new is how these devices will eventually make it to market, the challenges currently being faced at Intel and other semiconductor companies, and what it will mean for a generation of chips several years down the road.
Last week, the Nvidia Jetson TX1 was released. This credit card-sized module is a ‘supercomputer’ advertised as having more processing power than the latest Intel Core i7s, while running at under 10 Watts. This is supposedly the device that will power the next generation of things, using technologies unheard of in the embedded world.
A modern-day smartphone could have been built 10 or 15 years ago. There’s no question the processing power was there with laptop CPUs, and the tiny mechanical hard drive in the original iPod was more than spacious enough to hold a library of Napster’d MP3s and all your phone contacts. The battery for this sesquidecadal smartphone, on the other hand, was impossible. The future depends on batteries and, consequently, low-power computing. Is the Jetson TX1 the board that will deliver us into the future? We took a hands-on look to find out.
What is the TX1?
The Jetson TX1 is a tiny module – 50mm × 87mm – encased in a heat sink that brings the volume to about the same size as a pack of cigarettes. Underneath a block of aluminum is an Nvidia Tegra X1, an SoC that combines a 64-bit quad-core ARM Cortex-A57 CPU with a 256-core Maxwell GPU. The module is equipped with 4GB of LPDDR4-3200, 16GB of eMMC Flash, 802.11ac WiFi, and Bluetooth.
This module connects to the outside world through a 400-pin connector (from Samtec, a company quite liberal with product samples, by the way) that provides six CSI inputs for a half-dozen Raspberry Pi-style cameras, two DSI outputs, one eDP 1.4, one DP 1.2, and HDMI 2.0 for displays. Storage is provided through either SD cards or SATA. Other ports include three USB 3.0, three USB 2.0, Gigabit Ethernet, a PCIe x1 and a PCIe x4, and a host of GPIOs, UARTs, SPI and I2C buses.
The only way of getting at all these extra ports is, at the moment, the Jetson TX1 carrier board, a board that is effectively a Mini-ITX motherboard. Mount this carrier board in a case, modify a power supply, and figure out how to wire up the front panel buttons, and you’ll have a respectable desktop computer.
This is not a desktop computer, though, and it’s not a replacement for a Raspberry Pi or Beaglebone. This is an engineering tool – a device built to handle the advanced robotics work of the future.
Benchmarks
No tech review would be complete without benchmarks, and since this is an Nvidia board, that means a deep dive into the graphics performance.
The review unit Nvidia sent over came with an incredible amount of documentation, pointing me towards GFXBench 4.0 Manhattan 3.1 (and the T-rex one) to test the graphics performance.
In terms of graphics performance, the TX1 isn’t that much different from a run-of-the-mill mobile chipset from a few years ago. This is to be expected; it’s unreasonable to expect Nvidia to put a Titan in a 10 Watt module; the Titan itself sucks up about 250 Watts.
What about CPU performance? The ARM Cortex A57 isn’t seen very much in tiny credit-card sized dev boards, but there are a few actual products out there with it. The TX1 isn’t a powerhouse by any means, but it does trounce the Raspberry Pi 2 Model B in testing by a factor of about three.
Compared to desktop/x86 performance, the best benchmarks again put the Nvidia TX1 in the same territory as a middling desktop from a few years ago. Still, that desktop probably draws about 300 W total, where the TX1 sips a meager 10 W.
This is not the board you want if you’re mining Bitcoins, and it’s not the board you should use if you need a powerful, portable device that can connect to anything. It’s for custom designs. The Nvidia TX1 is a module that’s meant to be integrated into products. It’s not a board for ‘makers’ and it’s not designed to be. It’s a board for engineers that need enough power in a reasonably small package that doesn’t drain batteries.
With an ARM Cortex A57 quad-core running at almost 2 GHz, 4 GB of RAM, and a reasonably powerful graphics card for the power budget, the Nvidia TX1 is far beyond the usual tiny Linux boards. It’s far beyond the Raspi and the newest BeagleBoard, and it gives the Intel NUC boards a run for their money.
In terms of absolute power, the TX1 is about as powerful as an entry-level laptop from three or four years ago.
The Jetson TX1 is all about performance per Watt. That’s exceptional, new, and exciting; it’s something that simply hasn’t been done before. If you believe the reams of technical documents Nvidia granted me access to, it’s the first step to a world of truly smart embedded devices that have a grasp on computer vision, machine learning, and a bunch of other stuff that hasn’t really found its way into the embedded world yet.
And here lies the problem with the Jetson TX1: because a platform like this hasn’t been available before, the development stack, examples, and community of users simply aren’t there yet. The number of people contributing to the Nvidia embedded systems forum is tiny – our Hackaday articles get more comments than a thread on the Nvidia forums. As with all new platforms, the only thing missing is the community, putting Nvidia in a chicken-and-egg scenario.
This is a platform for engineers. Specifically, engineers who are building autonomous golf carts and cars, quadcopters that follow you around, and robots that could pass a Turing test for at least 30 seconds. It’s an incredible piece of hardware, but not one designed to be a computer that sits next to a TV. The TX1 is an engineering tool that’s meant to go into other devices.
Alternative Applications, Like GameCube
With that said, there are a few very interesting applications I could see the TX1 being used for. My car needs a new head unit, and building one with the TX1 would future-proof it for at least another 200,000 miles. For the highly skilled amateur engineer, the TX1 module opens a lot of doors. Six camera inputs are something a lot of artists would probably like to experiment with, and two DSI outputs – and a graphics card – would allow for some very interesting user interfaces.
That said, the TX1 carrier board is not the breakout board for these applications. I’d like to see something like what SparkFun put together for the Intel Edison – dozens of breakout boards for every imaginable use case. The PCB files for the TX1 carrier board are available through the Nvidia developer portal (hope you like OrCAD), and Samtec, the supplier of the 400-pin connector used for the module, is exceedingly easy to work with. It’s not unreasonable for someone with a reflow toaster oven to create a breakout for the TX1 that’s far more convenient than a Mini-ITX motherboard.
Right now, there aren’t many computers out there that pair an ARM processor with this much horsepower. Impressively powerful ARM boards exist – the new BeagleBoard X15 and boards that follow the 96Boards specification, for example – but none of them have a modern graphics card baked into the module.
Without someone out there doing the grunt work of making applications with mass appeal work on the TX1, it’s impossible to say how well this board performs at emulating a GameCube, or at any other general-purpose application. The hardware is probably there, but the reviewers for the TX1 have been given less than a week to StackOverflow their way through a compatible build for the most demanding applications this board wasn’t designed for.
It’s all about efficiency
Is the TX1 a ‘supercomputer on a module’? Yes, and no. While it does perform reasonably well at machine learning tasks compared to the latest Core i7 CPUs, workloads like AlexNet are exactly the kind of task GPUs are built for. It’s like asking which flies better: a Cessna 172 or a Bugatti Veyron? The Cessna is by far the better flying machine, but if you’re looking for a ‘supercomputer’, you might want to look at a 747 or C-5 Galaxy.
On the other hand, there aren’t many boards or modules out there that combine a high-powered ARM CPU with a modern GPU on a 10 Watt power budget. It’s something that’s needed to build the machines, robots, and autonomous devices of the future. But even then, it’s still a niche product.
I can’t wait to see a community pop up around the TX1. With a few phone calls to Samtec, a few hours in KiCad, and a group buy for the module itself ($299 USD in 1000 unit quantities), this could be the start of something very, very interesting.
Today, Nvidia announced their latest platform for advanced technology in autonomous machines. They’re calling it the Jetson TX1, and it puts modern GPU hardware in a small and power efficient module. Why would anyone want GPUs in an embedded format? It’s not about frames per second; instead, Nvidia is focusing on high performance computing tasks – specifically computer vision and classification – in a platform that uses under 10 Watts.
For the last several years, tiny credit card sized ARM computers have flooded the market. While these Raspberry Pis, BeagleBones, and router-based dev boards are great for running Linux, they’re not exactly very powerful. x86 boards also exist, but again, these are lowly Atoms and other Intel embedded processors. These aren’t the boards you want for computationally heavy tasks. There simply aren’t many options out there for high performance computing on low-power hardware.
Tiny ARM computers the size of a credit card have served us all well for general computing tasks, and this leads to the obvious question – what is the purpose of putting so much horsepower on such a small board? The answer, at least according to Nvidia, is drones, autonomous vehicles, and image classification.
Image classification is one of the most computationally intense tasks out there, but for autonomous robots, there’s no other way to tell the difference between a cyclist and a mailbox. To do this on an embedded platform, you either need a powerful general-purpose CPU that sucks down 60 or so Watts, or a smaller, more efficient GPU-based solution that sips a meager 10 Watts.
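To see why the GPU wins this fight, it helps to look at the inner loop an image classifier spends nearly all of its time in: convolution. The toy C sketch below is purely illustrative, not TX1 code, but it shows why the workload fits a GPU so well; each output pixel is computed independently of every other, so the work spreads across hundreds of GPU cores instead of hammering one power-hungry general-purpose core.

```c
/* Toy 3x3 convolution over a single-channel image: the kind of
 * arithmetic an image classifier evaluates millions of times per
 * frame. Every output pixel depends only on a small neighborhood
 * of the input, so each iteration of the two outer loops could run
 * on its own GPU core instead of queuing up on a single CPU. */
void convolve3x3(const float *in, float *out,
                 int width, int height, const float k[3][3])
{
    for (int y = 1; y < height - 1; y++) {
        for (int x = 1; x < width - 1; x++) {
            float acc = 0.0f;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++)
                    acc += in[(y + dy) * width + (x + dx)]
                         * k[dy + 1][dx + 1];
            out[y * width + x] = acc;   /* independent of all other pixels */
        }
    }
}
```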