The review embargo is finally over and we can share what we found in the Nvidia Jetson TX2. It’s fast. It’s very fast. While the intended use for the TX2 may be a bit niche for someone building one-off prototypes, there’s a lot of promise here for some very interesting applications.
Last week, Nvidia announced the Jetson TX2, a high-performance single board computer designed to be the brains of self-driving cars, selfie-snapping drones, Alexa-like bots for the privacy-minded, and other applications that require a lot of processing on a limited power budget.
This is the follow-up to the Nvidia Jetson TX1. Since the release of the TX1, Nvidia has made some great strides. Now we have Pascal GPUs, and there’s never been a better time to buy a graphics card. Deep learning is a hot topic that every new CS grad wants to get into, and that means racks filled with GPUs and CUDA cores. The Jetson TX1 and TX2 are Nvidia’s strike at embedded deep learning, or devices that need a lot of processing power without sucking batteries dry.
Wading Into High-End Single Board Computers
Before diving into this review, it’s a good time to place Nvidia’s embedded offerings in a historical context. The Nvidia TK1 was the first offering, launched in April of 2014. While this is still a capable single board computer, there are cheaper options now that are almost as good. If you don’t need the Kepler GPU found in the TK1, just grab a Pi or Beaglebone.
The Nvidia TX1 launched in November 2015. This board was a marked departure from the TK1. The TX1 is a credit card-sized module strapped to a heatsink. At the time, the TX1 was the best high-performance embedded Linux device you could buy. With a powerful quad-core ARM Cortex-A57 CPU coupled with a Maxwell GPU, the performance was great. Even today, the Nvidia TX1 has acceptable performance compared to its competition.
Shortly after the introduction of the TX1, Pine64 — the “world’s first 64-bit single board computer” — launched on Kickstarter. The release was a disaster and I can’t recommend a Pine64. A few months after the Pine64, the Raspberry Pi 3B was released, sporting a quad-core ARM Cortex A53. The Pi 3B is the first Pi that feels like a proper desktop computer. It’s fast enough for general computing and good enough for (light) heavy lifting.
In March 2016, the Odroid C2 came on the scene. Like the Pi 3B, it sported a quad A53. Again, it’s a passable desktop computer that is fast enough for general computing. Late last year, the Orange Pi released their cattywampus PC2, another quad-core A53 single board computer. All of these are acceptable single board computers whose performance would have astonished people in the year 2000.
For about 18 months, the world saw the release of dozens of ARM-based single board computers. Now we’ve pretty much reached the limit of what a small, low-power ARM Linux board can do. In a rare interview discussing the future of the Raspberry Pi, [Eben Upton] says we’re stuck at 40nm chips for a while. Until newer, faster chips with new architectures are available (and cheap), these are the fastest ARM/Linux single board computers you can buy.
These single board computers are great if all you need is a computer capable enough to handle a few scripts, serve up a few web pages, or play a few YouTube videos. If your use case involves video games, rendering video, or machine learning, you’ll need something more powerful. This is why the Nvidia Jetson TX1 and TX2 exist. Is it as fast as a desktop loaded up with an i7 and a GTX 1080? No, but that’s not the point — a desktop built around an i7 6700K and a GTX 1080 will draw at least 300 Watts, whereas the Jetson TX2 only draws fifteen at full bore.
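To put those wattage figures in perspective: energy per task is power multiplied by run time, so a machine drawing 300 Watts has to finish a job twenty times faster than one drawing 15 Watts just to break even on energy. Here is a rough sketch of that arithmetic. Only the 300 W and 15 W figures come from the text above; the function names are my own, and the task durations are made up for illustration.

```python
DESKTOP_WATTS = 300.0  # i7 6700K + GTX 1080 figure quoted in the text
TX2_WATTS = 15.0       # Jetson TX2 at full bore, as quoted in the text


def energy_joules(power_watts: float, seconds: float) -> float:
    """Energy used by a device drawing power_watts for the given time."""
    return power_watts * seconds


def break_even_speedup(hungry_watts: float, frugal_watts: float) -> float:
    """How many times faster the hungrier machine must finish a task
    to use no more energy than the frugal one."""
    return hungry_watts / frugal_watts


if __name__ == "__main__":
    print(break_even_speedup(DESKTOP_WATTS, TX2_WATTS))  # 20.0
    # Hypothetical task: 10 seconds on the TX2, 1 second on the desktop.
    print(energy_joules(TX2_WATTS, 10.0))     # 150.0 joules
    print(energy_joules(DESKTOP_WATTS, 1.0))  # 300.0 joules
```

In that made-up case the desktop is ten times faster and still burns twice the energy, which is the whole argument for a board like this when it has to run from a battery.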
The Jetson TX2
The TX2 is a tiny board bolted to a credit card-sized heatsink. That’s the heart of the TX2, but I suspect very few people will ever work with a bare TX2 module. I don’t even know if you can buy the TX2 module in single-unit quantities. Instead of starting off with the module itself and the benchmarks therein, I’ll begin with the TX2 Developer Kit.
The Jetson TX2 Developer Kit is basically a Mini-ITX motherboard. That’s a great form factor for a dev kit, and follows in the footsteps of the Jetson TX1. Very little has changed between the TX1 and TX2 Developer Kits.
For anyone who is already using the Jetson TX1, the TX2 will be a drop-in replacement. Additionally, Nvidia will continue to support the TX1: they’re not EOL’ing it, and its price will be reduced. Depending on how much of a price reduction we see, I would highly recommend the TX1 for anyone who needs a fast, low-power Linux system. Apparently, Nvidia is committed to the Jetson ecosystem, and if you ever need something faster, the ‘drop-in replacement’ promise of the TX2 awaits.
As far as what you get with the carrier board, here’s your bullet point list:
- Full-size SD card, SATA connector
- USB 3.0 Type A, USB 2.0 Micro AB
- Network / Connectivity
- Gigabit Ethernet
- 802.11ac WiFi 2×2 MIMO
- Bluetooth 4.1
- PCIe x4
- DSI (2×4 lanes), eDP x4 lanes
- 6 CSI connectors
- M.2 Key E connector
- PCIe x1, SDIO, USB 2.0
- I2C, I2S, SPI, UART, D-MIC
Not much, if anything, has changed on the carrier board since the Jetson TX1. Since this is a Mini-ITX motherboard, I would have appreciated something other than a barrel connector and a brick power supply. A real 20 or 24-pin ATX power connector would have been overkill, but 6 or 8-pin PCIe connectors are small enough, and there’s space somewhere on the board for one. Maybe in a few years.
Even though this is a Mini-ITX-sized board, it’s still huge for any application where the Jetson makes sense. You can’t fit this board behind the head unit in a car, and it’s too big for a drone. Since the Jetson TX1 was released, at least one company has come out with a suite of carrier boards for this module. Connecttech’s Jetsons-themed boards break out the most important bits for an embedded solution, although I have yet to see them in the wild.
On the bottom of the TX2 is a huge, confusing, and actually sourceable connector. If you want to build your own breakout board for the TX1 or TX2, all you need to do is go over to Samtec and give them the part number SEAM-50-02.0-S-08-2-A-K-TR. This part shouldn’t cost more than $5.50 in quantity one. You’ll need a four-layer board to use it, but you can hand solder it. I eagerly await a Pi-top adapter for the Nvidia Jetson.
The Module & Software
Compared to the Jetson TX1, the TX2 boasts twice as much RAM with more bandwidth, twice as much eMMC Flash, and can encode 4K video twice as fast. The CPU is a dual-core Nvidia Denver 2.0 and a quad-core ARM Cortex A57.
The Jetson TX1 had an ARM Cortex A57 and an A53 quad-core sitting on the die. The A53 cores were not enabled for the Jetson. The TX2, on the other hand, is a true multi-core CPU, with a quad A57 that is reportedly good for multithreaded applications, and a dual-core Denver 2 that is meant for high performance single threaded applications.
In the last year, Nvidia released their latest line of GPUs. We should not be surprised the TX2 is built around the Pascal architecture. This is great — if you want to build a GPU cluster or play Counter Strike at eight thousand frames per second, the best bang for the buck is a Pascal-based GPU.
The Jetson TX2 has two power modes. The ‘Max Q’ setting is for maximum energy efficiency and, measured with a meter, comes in at about 7.5 Watts. The ‘Max P’ setting is for maximum performance and comes in at around 15 Watts. In Max P mode, the performance is reportedly double that of the Jetson TX1. I was able to switch between these modes with a single command in the terminal.
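What those two numbers mean on a battery is easy to estimate: runtime is capacity in Watt-hours divided by draw in Watts. A minimal sketch follows, assuming the measured 7.5 W and 15 W figures above and a hypothetical 60 Wh battery pack. It ignores everything else hanging off the battery (motors, cameras, regulators), so treat the result as an upper bound rather than a prediction.

```python
# Measured module draw in each power mode, taken from the text above.
MODE_WATTS = {
    "Max Q": 7.5,   # maximum energy efficiency
    "Max P": 15.0,  # maximum performance
}


def runtime_hours(battery_wh: float, mode: str) -> float:
    """Best-case runtime for a battery of battery_wh Watt-hours in the given mode."""
    return battery_wh / MODE_WATTS[mode]


if __name__ == "__main__":
    battery_wh = 60.0  # hypothetical pack size, not from the review
    for mode in MODE_WATTS:
        print(mode, runtime_hours(battery_wh, mode), "hours")
```

With that made-up pack, Max Q buys eight hours against four in Max P, so the choice of mode is really a choice between run time and the extra performance.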
A word about the gigantic heatsink on the TX2 module: When running benchmarks, the fan never turned on. The heatsink was only ever barely warm to the touch. I assume the TX2 is designed to be in the engine bay of a car, in Florida, in August.
Finally, the part you’ve all been waiting for. How fast is the TX2 compared to the competition? It’s very fast.
The CPU on the TX2 is a dual-core Nvidia Denver 2.0 coupled with a quad-core ARM Cortex A57. As stated above, the Denver is intended for fast single core performance, whereas the A57 is meant for parallel processes, but not so parallel that a GPU would be a better solution. That’s what the Pascal GPU with 256 CUDA cores is for. Compared to the TX1, memory size and bandwidth are doubled.
I used Unixbench to characterize the CPU on the TX2 and the Raspberry Pi 3 Model B. The results are below:
What’s the takeaway on this? In synthetic benchmarks testing the CPU, the Nvidia Jetson TX2 is about four times as fast as the Raspberry Pi 3. It’s fast as hell. I sincerely can’t wait for someone to 3D print a Game Cube enclosure for this thing.
Comparing the performance of the TX2 to other single board computers is a bit harder. I wouldn’t trust a self-driving car controlled by a Raspberry Pi; the performance simply isn’t there. Testing a self-driving car powered by the Jetson TX2 is also out of the question.
Giving you an idea of the performance of the TX2 when performing image-heavy tasks is actually pretty hard. Luckily, Nvidia included a few VisionWorks examples in the review package.
With VisionWorks, the Jetson was able to identify features relevant to driving across the Golden Gate Bridge. It was able to use parallax to build a point cloud of a parking lot. The Jetson TX2 was stabilizing video in real time. A laptop could do this, but a Pi couldn’t.
But not all Deep Learning is playing with a camera; in the benchmarks released by Nvidia, the TX2 is almost twice as fast as the TX1 at GoogLeNet inference performance. For AlexNet inference performance, the TX2 performs better and uses less power.
AI At The Edge
Nvidia’s marketing wank for the Jetson TX2 is, ‘Deep Learning At The Edge’. What does that mean? The future will be full of robots running OpenCV, cars avoiding people automatically, and Alexa-like voice AIs that do all their natural language processing locally. These applications are collectively referred to as Deep Learning. ‘The Edge’, in this metaphor, means environments where network latency and bandwidth are issues. For a self-driving car, there may not even be a network to send data back to a server for processing. If you don’t want your Alexa bot sending audio recordings back to a server for privacy reasons, you need to do processing locally.
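The latency half of that argument comes down to a simple budget: at a given frame rate, each frame has a fixed number of milliseconds, and compute time plus any network round trip has to fit inside it. Here is a small sketch of that check. Every number in it is hypothetical and chosen only to illustrate the trade-off, and the function names are my own.

```python
def frame_budget_ms(fps: float) -> float:
    """Milliseconds available to process each frame at the given frame rate."""
    return 1000.0 / fps


def fits_budget(compute_ms: float, round_trip_ms: float, fps: float) -> bool:
    """True if compute time plus network round trip fits in one frame period."""
    return compute_ms + round_trip_ms <= frame_budget_ms(fps)


if __name__ == "__main__":
    fps = 20.0  # hypothetical camera rate, giving a 50 ms budget per frame
    # On-board processing: slower compute, but no network hop at all.
    print(fits_budget(compute_ms=30.0, round_trip_ms=0.0, fps=fps))   # True
    # Remote server: much faster compute, but an 80 ms round trip.
    print(fits_budget(compute_ms=5.0, round_trip_ms=80.0, fps=fps))   # False
```

A faster server doesn’t help if the round trip alone blows the budget, and it helps even less when there is no network to begin with.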
The Jetson is designed to put a lot of processing power at ‘the edge’, in applications that have a power budget. This is embedded deep learning. Is a desktop CPU faster than a Jetson at deep learning tasks? Of course, but a desktop CPU is going to draw 60 Watts — the Jetson TX2 only draws fifteen. If your project or product revolves around having a laptop tucked away somewhere, you now have a replacement that’s smaller, potentially faster, and draws less power.
If you want to build a Game Cube emulator, the TX2 is not for you. If your idea of innovation is 3D printing a RetroPie enclosure, the TX2 is not for you. This is not a toy. This is an engineering tool. This is a module that will power a self-driving car, or a selfie-capturing quadcopter. These are hard engineering problems that demand fast processing with a low power budget.
There’s a reason the TX2 Developer Kit is expensive. The market for a device like this is tiny compared to the bushel of Pi Zeros at Microcenter. However, there is no other tool like this. If you need a fast CPU that only draws fifteen Watts, I’m not aware of a better option.