[Larry] has done this sort of thing before with Amazon’s EC2, but recently Microsoft has been offering a beta access to some of NVIDIA’s Tesla M60 graphics cards. As long as you have a fairly beefy connection that can support 30 Mbps of streaming data, you can play just about any imaginable game at 60fps on the ultimate settings.
It takes a bit of configuration magic and quite a few different utilities to get it all going, but in the end [Larry] is able to play Overwatch on max settings at a nice 60fps for $1.56 an hour. Considering that just buying the graphics card alone will set you back 2500 hours of play time, for the casual gamer, this is a great deal.
It’s interesting to see computers start to become a rentable resource. People have been attempting streaming computers for a while now, but this one is seriously impressive. With such a powerful graphics card you could use this for anything intensive, need a super high-powered video editing station for a day or two? A CAD station to make anyone jealous? Just pay a few dollars of cloud time and get to it!
The pedagogical model of the integrated circuit goes something like this: take a silicone wafer, etch out a few wells, dope some of the silicon with phosphorous, mask some of the chip off, dope some more silicon with boron, and lay down some metal in between everything. That’s an extraordinarily basic model of how the modern semiconductor plant works, but it’s not terribly inaccurate. The conclusion anyone would make after learning this is that chips are inherently three-dimensional devices. But the layers are exceedingly small, and the overall thickness of the active layers of a chip are thinner than a human hair. A bit of study and thought and you’ll realize the structure of an integrated circuit really isn’t in three dimensions.
Recently, rumors and educated guesses coming from silicon insiders have pointed towards true three-dimensional chips as the future of the industry. These chips aren’t a few layers thick like the example above. Instead of just a few dozen layers, 100 or more layers of transistors will be crammed into a single piece of silicon. The reasons for this transition range from shortening the distance signals must travel, reducing resistance (and therefore heat), and optimizing performance and power in a single design.
The ideas that are influencing the current generation of three-dimensional chips aren’t new; these concepts have been around since the beginnings of the semiconductor industry. What is new is how these devices will eventually make it to market, the challenges currently being faced at Intel and other semiconductor companies, and what it will mean for a generation of chips several years down the road.
Last week, the Nvidia Jetson TX1 was released. This credit card-sized module is a ‘supercomputer’ advertised as having more processing power than the latest Intel Core i7s, while running at under 10 Watts. This is supposedly the device that will power the next generation of things, using technologies unheard of in the embedded world.
A modern day smartphone could have been built 10 or 15 years ago. There’s no question the processing power was there with laptop CPUs, and the tiny mechanical hard drives in the original iPod was more than spacious enough to hold a library of Napster’d MP3s and all your phone contacts. The battery for this sesquidecadal smartphone, on the other hand, was impossible. The future depends on batteries and consequently low power computing. Is the Jetson TX1 the board that will deliver us into the future? It took a hands-on look to find out.
What is the TX1
The Jetson TX1 is a tiny module – 50x87mm – encased in a heat sink that brings the volume to about the same size as a pack of cigarettes. Underneath a block of aluminum is an Nvidia Tegra X1, a module that combines a 64-bit quad-core ARM Cortex-A57 CPU with a 256-core Maxwell GPU. The module is equipped with 4GB of LPDDR4-3200, 16GB of eMMC Flash, 802.11ac WiFi, and Bluetooth.
This module connects to the outside world through a 400-pin connector (from Samtec, a company quite liberal with product samples, by the way) that provides six CSI outputs for a half-dozen Raspberry Pi-style cameras, two DSI outputs, 1 eDP 1.4, 1 eDP 1.2, and HDMI 2.0 for displays. Storage is provided through either SD cards or SATA. Other ports include three USB 3.0, three USB 2.0, Gigabit Ethernet, a PCIe x1 and PCIe x4, and a host of GPIOs, UARTs, SPI and I2C busses.
The only way of getting at all these extra ports is, at the moment, the Jetson TX1 carrier board, a board that is effectively a MiniITX motherboard. Mount this carrier board in a case, modify a power supply and figure out how to wire up the front panel buttons, and you’ll have a respectable desktop computer.
This is not a desktop computer, though, and it’s not a replacement for a Raspberry Pi or Beaglebone. This is an engineering tool – a device built to handle the advanced robotics work of the future.
No tech review would be complete without benchmarks, and since this is an Nvidia board, that means a deep dive into the graphics performance.
The review unit Nvidia sent over came with an incredible amount of documentation, pointing me towards GFXBench 4.0 Manhattan 3.1 (and the T-rex one) to test the graphics performance.
In terms of graphics performance, the TX1 isn’t that much different from a run-of-the-mill mobile chipset from a few years ago. This is to be expected; it’s unreasonable to expect Nvidia to put a Titan in a 10 Watt module; the Titan itself sucks up about 250 Watts.
What about CPU performance? The ARM Cortex A57 isn’t seen very much in tiny credit-card sized dev boards, but there are a few actual products out there with it. The TX1 isn’t a powerhouse by any means, but it does trounce the Raspberry Pi 2 Model B in testing by a factor of about three.
Compared to desktop/x86 performance, the best benchmarks again put the Nvidia TX1 in the same territory as a middling desktop from a few years ago. Still, that desktop probably draws about 300 W total, where the TX1 sips a meager 10 W.
This is not the board you want if you’re mining Bitcoins, and it’s not the board you should use if you need a powerful, portable device that can connect to anything. It’s for custom designs. The Nvidia TX1 is a module that’s meant to be integrated into products. It’s not a board for ‘makers’ and it’s not designed to be. It’s a board for engineers that need enough power in a reasonably small package that doesn’t drain batteries.
With an ARM Cortex A57 quad core running at almost 2 GHz, 4 GB of RAM, and a reasonably powerful graphics card for the power budget, the Nvidia TX1 is far beyond the usual tiny Linux boards. It’s far beyond the Raspi, the newest Beagleboard, and gives the Intel NUC boards a run for their money.
In terms of absolute power, the TX1 is about as powerful as a entry-level laptop from three or four years ago.
The Jetson TX1 is all about performance per Watt. That’s exceptional, new, and exciting; it’s something that simply hasn’t been done before. If you believe the reams of technical documents Nvidia granted me access to, it’s the first step to a world of truly smart embedded devices that have a grasp on computer vision, machine learning, and a bunch of other stuff that hasn’t really found its way into the embedded world yet.
And here lies the problem with the Jetson TX1; because a platform like this hasn’t been available before, the development stack, examples, and community of users simply isn’t there yet. The number of people contributing to the Nvidia embedded systems forum is tiny – our Hackaday articles get more comments than a thread on the Nvidia forums. Like all new platforms, the only thing missing is the community, putting Nvidia in a chicken and egg scenario.
This a platform for engineers. Specifically, engineers who are building autonomous golf carts and cars, quadcopters that follow you around, and robots that could pass a Turing test for at least 30 seconds. It’s an incredible piece of hardware, but not one designed to be a computer that sits next to a TV. The TX1 is an engineering tool that’s meant to go into other devices.
Alternative Applications, Like Gamecube
With that said, there are a few very interesting applications I could see the TX1 being used for. My car needs a new head unit, and building one with the TX1 would future proof it for at least another 200,000 miles. For the very highly skilled amateur engineers, the TX1 module opens a lot of doors. Six webcams is something a lot of artists would probably like to experiment with, and two DSI outputs – and a graphics card – would allow for some very interesting user interfaces.
That said, the TX1 carrier board is not the breakout board for these applications. I’d like to see something like what Sparkfun put together for the Intel Edison – dozens of breakout boards for every imaginable use case. The PCB files for the TX1 carrier board are available through the Nvidia developer’s portal (hope you like OrCAD), and Samtec, the supplier for the 400-pin connector used for the module, is exceedingly easy to work with. It’s not unreasonable for someone with a reflow toaster oven to create a breakout for the TX1 that’s far more convenient than a Mini-ITX motherboard.
Right now there aren’t many computers with ARM processors and this amount of horsepower out now. Impressively powerful ARM boards, such as the new BeagleBoard X15 and those that follow the 96Boards specification exist, but these do not have a modern graphics card baked into the module.
Without someone out there doing the grunt work of making applications with mass appeal work with the TX1, it’s impossible to say how well this board performs at emulating a GameCube, or any other general purpose application. The hardware is probably there, but the reviewers for the TX1 have been given less than a week to StackOverflow their way through a compatible build for the most demanding applications this board wasn’t designed for.
It’s all about efficiency
Is the TX1 a ‘supercomputer on a module’? Yes, and no. While it does perform reasonably well at machine learning tasks compared to the latest core-i7 CPUs, the Alexnet machine learning tasks are a task best suited for GPUs. It’s like asking which flies better: a Cessna 172 or a Bugatti Veyron? The Cessna is by far the better flying machine, but if you’re looking for a ‘supercomputer’, you might want to look at a 747 or C-5 Galaxy.
On the other hand, there aren’t many boards or modules out there at the intersection of high-powered ARM boards with a GPU and on a 10 Watt power budget. It’s something that’s needed to build the machines, robots, and autonomous devices of the future. But even then it’s still a niche product.
I can’t wait to see a community pop up around the TX1. With a few phone calls to Samtec, a few hours in KiCad, and a group buy for the module itself ($299 USD in 1000 unit quantities), this could be the start of something very, very interesting.
Today, Nvidia announced their latest platform for advanced technology in autonomous machines. They’re calling it the Jetson TX1, and it puts modern GPU hardware in a small and power efficient module. Why would anyone want GPUs in an embedded format? It’s not about frames per second; instead, Nvidia is focusing on high performance computing tasks – specifically computer vision and classification – in a platform that uses under 10 Watts.
For the last several years, tiny credit card sized ARM computers have flooded the market. While these Raspberry Pis, BeagleBones, and router-based dev boards are great for running Linux, they’re not exactly very powerful. x86 boards also exist, but again, these are lowly Atoms and other Intel embedded processors. These aren’t the boards you want for computationally heavy tasks. There simply aren’t many options out there for high performance computing on low-power hardware.
Tiny ARM computers the size of a credit card have served us all well for general computing tasks, and this leads to the obvious question – what is the purpose of putting so much horsepower on such a small board. The answer, at least according to Nvidia, is drones, autonomous vehicles, and image classification.
Image classification is one of the most computationally intense tasks out there, but for autonomous robots, there’s no other way to tell the difference between a cyclist and a mailbox. To do this on an embedded platform, you either need to bring a powerful general purpose CPU that sucks down 60 or so Watts, or build a smaller, more efficient GPU-based solution that sips a meager 10 Watts.
If hardware manufacturers want to keep their firmware crippling a secret, perhaps they shouldn’t mess with Linux users? We figure if you’re using Linux you’re quite a bit more likely than the average Windows user to crack something open and see what’s hidden inside. And so we get to the story of how [Gnif] figured out that the NVIDIA GTX690 can be hacked to perform like the Quadro K5000. The thing is, the latter costs nearly $800 more than the former!
[Gnif] wanted the card for gaming and to support multiple monitors. It has no problem driving up to three screens under Windows. But the Linux drivers only allow this on the professional counterpart to the GTX690, the Quadro K5000. It turns out that the card responds to a device ID as assigned by a series of analog values. These can be tweaked by swapping, yanking, or adding resistors in just the right places. As with that Agilent multimeter unlock of his which we saw a few days ago, he somehow managed to figure out the secret sauce that unlocks the power hidden in this card.
Powerful graphics cards are pretty affordable these days. Even though we rarely do high-end gaming on our daily machine we still have a GeForce 9800 GT. That goes to waste on a machine used mainly to publish posts and write code for microcontrollers. But perhaps we can put the GPU to good use when it comes compile time. The KGPU package enlists your graphics card to help the kernel do some heavy lifting.
This won’t work for just any GPU. The technique uses CUDA, which is a parallel computing package for NVIDIA hardware. But don’t let lack of hardware keep you from checking it out. [Weibin Sun] is one of the researchers behind the technique. He posted a whitepaper (PDF) on the topic over at his website.
If you ever need to manipulate images really fast, or just want to make some pretty fractals, [Reuben] has just what you need. He developed a neat command line tool to send code to a graphics card and generate images using pixel shaders. Opposed to making these images with a CPU, a GPU processes every pixel in parallel, making image processing much faster.
All the GPU coding is done by writing a bit of code in GLSL. [Reuben]’s command line utility takes that code, sends it to the graphics card, and returns the image calculated by the GPU. It’s very simple for to make pretty Mandebrolt set images and sine wave interference this way, but [Reuben]’s project can do much more than that. By sending an image to the GPU and performing a few operations, [Reuben] can do very fast edge detection and other algorithmic processing on pre-existing images.
So far, [Reuben] has tested his software with a few NVIDIA graphics cards under Windows and Linux, although it should work with any graphics card with pixel shaders.
Although [Reuben] is sending code to his GPU, it’s not quite on the level of the NVIDIA CUDA parallel computing platform; [Reuben] is only working with images. Cleverly written software could get around that, though. Still, even if [Reuben]’s project is only used for image processing, it’s still much faster than any CPU-bound method.