Neural networks are all the rage right now with increasing numbers of hackers, students, researchers, and businesses getting involved. The last resurgence was in the 80s and 90s, when there was little or no World Wide Web and few neural network tools. The current resurgence started around 2006. From a hacker’s perspective, what tools and other resources were available back then, what’s available now, and what should we expect for the future? For myself, a GPU on the Raspberry Pi would be nice.
The 80s and 90s
For the young’uns reading this who wonder how us old geezers managed to do anything before the World Wide Web, hardcopy magazines played a big part in making us aware of new things. And so it was Scientific American magazine’s September 1992 special issue on Mind and Brain that introduced me to neural networks, both the biological and artificial kinds.
Back then you had the option of writing your own neural networks from scratch or ordering source code from someone else, which you’d receive on a floppy diskette in the mail. I even ordered a floppy from The Amateur Scientist column of that Scientific American issue. You could also buy a neural network library that would do all the low-level, complex math for you. There was also a free simulator called Xerion from the University of Toronto.
Keeping an eye on the bookstore Science sections did turn up the occasional book on the subject. The classic was the two-volume Explorations in Parallel Distributed Processing, by Rumelhart, McClelland et al. A favorite of mine was Neural Computation and Self-Organizing Maps: An Introduction, useful if you were interested in neural networks controlling a robot arm.
There were also short courses and conferences you could attend. The conference I attended in 1994 was a free two-day one put on by Geoffrey Hinton, then of the University of Toronto, both then and now a leader in the field. The best reputed annual conference at the time was the Neural Information Processing System conference, still going strong today.
And lastly, I recall combing the libraries for published papers. My stack of conference papers and course handouts, photocopied articles, and handwritten notes from that period is around 3″ thick.
Then things went relatively quiet. While neural networks had found use in a few applications, they hadn’t lived up to their hype and from the perspective of the world, outside of a limited research community, they ceased to matter. Things remained quiet as gradual improvements were made, along with a few breakthroughs, and then finally around 2006 they exploded on the world again.
The Present Arrives
We’re focusing on tools here but briefly, those breakthroughs were mainly:
- new techniques for training networks that go more than three or four layers deep, now called deep neural networks
- the use of GPUs (Graphics Processing Units) to speed up training
- the availability of training data containing large numbers of samples
Neural Network Frameworks
There are now numerous neural network libraries, usually called frameworks, available for download for free with various licenses, many of them open source frameworks. Most of the more popular ones allow you to run your neural networks on GPUs, and are flexible enough to support most types of networks.
Here are most of the more popular ones. They all have GPU support except for FNN.
Languages: Python, C++ is in the works
TensorFlow is Google’s latest neural network framework. It’s designed for distributing networks across multiple machines and GPUs. It can be considered a low-level one, offering great flexibility but also a larger learning curve than high-level ones like Keras and TFLearn, both talked about below. However, they are working on producing a version of Keras integrated in TensorFlow.
This is an open source library for doing efficient numerical computations involving multi-dimensional arrays. It’s from the University of Montreal, and runs on Windows, Linux and OS-X. Theano has been around for a long time, 0.1 having been released in 2009.
Languages: Command line, Python, and MATLAB
Caffe is developed by Berkeley AI Research and community contributors. Models can be defined in a plain text file and then processed using a command line tool. There are also Python and MATLAB interfaces. For example, you can define your model in a plain text file, give details on how to train it in a second plain text file called a solver, and then pass these to the caffe command line tool which will then train a neural network. You can then load this trained net using a Python program and use it to do something, image classification for example.
Languages: Python, C++, C#
This is the Microsoft Cognitive Toolkit (CNTK) and runs on Windows and Linux. They’re currently working on a version to be used with Keras.
Written in Python, Keras uses either TensorFlow or Theano underneath, making it easier to use those frameworks. There are also plans to support CNTK as well. Work is underway to integrate Keras into TensorFlow resulting in a separate TensorFlow-only version of Keras.
Like Keras, this is a high-level library built on top of TensorFlow.
Languages: Supports over 15 languages, no GPU support
This is a high-level open source library written in C. It’s limited to fully connected and sparsely connected neural networks. However, it’s been popular over the years, and has even been included in Linux distributions. It’s recently shown up here on Hackaday in a robot that learned to walk using reinforcement learning, a machine learning technique that often makes use of neural networks.
Open source library written in C. Interestingly, they say on the front page of their website that Torch is embeddable, with ports to iOS, Andoid and FPGA backends.
PyTorch is relatively new, their website says it’s in early-release beta, but there seems to be a lot interest in it. It runs on Linux and OS-X and uses Torch underneath.
There are no doubt others that I’ve missed. If you have a particular favorite that’s not here then please let us know in the comments.
Which one should you use? Unless the programming language or OS is an issue then another factor to keep in mind is your skill level. If you’re uncomfortable with math or don’t want to dig deeply into the neural network’s nuances then chose a high-level one. In that case, stay away from TensorFlow, where you have to learn more about the API than Kera, TFLearn or the other high-level ones. Frameworks that emphasize their math functionality usually require you to do more work to create the network. Another factor is whether or not you’ll be doing basic research. A high-level framework may not allow you to access the innards enough to start making crazy networks, perhaps with connections spanning multiple layers or within layers, and with data flowing in all directions.
Are you you’re looking to add something a neural network would offer to your hack but don’t want to take the time to learn the intricacies of neural networks? For that there are services available by connecting your hack to the internet.
We’ve seen countless examples making use of Amazon’s Alexa for voice recognition. Google also has its Cloud Machine Learning Services which includes vision and speech. Its vision service have shown up here using Raspberry Pi’s for candy sorting and reading human emotions. The Wekinator is aimed at artists and musicians that we’ve seen used to train a neural network to respond to various gestures for turning things on an off around the house, as well as for making a virtual world’s tiniest violin. Not to be left out, Microsoft also has its Cognitive Services APIs, including: vision, speech, language and others.
GPUs and TPUs
Training a neural network requires iterating through the neural network, forward and then backward, each time improving the network’s accuracy. Up to a point, the more iterations you can do, the better the final accuracy will be when you stop. The number of iterations could be in the hundreds or even thousands. With 1980s and 1990s computers, achieving enough iterations could take an unacceptable amount of time. According to the article, Deep Learning in Neural Networks: An Overview, in 2004 an increase of 20 times the speed was achieved with a GPU for a fully connected neural network. In 2006 a 4 times increase was achieved for a convolutional neural network. By 2010, increases were as much as 50 times faster when comparing training on a CPU versus a GPU. As a result, accuracies were much higher.
How do GPUs help? A big part of training a neural network involves doing matrix multiplication, something which is done much faster on a GPU than on a CPU. Nvidia, a leader in making graphics cards and GPUs, created an API called CUDA which is used by neural network software to make use of the GPU. We point this out because you’ll see the term CUDA a lot. With the spread of deep learning, Nvidia has added more APIs, including CuDNN (CUDA for Deep Neural Networks), a library of finely tuned neural network primitives, and another term you’ll see.
Nvidia also has its own single board computer, the Jetson TX2, designed to be the brains for self-driving cars, selfie-snapping drones, and so on. However, as our [Brian Benchoff] has pointed out, the price point is a little high for the average hacker.
Google has also been working on its own hardware acceleration in the form of its Tensor Processing Unit (TPU). You might have noticed the similarity to the name of Google’s framework above, TensorFlow. TensorFlow makes heavy use of tensors (think of single and multi-dimensional arrays in software). According to Google’s paper on the TPU it’s designed for the inference phase of neural networks. Inference refers not to training neural networks but to using the neural network after it’s been trained. We haven’t seen it used by any frameworks yet, but it’s something to keep in mind.
Using Other People’s Hardware
Do you have a neural network that’ll take a long time to train but don’t have a supported GPU, or don’t want to tie up your resources? In that case there’s hardware you can use on other machines accessible over the internet. One such is FloydHub which, for an individual, costs only penny’s per hour with no monthly payment. Another is Amazon EC2.
We said that one of the breakthroughs in neural networks was the availability of training data containing large numbers of samples, in the tens of thousands. Training a neural network using a supervised training algorithm involves giving the data to the network at its inputs but also telling it what the expected output should be. In that case the data also has to be labeled. If you give an image of a horse to the network’s inputs, and its outputs say it looks like a cheetah, then it needs to know that the error is large and more training is needed. The expected output is called a label, and the data is ‘labeled data’.
Many such datasets are available online for training purposes. MNIST is one such for handwritten character recognition. ImageNet and CIFAR are two different datasets of labeled images. Many more are listed on this Wikipedia page. Many of the frameworks listed above have tutorials that include the necessary datasets.
That’s not to say you absolutely need a large dataset to get a respectable accuracy. The walking robot we previously mentioned that used the FNN framework, used the servo motor positions as its training data.
Unlike in the 80s and 90s, while you can still buy hardcopy books about neural networks, there are numerous ones online. Two online books I’ve enjoyed are Deep Learning by the MIT Press and Neural Networks and Deep Learning. The above listed frameworks all have tutorials to help get started. And then there are countless other websites and YouTube videos on any topic you search for. I find YouTube videos of recorded lectures and conference talks very useful.
Doubtless the future will see more frameworks coming along.
We’ve long seen specialized neural chips and boards on the market but none have ever found a big market, even back in the 90s. However, those aren’t designed specially for serving the real growth area, the neural network software that everyone’s working on. GPUs do serve that market. As neural networks with millions of connections for image and voice processing, language, and so on make their way into smaller and smaller consumer devices the need for more GPUs or processors tailored to that software will hopefully result in something that can become a new component on a Raspberry Pi or Arduino board. Though there is the possibility that processing will remain an online service instead. EDIT: It turns out there is a GPU on the Raspberry Pi — see the comments below. That doesn’t mean all the above frameworks will make use of it though. For example, TensorFlow supports Nvidia CUDA cards only. But you can still use the GPU for your own custom neural network code. Various links are in the comments for that too.
There is already competition for GPUs from ASICs like the TPU and it’s possible we’ll see more of those, possibly ousting GPUs from neural networks altogether.
As for our new computer overlord, neural networks as a part of our daily life are here to stay this time, but the hype that is artificial general intelligence will likely quieten until someone makes significant breaktroughs only to explode onto the scene once again, but for real this time.
In the meantime, which neural network framework have you used and why? Or did you write your own? Are there any tools missing that you’d like to see? Let us know in the comments below.