Practical Deep Learning

Deep Learning — the use of neural networks with modern techniques to tackle problems ranging from computer vision to speech recognition and synthesis — is certainly a current buzzword. However, at the core is a set of powerful methods for organizing self-learning systems. Multi-layer neural networks aren’t new, but there is a resurgence of interest primarily due to the availability of massively parallel computation platforms disguised as video cards.

The problem is getting started. There are plenty of scholarly papers, but they can be hard to wade through. Or you can grab some code from GitHub and try to puzzle it out.

A better idea would be to take a class entitled Practical Deep Learning for Coders, Part 1. The course is free unless you count your investment in time: they warn you to expect to commit about ten hours a week for seven weeks to complete it. You can see the first installment in the video below.

The course originated at the University of San Francisco. Here’s their description:

This 7-week course is designed for anyone with at least a year of coding experience, and some memory of high-school math. You will start with step one—learning how to get a GPU server online suitable for deep learning—and go all the way through to creating state of the art, highly practical, models for computer vision, natural language processing, and recommendation systems.

Lesson 1 covers distinguishing cats from dogs. There’s a Slack channel for chat, a forum, and other support resources. If you are wondering where part 2 is, the site says it should be available online in May of 2017.

We recently talked about simple neural networks. We’ve also looked at some speech applications of DeepMind.

19 thoughts on “Practical Deep Learning”

    1. I’m about halfway through the 1st lesson, and it is just bloody awful!

      Poor presentation skill, little preparation, very little overview… so far it’s just a bunch of trivia how-to things about keyboard shortcuts, the internals of shell scripts they’ve written, and a bunch of random, disorganized noise.

      To take an example, he wants to show people the Jupyter notebook but it’s not running, so he goes to the AWS dashboard which shows a non-running instance (and which he could click to start), but says how he doesn’t like the dashboard so he goes to command line and shows how to use wget to get the shell scripts, then shows how the shell scripts look internally and how they’re laid out (long, complicated commands glued together with no comments), then shows how to start the AWS instance from the command line, shows how to cut/paste the IP address into the browser, and then shows how to use a shell window thingy (by this time I wasn’t paying much attention) and how to make different windows, then how to start the Jupyter notebook, and only then does he start to show people the Jupyter notebook WHICH IS WHAT HE WAS TRYING TO DO IN THE FIRST PLACE!

      The whole lesson seems to be like this. It’s like he’s trying to build a house from all the lumber piled in a heap.

      I can’t say I recommend this series. Maybe some of the other lessons are better, I don’t know.

      Oh, and to answer Ostracus’ question above, go to Kaggle to get good data sets. They keep archives of the datasets for all their competitions.

        1. It may not be exclusively ‘deep’ learning, but I find the scikit-learn documentation quite well written with good examples on how and what to do and for what reason. It’s easy to load a bunch of very common datasets such as the MNIST digits and really helped me to get into applying machine learning methods.
          http://scikit-learn.org/

          And for additional datasets on a variety of topics:
          http://archive.ics.uci.edu/ml/datasets.html

          Then for actual deep learning, lasagne may be a point to start. – It’s a python wrapper for theano which basically allows you to build neural nets by yourself. There is certainly a lot of theory involved in deep learning but often it’s also just trial and error and seeing which model performs best.

          For Lasagne: http://lasagne.readthedocs.io/en/latest/

      1. The first 90 minutes of the 18 hours of lessons cover learning to set up and run your GPU AWS instance and deep learning libraries. If you’re already familiar with this you may find you can skip over it – the next 16.5 hours cover building and training models.

        It was important to us that we assumed as little background as possible to take this course – so whilst you may think it silly that we showed how to start an instance, for a lot of people that’s important info. Doing it through the terminal rather than the web-based GUI is an approach we recommend, but it’s not totally necessary so feel free to use whatever approach you feel most comfortable with.

      1. +1.
        DID watch the video – it’s essentially the website in video form. So thank you both. This is the most succinct intro to neural nets I’ve seen. After [PWalsh]’s comment, I sure didn’t want to dive into THAT course, so thank you both for giving me an alternative.

    1. Just browsed a couple of chapters, but this also looks very good. As you suggest, it’s a better college textbook than it is a friendly introduction.

      Still, this and the above lack the _practical_ side (getting these things solving for weights on, e.g., a GPU) that the original post should have tackled.

  1. The fundamental problem with neural networks is that you don’t know what is actually being produced. NATO trained a neural network to recognize armored vehicles. Or so they thought. What the researchers failed to note was that all the training photos of armor were taken on cloudy days. So they had trained the neural net to recognize photos taken on cloudy days.

    Deep learning is pretty much a reprise of neural nets using L1 in place of L2.
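
The cloudy-day anecdote above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the two features (`shape_cue` and `brightness`), the noise levels, and the use of a single logistic unit rather than a real image network. The idea is only to show how a model can score almost perfectly on training photos by latching onto the weather, then drop to near chance once the weather no longer tracks the label.

```python
import math
import random


def make_photos(n, confounded, rng):
    """Return a list of ([shape_cue, brightness], label) pairs.

    label 1 means "armor present". Both features are made up for this sketch.
    """
    photos = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # A weak, noisy cue that really does depend on the vehicle.
        shape_cue = (0.3 if label else -0.3) + rng.gauss(0.0, 1.0)
        if confounded:
            # Every armor photo is dark (cloudy); every other photo is bright.
            brightness = (-1.0 if label else 1.0) + rng.gauss(0.0, 0.1)
        else:
            # Weather is independent of what is in the photo.
            brightness = rng.choice([-1.0, 1.0]) + rng.gauss(0.0, 0.1)
        photos.append(([shape_cue, brightness], label))
    return photos


def predict(weights, bias, features):
    """Logistic unit: probability that the photo contains armor."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    z = max(-30.0, min(30.0, z))  # keep exp() well inside float range
    return 1.0 / (1.0 + math.exp(-z))


def train(photos, epochs=50, lr=0.1):
    """Plain stochastic gradient descent on the logistic loss."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, label in photos:
            error = predict(weights, bias, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def accuracy(weights, bias, photos):
    hits = sum((predict(weights, bias, f) > 0.5) == bool(y) for f, y in photos)
    return hits / len(photos)


rng = random.Random(0)
train_set = make_photos(500, confounded=True, rng=rng)   # weather tracks label
test_set = make_photos(500, confounded=False, rng=rng)   # weather is random
weights, bias = train(train_set)

train_acc = accuracy(weights, bias, train_set)
test_acc = accuracy(weights, bias, test_set)
print("weights [shape_cue, brightness]:", weights)
print("accuracy with cloudy-day shortcut:", train_acc)
print("accuracy once weather is random:", test_acc)
```

Inspecting the learned weights is the cheap diagnostic here: the brightness weight dwarfs the shape weight, which is the toy equivalent of discovering that the network learned the weather.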

  2. For a bounded time-variant linear or discrete-time “near real-time” sampled system, I argue (e.g.) Bayesian Inference followed by feed-forward parameters to (e.g.) a Kalman filter is a better approach given the computing devices available to Humans at my post date (no Quantum Computers). With proper tuning of coefficient limits, a bounded system using this approach will be unconditionally stable, a necessary constraint.

    Reentrant (e.g. Neural) networks do have their place however – especially when the input dataset(s) are unbounded. But this approach is slow in comparison. Therefore reentrant networks are more applicable to analyzing existing non-real-time datasets.

    Obviously – a mix of the two approaches described above – when needed, will result in a better result for certain applications, but great care must be taken in discrete-sampled systems to avoid the introduction of quantization errors that may result in instability.
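
For readers unfamiliar with the Kalman filter mentioned above, here is a minimal one-dimensional sketch. It is not the commenter's system: the constant-state model, the variance values, and the function name `kalman_1d` are all assumptions chosen for illustration. It shows the predict/update cycle and why the gain stays bounded between 0 and 1 whenever both variances are positive.

```python
import random


def kalman_1d(measurements, process_var, meas_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a (nearly) constant hidden value.

    Returns the list of state estimates and the list of gains used.
    """
    x, p = x0, p0          # current estimate and its variance
    estimates, gains = [], []
    for z in measurements:
        # Predict: the state is assumed unchanged, but uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement by their uncertainties.
        k = p / (p + meas_var)      # Kalman gain, always in (0, 1) here
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
        gains.append(k)
    return estimates, gains


rng = random.Random(1)
true_value = 5.0
# Simulated noisy sensor readings with unit variance (hypothetical data).
readings = [true_value + rng.gauss(0.0, 1.0) for _ in range(200)]

estimates, gains = kalman_1d(readings, process_var=1e-4, meas_var=1.0)
print("last raw reading:", readings[-1])
print("last filtered estimate:", estimates[-1])
print("first and last gain:", gains[0], gains[-1])
```

The gain starts large, so early measurements move the estimate quickly, and then settles to a small value as confidence grows. Tuning `process_var` and `meas_var` is what sets that trade-off between responsiveness and smoothing.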

    1. I’d argue too, but I know nothing about this. Sounds interesting. Where would I best learn about ‘discrete-time “near real-time” sampled system’, which sounds like it would be what I’d be after for going beyond numerical analysis of a stock live on the stock market. Preferably something that could be constructed simply and run, then expanded from there as a model (models?) grows.
      Thanks!

  3. I’m taking the courses over at courses.fast.ai, and I’m so glad I watched the video before I listened to PWalsh. This is world-class material offered for free. Come over and join us if you’re interested in deep learning; it’s worth it.
    -User
