Our digital world is so much more interactive than the paper one it has been replacing. That becomes very obvious in the features of Jupyter Notebooks. The point is to make your data beautiful, organized, interactive, and shareable. And you can do all of this with just a bit of simple coding.
We already leveraged computer power by moving from paper spreadsheets to digital spreadsheets, but they are limited. One thing I’ve seen over and over again — and occasionally been guilty of myself — is spreadsheet abuse. That is, using a spreadsheet program to do something I probably ought to write a program to do. For those times that you want something quick but want something more than a spreadsheet, you should check out Jupyter Notebooks. The system is most commonly associated with Python, but it isn’t Python-specific. There are over 100 languages supported — many community-developed. You can even install a C++ interpreter backend for it. Because of the client/server architecture, it is very simple to share notebooks with other users.
You can, in theory, use Jupyter for anything you could use Python for. In practice, it seems to get the most use from people analyzing large data sets, doing machine learning, and similar tasks.
The Good: Simple, Powerful, Extensible
The idea is simple. Think of a Markdown-enabled web page that can connect to a backend (a kernel, in Jupyter-speak). The backend can run on your machine or remotely and will support some kind of language — often Python. The document has cells that line up vertically (like a single wide spreadsheet column). For example, here’s a simple notebook I created to explain how a bunch of sine waves add up to a square wave:
You can try it live in your browser or download it from GitHub. You can see that you can get “live” graphical output, along with text and other media. In fact, I’m not taking good advantage of the formatting, but you can do anything you can do with Markdown in the text cells.
The code is pretty standard Python. For example, here’s one of the cells:
    a0 = amplitude * np.sin(time)
    plot.plot(time, a0)
    plot.title("Fundamental")
    plot.xlabel("Time")
    plot.grid(True, which="both")
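If you want to play with the same idea outside the notebook, here is a minimal sketch of summing odd sine harmonics into a square wave. It is not the notebook’s code: the function name `square_wave_partial_sum`, the 200-point sampling, and the 4/π scaling from the textbook Fourier series are assumptions for illustration, and it sticks to the standard library rather than the NumPy and plotting calls shown above.

```python
import math


def square_wave_partial_sum(t, harmonics, amplitude=1.0):
    """Sum the first `harmonics` odd sine harmonics of a square wave at time t."""
    total = 0.0
    for k in range(harmonics):
        n = 2 * k + 1  # odd harmonics only: 1, 3, 5, ...
        total += math.sin(n * t) / n
    # Assumed scaling: 4/pi makes the sum converge toward +/- amplitude.
    return amplitude * 4.0 / math.pi * total


# Sample one period (0 to 2*pi) at 200 points.
time = [2 * math.pi * i / 200 for i in range(200)]

fundamental = [square_wave_partial_sum(t, 1) for t in time]  # a single sine wave
approx = [square_wave_partial_sum(t, 50) for t in time]      # much closer to square

# Sample 50 sits at t = pi/2, the middle of the positive half cycle.
print(round(fundamental[50], 3), round(approx[50], 3))
```

With one harmonic you get the plain fundamental; with 50, the samples near the middle of each half cycle sit close to plus or minus the amplitude. In a notebook cell you would likely hand `time` and `approx` to the plotting library instead of printing a value.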
Further down in the document you’ll see that you can also deploy widgets, for example, a slider to set parameters. We’ll come back to that topic in a bit. In addition to widgets, you can get extensions that let you lay out cells in a grid. These are often used to create dashboards like the one below. In fact, there are lots of extensions, for lots of different purposes.
The Bad: Support for Non-Python Languages
Non-Python languages are tricky to use with Jupyter. I tried using the C++ interpreter and found it a bit hard to get going. Some of that is because C++ isn’t happy with being run incrementally; redefining things, for example, makes it unhappy. If you want C++ or Fortran or any of the other myriad options, they may or may not work well. They may or may not be able to use the libraries that a lot of Python notebooks employ. Don’t get me wrong. I haven’t found any that don’t work at all, but sometimes it is inconvenient or difficult compared to using the Python kernel.
The other thing that strikes me as odd is that the tasks notebooks seem best for are not always what they are most used for. If you think about it, notebooks are really an exercise in literate programming. However, it seems to me that most notebooks are just sent around as quick web applications. You can share a static image of a page, of course. You can also share read-only versions: GitHub, for example, will render a notebook when you view it. There’s also Binder, which will let you share an interactive version.
Joel Bennet does what he calls literate devops, which is similar to literate programming, using Jupyter and, of all things, PowerShell in the video below.
Jupyter is not magic. It facilitates rapidly building little Python applications that have a very particular web interface. There are probably projects it isn’t suitable for. Not every job requires a hammer. You can save yourself some grief, though, by doing a little research on best practices before you start anything substantial. But you should constantly be asking yourself if any tool is the right tool for a given job and not just using the same thing for everything.
The Ugly: Python Package Management
I find it taxing that the system relies on Python. I don’t have much against the language itself (although my personal preference is for whitespace to not be meaningful). However, ensuring Python has everything it needs for a given notebook tends to be super painful. If you plan on distributing, this becomes another layer of issues in ensuring everyone has the right packages. On Binder, you can provide a requirements.txt file that tells it which packages to install, so that’s workable but an extra step.
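One way to soften the pain, sketched here as an assumption rather than anything Jupyter or Binder requires, is to make the first cell check that the packages the notebook needs are importable. The `REQUIRED` list and the `missing_packages` helper are hypothetical names; the list holds import names, which don’t always match the names you would install.

```python
import importlib.util

# Hypothetical list: the import names this particular notebook relies on.
REQUIRED = ["numpy", "matplotlib", "ipywidgets"]


def missing_packages(names):
    """Return the entries of `names` that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or oddly initialized modules.
            found = False
        if not found:
            missing.append(name)
    return missing


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

This only reports what is missing; it doesn’t install anything, so whoever opens the notebook still decides which package manager gets to do the job.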
The bulletproof way to install the program locally is with Anaconda which — of course — creates a totally different Python environment than your normal Python environment. Yes, I know about virtualenv. And pip. Of course, my Linux system has a package manager, too, and it has versions of Jupyter and all the Python libraries. But everyone wants their own package manager to rule my system and I have no idea what to do about your system.
Once you get it installed, it is fine. And if you get it working on Binder, you should be good since it builds each user a new Docker container. However, if you really plan on distributing complex notebooks, the installation across multiple platforms and Python versions could pose a risk.
Widgets Make It Interactive
If you look at the last two code cells of my example document (from above), you’ll see that I use a slider widget to let you interactively adjust the equations and the graph. That’s just one of the various widgets available.
If you aren’t picky, the system will build widgets for a function for you. You don’t always get control over things like ranges and steps, but for many functions, you can get a reasonable UI by just making a simple call to interact, or by using it as a decorator on the function:
    from ipywidgets import interact

    @interact(x=True, y=1.0)
    def g(x, y):
        return (x, y)
That will produce a checkbox for x and a slider for y. You’ll get default values, but in many cases, that’ll be acceptable.
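When the defaults aren’t enough, ipywidgets lets you pass a (min, max, step) tuple in place of a plain number to set the slider’s range and step. The sketch below assumes ipywidgets is installed and uses a made-up function, `scaled`, as a stand-in for a real calculation; the import is guarded only so the cell still runs where ipywidgets is missing.

```python
def scaled(x, y):
    """Return y when x is checked, otherwise 0.0 (a stand-in for real work)."""
    return y if x else 0.0


try:
    from ipywidgets import interact
except ImportError:
    interact = None  # no ipywidgets here; scaled() still works on its own

if interact is not None:
    # x=True asks for a checkbox; the (min, max, step) tuple asks for a
    # float slider running from 0.0 to 10.0 in steps of 0.5.
    interact(scaled, x=True, y=(0.0, 10.0, 0.5))
```

Calling interact as a plain function, rather than as a decorator, also leaves `scaled` usable as an ordinary function elsewhere in the notebook.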
I’ve been talking about traditional notebooks, but the next-generation interface rolled out last year. Known as JupyterLab, it allows you to use other tools like editors in a tabbed interface. Binder supports the new interface if you want to give it a quick spin.
You can continue working with traditional notebooks using the new interface, so I expect to see increased adoption of JupyterLab over time.
Should you use Jupyter? That’s like asking if you should use a saw. If you are cutting wood, yes! If you are trying to join two pieces of plastic, no. Jupyter definitely fits a niche — and a niche that many of us writing math- and data-intensive software work within. The fact that you can distribute it easily and even interface with hardware makes it attractive for projects where you want something quick but powerful.
Although some of the languages other than Python are second-class citizens, there are many choices and you can usually work around the limitations. So even if you aren’t a Python guru, you’ll still want to add this powerful notebook system to your toolbox.