Software: It Is All In The Details

Who’s the better programmer? The guy that knows 10 different languages, or someone who knows just one? It depends. Programming is akin to math, or perhaps it is that we treat some topics differently than others which leads to misconceptions about what makes a good programmer, mathematician, or engineer. We submit that to be a great programmer is less about the languages you know and more about the algorithms and data structures you understand. If you know how to solve the problem, mapping it to a particular computer language should be almost an afterthought. While there are many places that you can learn those things, there is a lot more focus on how to write the languages,  C++ or Java or Python or whatever. We were excited, then, to see [Jeff Erickson] is publishing his algorithms book distilled from teaching at the University of Illinois, Urbana-Champaign for a number of years. The best part? You can read the preprint version online now and it will remain online even after the book goes to print.

When you were in school, you probably learned math in two ways: there was the mechanics (4×4=16) and then there were the word problems (Johnny has 10 candy bars and eats 4, how many are left?). Word problems are usually the bane of the student’s existence, yet they are much more realistic. Your boss has (probably) never come in your office and asked you what 147 divided by 12 is. If she did, you could hand her a calculator. The real value comes in being able to synthesize the right math for the right problem and — if you are lucky — gaining intuition about it (doubling the price will only increase profit by 10%). Software is pretty much the same, for example no one rushes into your cubicle and says “Quick! We need a for loop written!” You get a hazy set of requirements if you are lucky, and you then need to map that into something that computers can do. For that reason, we’ve always been more of a fan of learning about algorithms and data structures rather than specific language features.

Continue reading “Software: It Is All In The Details”

Thread Carefully: An Introduction To Concurrent Python

The ability to execute code in parallel is crucial in a wide variety of scenarios. Concurrent programming is a key asset for web servers, producer/consumer models, batch number-crunching and pretty much any time an application is bottlenecked by a resource.

It’s sadly the case that writing quality concurrent code can be a real headache, but this article aims to demonstrate how easy it is to get started writing threaded programs in Python. Due to the large number of modules available in the standard library which are there to help out with this kind of thing, it’s often the case that simple concurrent tasks are surprisingly quick to implement.

We’ll walk through the difference between threads and processes in a Python context, before reviewing some of the different approaches you can take and what they’re best suited for.

Continue reading “Thread Carefully: An Introduction To Concurrent Python”

Does Library Bloat Make Your Smartphone App Look Fat?

While earlier smartphones seemed to manage well enough with individual applications that only weighed in at a few megabytes, a perusal of the modern smartphone software store uncovers some positively monstrous file sizes. The fact that we’ve become accustomed to mobile applications requiring 100+ MB downloads on what’s often a metered Internet connection in only a few short years is pretty crazy if you stop to think about it.

Seeing reports that the Nest app for iOS tipped the scales at nearly 250 MB, [Alexandre Colucci] decided to investigate. On his blog he not only documents the process of taking the application apart piece by piece to find out just what’s eating up all that space, but lists some potential fixes which could shave a bit off the top. Even if you aren’t planning a spelunking expedition into your pocket supercomputer’s particular variant of the Netflix app, the methodology and tools he uses here are fascinating in their own right and might be something worth adding to your software bag of tricks.

By passing the application’s files through a disk usage visualizer called GrandPerspective, [Alexandre] immediately identified some rather large blocks of content. The bundled Apple Watch version of the app takes up 23 MB, video and audio used to walk the user through the device setup weigh in at 22 MB, and localization files for various languages consumes a surprising 33 MB. But the biggest single contributor to the application’s heft is the assorted libraries and frameworks which total up to an incredible 67 MB.

Of course the question is, how much of it is really necessary? It’s hard to be sure from an outsider’s perspective, but [Alexandre] notes that a few of the libraries used seem to be redundant or obsolete. In some cases this could be the result of old code still lurking in the project, but the four different libraries used for user tracking probably aren’t in there by accident. It also stands to reason that the instructional videos could be offloaded to something like YouTube, so that only users who need to view them have to expend their bandwidth on it.

Getting a little deeper into things, [Alexandre] notes that some of the localization images appear to be redundant. As a specific example, he points to the images of the Nest itself displaying Fahrenheit and Celsius temperatures. While logically this should only be two image files, there are actually eight copies of the Celsius image, each filed away as language-specific. These redundant localization images could easily be stripped out, but with gains measured in only a few hundred kilobytes, it probably wasn’t considered worth the effort during development.

In the end there’s really not as much bloat as we might like to believe. There were some redundant files, maybe a few questionable library inclusions, and the Apple Watch version of the app could surely be separated out. All together, it might get you a savings of 30 – 40%, but still not enough to bring it down under 100 MB.

All signs point to the fact that modern smartphone software development is just a lot more burdensome than us hackers might like. Save for projects looking to put control back into the hand’s of the users, it looks like mobile operating systems aren’t going to be slimming down anytime soon.

A Cheat Sheet For Publishing Python Packages

[Brendan Herger] was warned that the process of publishing a Python package would be challenging. He relishes a challenge, however, and so he went at it with gusto. The exhausting process led him to share a cheat sheet for publishing Python packages with the goal of making the next time smoother, while also letting other people benefit from his experience and get a running start.

[Brendan] describes publishing a Python package as “tying together many different solutions with brittle interchanges.” His cheat sheet takes the form of an ordered workflow for getting everything in place, with some important decisions and suggestions about things like formatting and continuous integration (CI) made up-front.

The guide is brief, but [Brendan] has made errors and hit dead ends in the hopes that others won’t have to. The whole thing came about from his work in deep learning, and his desire to create a package that allows rapid building and iterating on deep learning models.

Deep learning is a type of machine learning that involves finding representations in large amounts of data. [Brendan] used it in a project to automatically decide whether a Reddit post contains Star Wars plot spoilers, and we recently saw it featured in a method of capturing video footage only if a hummingbird is present.

Warnings On Steroids – Static Code Analysis Tools

A little while back, we were talking about utilizing compiler warnings as first step to make our C code less error-prone and increase its general stability and quality. We know now that the C compiler itself can help us here, but we also saw that there’s a limit to it. While it warns us about the most obvious mistakes and suspicious code constructs, it will leave us hanging when things get a bit more complex.

But once again, that doesn’t mean compiler warnings are useless, we simply need to see them for what they are: a first step. So today we are going to take the next step, and have a look at some other common static code analysis tools that can give us more insight about our code.

You may think that voluntarily choosing C as primary language in this day and age might seem nostalgic or anachronistic, but preach and oxidate all you want: C won’t be going anywhere. So let’s make use of the tools we have available that help us write better code, and to defy the pitfalls C is infamous for. And the general concept of static code analysis is universal. After all, many times a bug or other issue isn’t necessarily caused by the language, but rather some general flaw in the code’s logic.

Continue reading “Warnings On Steroids – Static Code Analysis Tools”

Creating Black Holes: Division By Zero In Practice

Dividing by zero — the fundamental no-can-do of arithmetic. It is somewhat surrounded by mystery, and is a constant source for internet humor, whether it involves exploding microcontrollers, the collapse of the universe, or crashing your own world by having Siri tell you that you have no friends.

It’s also one of the few things gcc will warn you about by default, which caused a rather vivid discussion with interesting insights when I recently wrote about compiler warnings. And if you’re running a modern operating system, it might even send you a signal that something’s gone wrong and let you handle it in your code. Dividing by zero is more than theoretical, and serves as a great introduction to signals, so let’s have a closer look at it.

Chances are, the first time you heard about division itself back in elementary school, it was taught that dividing by zero is strictly forbidden — and obviously you didn’t want your teacher call the cops on you, so you obeyed and refrained from it. But as with many other things in life, the older you get, the less restrictive they become, and dividing by zero eventually turned from forbidden into simply being impossible and yielding an undefined result.

And indeed, if a = b/0, it would mean in reverse that a×0 = b. If b itself was zero, the equation would be true for every single number there is, making it impossible to define a concrete value for a. And if b was any other value, no single value multiplied by zero could result in anything non-zero. Once we move into the realms of calculus, we will learn that infinity appears to be the answer, but that’s in the end just replacing one abstract, mind-boggling concept with another one. And it won’t answer one question: how does all this play out in a processor? Continue reading “Creating Black Holes: Division By Zero In Practice”

Jittery Back Off To Speed Up

In systems where there are multiple participants who need to interact with a shared resource some sort of concurrency protection is usually appropriate. The obvious technique is to use locking (and fun words like “mutex”) but this adds a constant performance hit as every participant needs to spend time interacting with the lock regardless of the number of other participants. It turns out this is actually a Big Problem that garners original research, but there are techniques that can yield great effect without a PhD. Years ago [Marc] posted a great walkthrough of one such method, exponential backoff with jitter, to Amazon’s AWS blog which is a great introduction to one such solution.

The blog post was written specifically to deal systems using a specific technique called optimistic concurrency control (OCC) but that doesn’t mean the advice isn’t generally applicable. OCC refers to a method where each writer checks for a write collision only after performing the write (but before committing it), which works well in scenarios where writes are relatively uncommon. [Marc] observed that even in systems where this is a safe assumption things bog down significantly when there are too many writers clamoring for attention all at once.

The traditional solution to such a problem is for each writer to pause for an exponentially increasing amount of time before trying again, but as writers come back in big groups the same problem can recur. He provides a discussion of simple modifications to that strategy which result in significantly reduced wait times for writers.

Problems like this are not especially relevant for single Arduino sensor networks, but even small groups of systems can have concurrency trouble and it’s nice to see such an accessible write up with solutions which are viable even for simple systems. Bonus points to [Marc] for posting source to his test tool online. It doesn’t require anything outside of your computer to run (no AWS required) so if you have any brainwaves about speeding up multi-writer environments it might make a nice test environment! Maybe don’t mention the blog post in your PhD applications though.