When Good Software Goes Bad: Malware In Open Source

Open Source software is always trustworthy, right? [Bertus] broke a story about a malicious Python package called “Colourama”. When used, it secretly installs a VBScript that watches the system clipboard for a Bitcoin address, and replaces that address with a hardcoded one. Essentially this package attempts to redirect Bitcoin payments to whoever wrote the “colourama” library.

Why would anyone install this thing? There is a legitimate package named “Colorama” that takes ANSI color commands, and translates them for the Windows terminal. It’s a fairly popular library, but more importantly, the name contains a word with multiple spellings. If you ask a friend to recommend a color library and she says “colourama” with a British accent, you might just spell it that way. So the attack is simple: copy the original project’s code into a new misspelled project, and add a nasty surprise.

Sneaking malicious software into existing codebases isn’t new, and this particular cheap and easy attack vector has a name: “typo-squatting”. But how did this package get hosted on PyPi, the main source of community-contributed goodness for Python? How many of you have downloaded packages from PyPi without looking through all of the source? pip install colorama? We’d guess that it’s nearly all of us who use Python.

It’s not just Python, either. A similar issue was found on the NPM JavaScript repository in 2017. A user submitted a handful of new packages, all typo-squatting on existing, popular packages. Each package contained malicious code that grabbed environment variables and uploaded them to the author. How many web devs installed these packages in a hurry?

Of course, this problem isn’t unique to open source. “Abstractism” was a game hosted on Steam, until it was discovered to be mining Monero while gamers were playing. There are plenty of other examples of malicious software masquerading as something else– a sizable chunk of my day job is cleaning up computers after someone tried to download Flash Player from a shady website.

Buyer Beware

In the open source world, we’ve become accustomed to simply downloading libraries that purport to do exactly the cool thing we’re looking for, and none of us have the time to pore through the code line by line. How can you trust them?

Repositories like PyPi do a good job of faithfully packaging the libraries and programs that are submitted to them. As the size of these repositories grows, it becomes less and less practical for every package to be manually reviewed. PyPi lists 156,750 projects. Automated scanning like [Bertus] was doing is a great step towards keeping malicious code out of our repositories. Indeed, [Bertus] has found eleven other malicious packages while testing the PyPi repository. But cleverer hackers will probably find their way around automated testing.

That the libraries are open source does add an extra layer of reliability, because the code can in principle be audited by anyone, anytime. As libraries are used, bugs are found, and features are added, more and more people are intentionally and unintentionally reviewing the code. In the “colourama” example, a long Base64 string was decoded and executed. It doesn’t take a professional researcher to realize something fishy is going on. At some point, enough people have reviewed a codebase that it can be reasonably trusted. “Colorama” has well over a thousand stars on GitHub, and 28 contributors. But did you check that before downloading it?
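The pattern described above, a long Base64 string that gets decoded and executed, is the kind of thing a quick script can flag for a human to look at. What follows is a minimal sketch and not the scanner [Bertus] used: the 200-character threshold, the list of "dynamic" calls, and the function name find_suspicious are all assumptions made for illustration.

```python
import ast
import re

# Assumed threshold: Base64-looking string literals at least this long get flagged.
MIN_BASE64_LEN = 200
BASE64_RE = re.compile(r"^[A-Za-z0-9+/=\s]+$")
# Assumed list of builtins that run or load code dynamically.
DYNAMIC_CALLS = {"exec", "eval", "compile", "__import__"}


def find_suspicious(source):
    """Return a list of warning strings for one Python source string."""
    warnings = []
    for node in ast.walk(ast.parse(source)):
        # Flag long string or bytes constants made only of Base64 characters.
        if isinstance(node, ast.Constant) and isinstance(node.value, (str, bytes)):
            value = node.value
            text = value if isinstance(value, str) else value.decode("ascii", "ignore")
            if len(text) >= MIN_BASE64_LEN and BASE64_RE.match(text):
                warnings.append(
                    f"line {node.lineno}: long Base64-like literal ({len(text)} chars)"
                )
        # Flag direct calls such as exec(...) or eval(...).
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in DYNAMIC_CALLS:
                warnings.append(f"line {node.lineno}: call to {node.func.id}()")
    return warnings
```

Running find_suspicious over the text of a package's setup.py and modules gives a short list of places worth reading by hand. It only parses the code and never runs it, and a determined attacker can evade a check this simple, so treat a clean result as weak evidence at best.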

Typo-squatting abuses trust, taking advantage of a similar name and whoever isn’t paying quite close enough attention. It’s not practical for every user to check every package in their operating system. How, then, do we have any trust in any install? Cryptography solves some of these problems, but it cannot overcome the human element. A typo in a URL, trusting a brand new project, or even obfuscated C code can fool the best of us from time to time.

What’s the solution? How do we have any confidence in any of our software? When downloading from the web, there are some good habits that go a long way to protect against attacks. Cross check that the project’s website and source code actually point to each other. Check for typos in URLs. Don’t trust a download just because it’s located on a popular repository.
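The "check for typos" habit can be partly automated. The sketch below compares a requested package name against a list of names you already trust and reports when the request is a near miss. The list KNOWN_PACKAGES, the 0.8 similarity cutoff, and the function name possible_typosquat_of are hypothetical choices for illustration; only difflib from the Python standard library is used.

```python
import difflib
from typing import Optional

# Hypothetical allow-list: the package names you actually intend to depend on.
KNOWN_PACKAGES = ["colorama", "requests", "numpy", "django", "urllib3"]


def possible_typosquat_of(requested: str, known=KNOWN_PACKAGES,
                          cutoff: float = 0.8) -> Optional[str]:
    """Return the trusted name that `requested` closely resembles, or None.

    An exact match returns None (it is the trusted package itself), and so
    does a name that is not similar to anything on the list.
    """
    name = requested.strip().lower()
    if name in known:
        return None
    close = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return close[0] if close else None
```

With the list above, possible_typosquat_of("colourama") points back at "colorama", which is the cue to stop and look before installing. A name that resembles nothing on the list is not cleared by this check; it is simply unknown and still deserves the reputation checks described here.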

But most importantly, check the project’s reputation, the number of contributors to the project, and maybe even their reputation. You wouldn’t order something on eBay without checking the seller’s feedback, would you? Do the same for software libraries.

A further layer of security can be found in using libraries supported by popular distributions. In quality distributions, each package has a maintainer that is familiar with the project being maintained. While they aren’t checking each line of code of every project, they are ensuring that “colorama” gets packaged instead of “colourama”. In contrast to PyPi’s 156,750 Python modules, Fedora packages only around 4,000. This selection is a good thing.

Repositories like PyPi and NPM are simply not the carefully curated sources of trustworthy software that we sometimes think them to be– and we should act accordingly. Look carefully into the project’s reputation. If the library is packaged by your distribution of choice, you can probably pass this job off to the distribution’s maintainers.

At the end of the day, short of going through the code line by line, some trust anchor is necessary. If you’re blindly installing random libraries, even from a “trustworthy” repository, you’re letting your guard down.

Jump Into AI With A Neural Network Of Your Own

One of the difficulties in learning about neural networks is finding a problem that is complex enough to be instructive but not so complex as to impede learning. [ThomasNield] had an idea: Create a neural network to learn if you should put a light or dark font on a particular colored background. He has a great video explaining it all (see below) and code in Kotlin.

[Thomas] is very interested in optimization, so his approach is very much based on mathematics and algorithms of optimization. One thing that’s handy is that there is already an algorithm for making this determination. He found it on Stack Exchange, but we’re sure it’s in a textbook or paper somewhere. The existing algorithm makes the neural network really impractical, but it makes training easy since you can algorithmically develop a training set of data.
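To make the "algorithmically develop a training set" idea concrete, here is a small sketch in Python rather than [Thomas]'s Kotlin. The source does not say which formula he found, so the weighted-luminance rule below (0.299, 0.587, 0.114 with a 0.5 threshold) is an assumption, chosen because it is a commonly quoted rule of thumb. The function names are likewise made up for this example.

```python
import random


def prefers_dark_font(r: int, g: int, b: int) -> bool:
    """Assumed rule: a bright background (luminance above 0.5) gets dark text."""
    luminance = (0.299 * r + 0.587 * g + 0.114 * b) / 255
    return luminance > 0.5


def make_training_set(n: int, seed: int = 0):
    """Label n random background colors using the rule above.

    Returns a list of (features, label) pairs, where features are the RGB
    channels scaled to 0..1 and label is 1.0 for dark font, 0.0 for light.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        rgb = (rng.randrange(256), rng.randrange(256), rng.randrange(256))
        features = [channel / 255 for channel in rgb]
        label = 1.0 if prefers_dark_font(*rgb) else 0.0
        samples.append((features, label))
    return samples
```

Because the labels come from a formula, you can generate as many examples as you like and also measure exactly how often the trained network disagrees with the rule it was meant to learn.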

Once trained, the neural network works well. He wrote a small GUI and you can even select among various models.

Don’t let the Kotlin put you off. It is closely related to Java and runs on the same JVM. The code is very similar, except that it infers types and adds functional programming tools. However, the libraries and the principles employed will work with Java and, in many cases, the concepts will apply no matter what you are doing.

If you want to hardware accelerate your neural networks, there’s a stick for that. If you prefer C and you want something lean and mean, try TINN.


Easy Access Point Configuration On ESP8266

One of the biggest advantages of using the ESP8266 in your projects is how easy it is to get WiFi up and running. Just plug in the WiFi library, put the SSID and encryption key in your source code, and away you go. It authenticates with your network in seconds and you can get on with building your project. But things get a little trickier if you want to take your project someplace else, or distribute your source code to others. Quickly we learn the downside of using static variables for authentication.

While there are already a few solutions to this problem out there, [Martin Raynsford] wasn’t too thrilled with them. Usually they put the ESP8266 in Access Point mode, allow the user to connect, and then ask which network they should authenticate with. But he didn’t want his projects to require an existing network, and figured he could do just as well making a field-configurable AP.

Using it is simple. Once the ESP8266 starts up it will create a new network in the form of “APConfig XXXXXX”, which should be easy enough to find from your client side device. Once connected, you can go to a simple administration page which allows you to configure a new AP name and encryption key. You even have the option to create an open AP by leaving the “Password” field blank. Once rebooted, the ESP8266 will create a new network with the defined parameters.

[Martin] has also included a “backdoor” to let anyone with physical access to the ESP8266 board create a new open AP that can be used to reconfigure the network settings. During boot up there is a brief period, indicated with specific blinks of the LED, wherein you can hit the reset button and trigger the open AP. This keeps you from getting locked out of your own project if you forget what key you gave it.

If you’re not one to go the austere route, take a look at some of the more robust solutions we’ve seen for easier end-user setup of the ESP8266.

Bitcoin’s Double Spending Flaw Was Hush-Hush During Rollout

For a little while it was possible to spend Bitcoin twice. Think of it like a coin on a string: you put it into the vending machine to get a delicious snack, but if you pull the string quickly enough you can spend it again on some soda too. Except this coin is worth something like eighty grand.

On September 20, the full details of the latest fix for Bitcoin Core were published. This information came two days after the fix was actually released. Two vulnerabilities were involved: a Denial of Service vulnerability and a critical inflation vulnerability, both covered in CVE-2018-17144. These were originally reported to several developers working on Bitcoin Core, as well as projects supporting other cryptocurrencies, including ABC and Unlimited.

Let’s take a look at how this worked, and how the network was patched (while being kept quiet) to close up this vulnerability.


Can You “Take Back” Open Source Code?

It seems a simple enough concept for anyone who’s spent some time hacking on open source code: once you release something as open source, it’s open for good. Sure the developer might decide that future versions of the project close up the source, it’s been known to happen occasionally, but what’s already out there publicly can never be recalled. The Internet doesn’t have a “Delete” button, and once you’ve published your source code and let potentially millions of people download it, there’s no putting the Genie back in the bottle.

But what happens if there are extenuating circumstances? What if the project turns into something you no longer want to be a part of? Perhaps you submitted your code to a project with a specific understanding of how it was to be used, and then the rules changed. Or maybe you’ve been personally banned from a project, and yet the maintainers of said project have no problem letting your sizable code contributions stick around even after you’ve been kicked to the curb?

Due to what some perceive as a forced change in the Linux Code of Conduct, these are the questions being asked by some of the developers of the world’s preeminent open source project. It’s a situation which the open source community has rarely had to deal with, and certainly never on a project of this magnitude.

Is it truly possible to “take back” source code submitted to a project that’s released under a free and open source license such as the GPL? If so, what are the ramifications? What happens if it’s determined that the literally billions of devices running the Linux kernel are doing so in violation of a single developer’s copyright? These questions are of grave importance to the Internet and arguably our way of life. But the answers aren’t as easy to come by as you might think.


Nim Writes C Code — And More — For You

When we first heard of Nim, we thought about the game. In this case, though, Nim is a programming language. Sure, we need another programming language, right? But Nim is a bit different. It is not only cross-platform, but instead of targeting assembly language or machine code, it targets other languages. So a Nim program can wind up compiled by C or interpreted by JavaScript or even compiled by Objective-C. On top of that, it generates very efficient code with, at least potentially, low overhead. Check out [Steve Kellock’s] quick introduction to the language.

The fact that it can target different compiler backends means it can support your PC or your Mac or your Raspberry Pi. Thanks to the JavaScript option, it can even target your browser. If you read [Steve’s] post, he shows how a simple Hello World program can wind up at under 50K. Of course, that’s nothing the C compiler can’t do, which makes sense because the C compiler is actually generating the finished executable. It is a bit harder, though, to strip out all the overhead yourself.


One Man’s Disenchantment With The World Of Software

There is a widely derided quote attributed to [Bill Gates], that “640k should be enough for anyone”. Meaning of course that the 640 KB memory limit for the original IBM PC of the early 1980s should be plenty for the software of the day, and there was no need at the time for memory expansions or upgrades. Coming from the man whose company then spent the next few decades dominating the software industry with ever more demanding products that required successive generations of ever more powerful PCs, it was the source of much 1990s-era dark IT humour.

XKCD no. 303 (CC BY-NC 2.5)

In 2018 we have unimaginably powerful computers, but to a large extent most of us do work with them that is surprisingly similar to what we did ten, twenty, or even thirty years ago. Web browsers may have morphed from hypertext layout formatting to complete virtual computing environments, but a word processor, a text editor, or an image editor would be very recognisable to our former selves. If we arrived in a time machine from 1987, though, we’d be shocked at how bloated and slow those equivalent applications are on what would seem to us like supercomputers.

[Nikita Prokopov] has written an extremely pithy essay on this subject in which he asks why it is that if a DOS 286 could run a fast and nimble text editor, the 2018 text editor requires hundreds of megabytes to run and is noticeably slow. Smug vi-on-hand-rolled GNU/Linux users will be queuing up to rub their hands in glee in the comments, but though Windows may spring to mind for most examples, there is no mainstream platform that is immune. Web applications come under particular scorn, with single pages having more bloat than the entirety of Windows 95, and flagship applications that routinely throw continuous JavaScript errors being the norm. He ends with a manifesto, urging developers to do better, and engineers to call it out where necessary.

If you’ve ever railed at bloatware, or simply at poor-quality software in general, then [Nikita]’s rant is for you. We suspect he will be preaching to the converted.

Windows error screen: Oops4321 [CC BY-SA 4.0]