Lambdas For C — Sort Of

A lot of programming languages these days feature lambda functions, or what I would be just as happy to call anonymous functions. Some people make a big deal out of these, but the core idea is very simple. Sometimes you need a little snippet of code that you'll only use in one place — most commonly, as a callback function to pass to another function — so why bother giving it a name? Depending on the language, there can be more to it than that, especially if you get into closures and currying.

For example, in Python, the map function takes a function as an argument. Suppose you have a list of words and you want to capitalize each one. A Python string has a capitalize method, and you could write a loop to apply it to each element of the list. However, map and a lambda can do it more concisely:

map(lambda x: x.capitalize(), ['madam','im','adam'])

The anonymous function here takes an argument x and calls the capitalize method on it. The map call ensures that the anonymous function is called once for each item.
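
One wrinkle worth noting: in Python 3, map returns a lazy iterator rather than a list, so to actually see the capitalized words you wrap the call in list():

    list(map(lambda x: x.capitalize(), ['madam', 'im', 'adam']))  # ['Madam', 'Im', 'Adam']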

Modern C++ has lambda expressions. However, in C you have to define a function by name and pass a pointer — not a huge problem, but it can get messy if you have a lot of callback functions that you use only once. It's just hard to think up that many disposable function names. If you use gcc, though, there are some nonstandard C features you can use to get most of what you want out of lambda expressions.

Continue reading “Lambdas For C — Sort Of”

This Machine Learning Algorithm Is Meta

Suppose you ran a website releasing many articles per day about various topics, all following a general theme. And suppose that your website allowed for a comments section for discussion on those topics. Unless you are brand new to the Internet, you can also imagine that the comments section needs at least a little bit of moderation to filter out spam, off-topic, or even toxic comments. If you don't want to employ any people for this task, you could try this machine learning algorithm instead.

[Ladvien] goes through a general overview of how to set up a convolutional neural network (CNN), which can be programmed to do many things, but this one crawls a web page, gathers data, and makes decisions about that data. In this case, the task is to identify toxic comments, but the goal is not to forge the sharpest sword in the comment moderator's armory; it's to learn more about how CNNs work.

Written in Python, the guide walks through the code itself and how it behaves, setting up a small server to host the neural network, and finally creating the web service. As with any machine learning, you need a reliable dataset for training, and this one came from Wikipedia comments previously flagged by humans. Trolling nuance is thrown aside, as the example homes in on blatant insults and vulgarity.
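
To give a rough sense of the shape of such a model, here's a generic 1D text-CNN sketch in Keras. To be clear, this is not [Ladvien]'s code: the layer sizes and the two-comment placeholder dataset below are made up, and the real guide trains on the flagged Wikipedia comments.

    import numpy as np
    from tensorflow.keras import layers, models
    from tensorflow.keras.preprocessing.sequence import pad_sequences
    from tensorflow.keras.preprocessing.text import Tokenizer

    comments = ["you make a good point", "you are an idiot"]  # stand-in training text
    labels = np.array([0, 1])                                 # 1 = toxic

    # Turn raw comments into fixed-length sequences of word indices
    tokenizer = Tokenizer(num_words=20000)
    tokenizer.fit_on_texts(comments)
    x = pad_sequences(tokenizer.texts_to_sequences(comments), maxlen=200)

    model = models.Sequential([
        layers.Embedding(20000, 128),             # word indices -> dense vectors
        layers.Conv1D(64, 5, activation="relu"),  # slide filters over 5-word windows
        layers.GlobalMaxPooling1D(),              # keep the strongest response per filter
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="sigmoid"),    # probability the comment is toxic
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(x, labels, epochs=3)
    print(model.predict(x))                       # per-comment toxicity scores

A small web framework endpoint (Flask or similar) wrapping model.predict() would then cover the web service part.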

While [Ladvien] notes that his guide isn't meant to be comprehensive, but rather to fill in some of the gaps he noticed in other guides, we find it an interesting read. He also mentions that, in theory, this tool could be used to predict the number of comments an article like this very one will attract, based on the language in the article. We'll leave that one as an academic exercise for now, probably.

This Week In Security: VPN Gateways, Attacks In The Wild, VLC, And An IP Address Caper

We’ll start with more Black Hat/DEFCON news. [Meh Chang] and [Orange Tsai] from Devcore took a look at Fortinet and Pulse Secure devices, and found multiple vulnerabilities. (PDF Slides) They are publishing summaries for that research, and the summary of the Fortinet research is now available.

It’s… not great. There are multiple pre-authentication vulnerabilities, as well as what appears to be an intentional backdoor.

CVE-2018-13379 abuses an snprintf call made when requesting a different language for the device login page. The snprintf function is an alternative to sprintf, intended to prevent buffer overflows by taking the maximum length to write to the target buffer, which sounds like a good idea but can lead to malicious truncation.

The code in question looks like snprintf(s, 0x40, "/migadmin/lang/%s.json", lang);.
When the login page loads, a request is made for a language file, and that file is sent back to the user. At first glance, it seems that this would indeed limit the file returned to a .json file from the specified folder. Unfortunately, there is no further input validation on the request, so a language of ../../arbitrary is considered perfectly legitimate, escaping the intended folder. This would only leak arbitrary .json files, but since snprintf doesn't fail when the output exceeds the specified length, sending a lang value that's long enough means the ".json" extension never gets appended to the request either.

A Metasploit module has been written to test for this vulnerability, and it requests a lang of /../../../..//////////dev/cmdb/sslvpn_websession. That's just long enough to force the .json extension to fall off the end of the string, and Unix convention is to ignore the extra slashes in a path. Just like that, the Fortigate is serving up any file on its filesystem just for asking nicely.
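
To see why the length limit backfires, a few lines of Python can mimic the truncation; snprintf keeps at most size-1 characters and silently discards the rest (this is an illustration of the failure mode, not Fortinet's code):

    def lang_path(lang, bufsize=0x40):
        # snprintf(s, 0x40, "/migadmin/lang/%s.json", lang) keeps only the
        # first bufsize-1 characters of the formatted string
        return ("/migadmin/lang/%s.json" % lang)[:bufsize - 1]

    payload = "/../../../..//////////dev/cmdb/sslvpn_websession"
    print(lang_path(payload))
    # -> /migadmin/lang//../../../..//////////dev/cmdb/sslvpn_websession
    # Exactly 63 characters survive, so ".json" never makes it into the buffer,
    # and the ../ segments walk back out to /dev/cmdb/sslvpn_websession.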

More worrying than the snprintf bug is the magic value that appears to be an intentional backdoor. A simple 14-character string sent as an HTTP query string allows changing any user's password — without any authentication. This story is still young; it's possible this was intended to have a benign purpose. If it's an honest mistake, it's a sign of incompetence. If it's an intentional backdoor, it's time to retire any and all Fortinet equipment you have.

Pulse Secure VPNs have a similar pre-auth arbitrary file read vulnerability. Once the full report is released, we’ll cover that as well.

Exploitation in the Wild

But wait, there's more. Hide your kids, hide your wife. Webmin, Pulse Secure, and Fortigate are already being actively exploited in the wild, according to ZDNet. Based on reports from Bad Packets, the Webmin backdoor was being targeted in scans within a day of the announcement, and exploited within three days. There is already a botnet spreading via this backdoor. It's estimated that there are around 29,000 vulnerable Internet-facing servers.

Both Pulse Secure and Fortinet's Fortigate VPN appliances are also being actively targeted. Even though the vulnerabilities were reported to the vendors first, and patched well in advance of the public disclosure, thousands of vulnerable devices remain. Apparently routers and other network appliances are treated as fire-and-forget solutions, and often go without important security updates.

VLC is Actually Vulnerable This Time

The VLC media player has a new update out, fixing 11 CVEs. These CVEs are all cases of mishandling malformed media files, and are only exploitable by opening a malicious file with VLC. Be sure to go update VLC if you have it installed. Even though no arbitrary code execution has been demonstrated for any of these issues yet, it's likely that it will eventually happen.

Gray Market IP Addresses

With the exhaustion of IPv4 addresses, many have begun using alternative methods to acquire address space, including the criminal element. Brian Krebs details his investigation into one such story over at Krebs on Security: Residential Networking Solutions LLC (Resnet). It all started with an uptick in fraudulent transactions originating from Resnet residential IP addresses. Was this a real company, actually providing internet connectivity, or a criminal enterprise?

Secret Messages Could Be Hiding In Your Server Logs

[Ryan Flowers] writes in with a clever little hack that can allow you to hide data where nobody is going to go looking for it. By exploiting the fact that a web server will generally log all HTTP requests whether or not they're valid, he shows how you can covertly send a message by asking the server for a carefully crafted fictitious URL.

We aren't talking about requesting “yousuck.txt” from the server that hosts your least favorite website, either. As [Ryan] demonstrates, you can compress a text file, encode it with uuencode, and then send it line by line to the destination server with curl. He shows how the process, which he calls “CurlyTP”, can be done manually on the command line, but it would be a simple matter to wrap it up in a Bash script.

To get the message back, you just do the opposite. Use grep to find the lines in the log file that contain the encoded data, and then run them through uudecode to get the original text back. Finding the appropriate lines is made easier by prepending a prearranged keyword to the URL requests, and the keyword can be changed for each message to make things easier to keep track of.
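
For the curious, here's a rough Python take on the same round trip. This isn't [Ryan]'s script, which drives uuencode and curl from the shell; the server name, keyword, and log format here are assumptions.

    import binascii
    import gzip
    import urllib.error
    import urllib.parse
    import urllib.request

    KEYWORD = "droppoint"           # prearranged marker, made up for this sketch
    SERVER = "http://example.com"   # any web server that logs its requests

    def send(path):
        with open(path, "rb") as f:
            payload = gzip.compress(f.read())
        # uuencode 45 bytes at a time, one bogus URL request per encoded line
        for i in range(0, len(payload), 45):
            line = binascii.b2a_uu(payload[i:i + 45]).decode().rstrip("\n")
            url = "%s/%s/%s" % (SERVER, KEYWORD, urllib.parse.quote(line, safe=""))
            try:
                urllib.request.urlopen(url)   # a 404 is fine; the log entry is the point
            except urllib.error.HTTPError:
                pass

    def receive(logfile):
        chunks = []
        with open(logfile) as log:            # e.g. an Apache/nginx access log
            for entry in log:
                if "/%s/" % KEYWORD not in entry:
                    continue
                encoded = entry.split("/%s/" % KEYWORD, 1)[1].split()[0]
                chunks.append(binascii.a2b_uu(urllib.parse.unquote(encoded)))
        return gzip.decompress(b"".join(chunks)).decode()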

If you're still wondering why anyone would go to the trouble of doing this, [Ryan] provides an excellent example: a covert “dead drop” where people could leave messages they'd rather not send through the usual channels. As long as the sender used a service to mask their true IP address, they could anonymously deliver messages onto the server without having to use any special software or protocol they might not have access to. Even the most restrictive firewalls and security measures aren't likely to be scanning URLs for compressed text files.

We've seen web-based dead drops done with Python in the past, and even purpose-built “PirateBoxes” that allow people to covertly exchange files, but we like how this method doesn't require any special configuration on the server side. You should check your server logs; somebody might be trying to tell you something.

Take Pictures Around A Corner

One of the core lessons any physics student will come to realize is that the more you know about physics, the less intuitive it seems. Take the nature of light, for example. Is it a wave? A particle? Both? Neither? Whatever the answer, scientists are at least able to exploit some of its characteristics, like its ability to bend and bounce off of obstacles. This camera, for example, is able to image a room without a direct line of sight as a result.

The process works by pointing a camera through an opening in the room and then strobing a laser at the exposed wall. The laser light bounces off of the wall, into the room, off of the objects on the hidden side of the room, and then back to the camera. This concept isn't new, but the interesting thing this group has done is lift the curtain on the image-processing underpinnings. Before, the process required a research team and often the backing of a university, but this project shows off the technique using just a few lines of code.

This project’s page documents everything extensively, including all of the algorithms used for reconstructing an image of the room. And by the way, it’s not a simple 2D image, but a 3D model that the camera can capture. So there should be some good information for anyone working in the 3D modeling world as well.

Thanks to [Chris] for the tip!

Build A Fungus Foraging App With Machine Learning

As the 2019 mushroom foraging season approaches it's timely to combine my thirst for knowledge about low-level machine learning (ML) with a popular pastime that we enjoy here where I live. Just for the record, I'm not an expert on ML, and I'm simply inviting readers to follow me back down some rabbit holes that I recently explored.

But mushrooms, I do know a little bit about, so firstly, a bit about health and safety:

  • Any app created this way should be used with extreme caution, and the results always confirmed by a fungus expert.
  • Always test the fungus by initially eating only a very small piece and waiting for several hours to check there is no ill effect.
  • Always wear gloves – it's surprisingly easy to absorb toxins through your fingers.

Since this is very much an introduction to ML, there won't be too much terminology and the emphasis will be on having fun rather than going on a deep dive. The system that I stumbled upon is called XGBoost (XGB). One of the XGB demos is for binary classification, and the data was drawn from The Audubon Society Field Guide to North American Mushrooms. Binary means that the app spits out a probability of ‘yes’ or ‘no’, and in this case it tends to give about a 95% probability that a common edible mushroom (Agaricus campestris) is actually edible.

The app asks the user 22 questions about their specimen and collates the answers as a series of letters separated by commas. At the end of the questionnaire, this data line is written to a file called ‘fungusFile.data’ for further processing.

XGB cannot accept letters as data, so they have to be mapped into ‘classic LibSVM format’, which looks like ‘3:218’ for each letter. Next, this XGB-friendly data is split into two parts, one for training a model and one for subsequently testing that model.
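
The exact numeric mapping isn't spelled out here, but to make the idea concrete, a hypothetical converter that turns a line of questionnaire letters into a LibSVM row (label first, then index:value pairs) could look like this:

    def to_libsvm(label, letters):
        # One plausible scheme: feature index paired with the letter's character code.
        pairs = " ".join("%d:%d" % (i + 1, ord(c)) for i, c in enumerate(letters))
        return "%s %s" % (label, pairs)

    line = "x,s,n,t,p,f"                  # a few answers as collected by the questionnaire
    print(to_libsvm(1, line.split(",")))  # -> 1 1:120 2:115 3:110 4:116 5:112 6:102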

XGB is relatively easy to install compared to higher-level deep learning systems, and it runs well both on Linux Ubuntu 16.04 and on a Raspberry Pi. I wrote the deployment app in bash, so there should not be any additional software to install. Before getting any deeper into the ML side of things, I highly advise installing XGB, running the app, and having a bit of a play with it.

Training and testing are carried out by running bash runexp.sh in the terminal, and it takes less than one second to process the 8124 lines of fungal data. At the end, bash spits out a set of statistics representing the accuracy of the training and also attempts to ‘draw’ the decision tree that XGB has devised. If we have a quick look in the directory ~/xgboost/demo/binary_classification, there should now be a 0002.model file in it, ready for deployment with the questionnaire.
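
If you'd rather poke at the training step from Python than through runexp.sh, a roughly equivalent run looks like the sketch below. The file names follow the agaricus demo, and the parameter values are my assumptions rather than a transcript of the demo's config:

    import xgboost as xgb

    # The demo ships its mushroom data pre-split into LibSVM-format text files
    # (newer xgboost versions may want a "?format=libsvm" suffix on the path)
    dtrain = xgb.DMatrix("agaricus.txt.train")
    dtest = xgb.DMatrix("agaricus.txt.test")

    params = {"objective": "binary:logistic", "max_depth": 4, "eta": 1.0}
    bst = xgb.train(params, dtrain, num_boost_round=4,
                    evals=[(dtrain, "train"), (dtest, "test")])

    preds = bst.predict(dtest)        # probability of the positive class per mushroom
    bst.save_model("0002.model")      # the same artifact the questionnaire app loads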

I was interested to explore the decision tree a bit further and look at the way XGB weighted different characteristics of the fungi. I eventually got some rough visualisations working in a Python-based Jupyter Notebook script:
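
The plotting helpers that ship with xgboost do most of the heavy lifting. This is my reconstruction of that kind of notebook cell rather than the exact script, and plot_tree additionally needs graphviz installed:

    import matplotlib.pyplot as plt
    import xgboost as xgb

    bst = xgb.Booster(model_file="0002.model")        # the model produced by runexp.sh
    xgb.plot_importance(bst, importance_type="gain")  # which questions matter most
    xgb.plot_tree(bst, num_trees=0)                   # draw the first tree in the ensemble
    plt.show()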

Obviously this app is not going to win any Kaggle competitions, since the various parameters within the software need to be carefully tuned with the help of all the different software tools available. A good place to start is to tweak the maximum depth of the tree and the number of trees used. Depth = 4 and number = 4 seem to work well for this data. Other parameters include the feature importance type, for example: gain, weight, cover, total_gain or total_cover. These can be tuned using tools such as SHAP.
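
Continuing from the training sketch above, the different importance types are easy to compare directly before reaching for something heavier like SHAP:

    # Compare how each importance metric ranks the questionnaire features
    for kind in ("gain", "weight", "cover", "total_gain", "total_cover"):
        print(kind, bst.get_score(importance_type=kind))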

Finally, this app could easily be adapted to other questionnaire-based systems, such as diagnosing a particular disease or deciding whether to buy a particular stock or share in the marketplace.

An even more basic introduction to ML goes into the baseline theory in a bit more detail – well worth a quick look.

Dirty Tricks For 6502 Programming

We know the 6502 isn't exactly the CPU of choice for today's high-performance software, but with the little CPU having appeared in so many classic computers — the Apple, the KIM-1, the Commodores, to name a few — we have a real soft spot for it. [Janne] has a post detailing the eight best entries in the Commodore 64 coding competition. The goal was to draw an X on the screen using the smallest program possible. [Janne] got 56 bytes, but two entrants clocked in at 34 bytes.

In addition to the results, [Janne] also exposes the tricks people used to get these tiny programs done. Just looking at the solution in C and then 6502 assembly is instructive. Naturally, one trick is to use the existing ROM code to do tasks such as clearing the screen. But that’s just the starting point.

Continue reading “Dirty Tricks For 6502 Programming”