Training Bats In The Random Forest With The Confusion Matrix

When exploring the realm of Machine Learning, it’s always nice to have some real and interesting data to work with. That’s where the bats come in – they’re fascinating animals that emit very particular ultrasonic calls that can be recorded and analysed with computer software to get a fairly good idea of what species they are. When analysed with an FFT spectogram, we can see the individual call shapes very clearly.

Creating an open source classifier for bats is also potentially useful for the world outside of Machine Learning as it could not only enable us to more easily monitor bats themselves, but also the knock on effects of modern farming methods on the natural environment. Bats feed on moths and other night flying insects which themselves have been decimated in numbers. Even in the depths of the countryside here in the UK these insects are a fraction of the population that they used to be 30 years ago, but nobody seems to have monitored this decline.

So getting back to our spectograms, it would be perfectly reasonable to throw these images at a convolutional neural network (CNN) and use an image feature-recognition strategy. But I wanted to explore the depths of the mysterious Random Forest. Continue reading “Training Bats In The Random Forest With The Confusion Matrix”

Laser-Based Audio Injection On Voice-Controllable Systems

In one of the cooler hacks we’ve seen recently, a bunch of hacking academics at the University of Michigan researched the ability to flicker a laser at audible sound frequencies to see if they could remotely operate microphones simply by shining a light on them. The results are outstanding.

While most Hackers will have heard about ‘The Thing’ – a famous hack where Russian KGB agents would aim a radio transmitter at the great seal in the US embassy,  almost none of us will have thought of using lasers shined in from distant locations to hack modern audio devices such as Alexa or Google Assistant. In the name of due diligence, we checked it out on Wikipedia: ‘The Photoacoustic Effect’ , and indeed it is real – first discovered in 1880 by Alexander Bell! The pulsing light is heating the microphone element and causing it to vibrate along with the beam’s intensity. Getting long range out of such a system is a non-trivial product of telescopes, lasers, and careful alignment, but it can be made to work.

Digging deeper into the hack, we find that the actual microphone that is vulnerable is the MEMS type, such as the Knowles SPV0842LR5H. This attack is relatively easy to prevent; manufacturers would simply need to install screens to prevent light from hitting the microphones. For devices already installed in our homes, we recommend either putting a cardboard box over them or moving them away from windows where unscrupulous neighbors or KGB agents could gain access. This does make us wonder if MEMS mics are also vulnerable to radio waves.

As far as mobile phones are concerned, the researchers were able to talk into an iPhone XR at 10 metres, which means that, very possibly, anybody with a hand held ultra violet / infra red equipped flashlight could hack our phones at close range in a bar, for example. The counter-measures are simple – just stick some black electrical tape over the microphone port at the bottom of the phone. Or stay out of those dodgy bars. Continue reading “Laser-Based Audio Injection On Voice-Controllable Systems”

How Ammo Temperature Will Affect Shooting Accuracy

The last time we visited the Hackaday shooting range we were all psyched up to get the right posture, breathe correctly, lower our heart rates and squeeze the trigger at exactly the right moment that the wandering cross hairs align with the target ……. and lastly accommodate the inevitable recoil. But never did we think to check the temperature of our ammo! Ok, temperatures aren’t likely to vary that much there unless the range cat chooses to lay down on top of the ammo box, but out in the wilderness the temperatures can easily vary by up to 30 degrees, which would certainly be a problem.

If we take a quick look at what’s happening on Johnny’s Reloading Bench  we get an in depth comparison of different powders at different temperatures, with data being collected via a bullet velocity radar. If nothing else, it’s interesting just to get a peep into the mysterious world of ‘Reloading’ where every one of the tiny kernels or ‘balls’ of powder make a difference and different powders require particular primers to make them burn properly.

Just to make it clear, bullet speed makes a big difference to the trajectory, especially at long distances. For example, if the bullet were to travel at close to the speed of light, there would be almost no trajectory at all and the shooter would not have to adjust the vertical aim for distance. Normally, we have to aim upwards to hit the target:

It may be that we ‘zero in’ our sights at room temperature, but then end up actually shooting the firearm on a cold, frosty morning with cold ammo, and given what we have now learnt from the video, we could now make a small adjustment for that eventuality, depending on the particular ammo we are using. Johnny’s video is after the break:

Continue reading “How Ammo Temperature Will Affect Shooting Accuracy”

Worried About Bats In Your Belfry? A Tale Of Two Bat Detectors

As somebody who loves technology and wildlife and also needs to develop an old farmhouse, going down the bat detector rabbit hole was a journey hard to resist. Bats are ideal animals for hackers to monitor as they emit ultrasonic frequencies from their mouths and noses to communicate with each other, detect their prey and navigate their way around obstacles such as trees — all done in pitch black darkness. On a slight downside, many species just love to make their homes in derelict buildings and, being protected here in the EU, developers need to make a rigorous survey to ensure as best as possible that there are no bats roosting in the site.

Perfect habitat for bats.

Obviously, the authorities require a professional independent survey, but there’s still plenty of opportunity for hacker participation by performing a ‘pre-survey’. Finding bat roosts with DIY detectors will tell us immediately if there is a problem, and give us a head start on rethinking our plans.

As can be expected, bat detectors come in all shapes and sizes, using various electrickery techniques to make them cheaper to build or easier to use. There are four different techniques most popularly used in bat detectors.

 

  1. Heterodyne: rather like tuning a radio, pitch is reduced without slowing the call down.
  2. Time expansion: chunks of data are slowed down to human audible frequencies.
  3. Frequency division: uses a digital counter IC to divide the frequency down in real time.
  4. Full spectrum: the full acoustic spectrum is recorded as a wav file.

Fortunately, recent advances in technology have now enabled manufacturers to produce relatively cheap full spectrum devices, which give the best resolution and the best chances of identifying the actual bat species.

DIY bat detectors tend to be of the frequency division type and are great for helping spot bats emerging from buildings. An audible noise from a speaker or headphones can prompt us to confirm that the fleeting black shape that we glimpsed was actually a bat and not a moth in the foreground. I used one of these detectors in conjunction with a video recorder to confirm that a bat was indeed NOT exiting from an old chimney pot. Phew!

Continue reading “Worried About Bats In Your Belfry? A Tale Of Two Bat Detectors”

Sensor Filters For Coders

Anybody interested in building their own robot, sending spacecraft to the moon, or launching inter-continental ballistic missiles should have at least some basic filter options in their toolkit, otherwise the robot will likely wobble about erratically and the missile will miss it’s target.

What is a filter anyway? In practical terms, the filter should smooth out erratic sensor data with as little time lag, or ‘error lag’ as possible. In the case of the missile, it could travel nice and smoothly through the air, but miss it’s target because the positional data is getting processed ‘too late’. The simplest filter, that many of us will have already used, is to pause our code, take about 10 quick readings from our sensor and then calculate the mean by dividing by 10. Incredibly simple and effective as long as our machine or process is not time sensitive – perfect for a weather station temperature sensor, although wind direction is slightly more complicated. A wind vane is actually an example of a good sensor giving ‘noisy’ readings: not that the sensor itself is noisy, but that wind is inherently gusty and is constantly changing direction.

It’s a really good idea to try and model our data on some kind of computer running software that will print out graphs – I chose the Raspberry Pi and installed Jupyter Notebook running Python 3.

The photo on the left shows my test rig. There’s a PT100 probe with it’s MAX31865 break-out board, a Dallas DS18B20 and a DHT22. The shield on the Pi is a GPS shield which is currently not used. If you don’t want the hassle of setting up these probes there’s a Jupyter Notebook file that can also use the internal temp sensor in the Raspberry Pi. It’s incredibly quick and easy to get up and running.

It’s quite interesting to see the performance of the different sensors, but I quickly ended up completely mangling the data from the DS18B20 by artificially adding randomly generated noise and some very nasty data spikes to really punish the filters as much as possible. Getting the temperature data to change rapidly was effected by putting a small piece of frozen Bockwurst on top of the DS18B20 and then removing it again.

Continue reading “Sensor Filters For Coders”

AI Poised To Turn The Internet Into Gibberish

Last Thursday two lowly masters grad students, Aaron Gokaslan and Vanya Cohen managed to replicate the secretive OpenAI model and cheekily named their version OpenGPT-2. The code can be downloaded from this Google Colab page and apparently no prior experience in language modeling is required to use it. More useful might be the skills required to persuade Google to part with $50,000 worth of free cloud compute time for the training!

Research firm OpenAI released a new, ever more powerful, version of their GPT language model with 1.5 billion parameters, trained on a data-set of 8 million web pages and although it’s most entertaining use is to produce gibberish, it will inevitably also be able to produce coherent text sometime very soon. For us mere mortals, there’s a cut down version of the model hosted in the cloud and a webpage that we can visit, type in a short phrase to prompt the system, and print out a few paragraphs of fake news. We tried it with the following: “You can use an Arduino to” and the results were:

  • You can use an Arduino to remove your appendix. Use an open or closed loop, but do not use them at the same time, i.e. don’t wire your Arduino to the wall like I did. The Arduino’s are very noisy and can damage the circuit board. Be sure to use a good quality and high-quality resistor to connect them to the wall. You can also use your own antenna if you’d like.

The neural network works by taking the text that has been previously printed and using this to predict the next word sequentially. Because of it’s obscurity, our Arduino example is a pretty severe and rather unfair test on the system whilst in it’s fledgling infant state and, looking at the proper metrics, it actually performs quite well on certain standard industry test sets such as the Children’s Book Test.

Be sure to paste your own fake news into the comments below and we’ll take a vote on the one that’s most entertaining, but please keep it within the boundaries of good taste!

Whilst this is an emerging technology, somebody did get hold of it a while back and applied it to an old teleprinter!

 

Build A Fungus Foraging App With Machine Learning

As the 2019 mushroom foraging season approaches it’s timely to combine my thirst for knowledge about low level machine learning (ML) with a popular pastime that we enjoy here where I live. Just for the record, I’m not an expert on ML, and I’m simply inviting readers to follow me back down some rabbit holes that I recently explored.

But mushrooms, I do know a little bit about, so firstly, a bit about health and safety:

  • The app created should be used with extreme caution and results always confirmed by a fungus expert.
  • Always test the fungus by initially only eating a very small piece and waiting for several hours to check there is no ill effect.
  • Always wear gloves  – It’s surprisingly easy to absorb toxins through fingers.

Since this is very much an introduction to ML, there won’t be too much terminology and the emphasis will be on having fun rather than going on a deep dive. The system that I stumbled upon is called XGBoost (XGB). One of the XGB demos is for binary classification, and the data was drawn from The Audubon Society Field Guide to North American Mushrooms. Binary means that the app spits out a probability of ‘yes’ or ‘no’ and in this case it tends to give about 95% probability that a common edible mushroom (Agaricus campestris) is actually edible. 

The app asks the user 22 questions about their specimen and collates the data inputted as a series of letters separated by commas. At the end of the questionnaire, this data line is written to a file called ‘fungusFile.data’ for further processing.

XGB can not accept letters as data so they have to be mapped into ‘classic LibSVM format’ which looks like this: ‘3:218’, for each letter. Next, this XGB friendly data is split into two parts for training a model and then subsequently testing that model.

Installing XGB is relatively easy compared to higher level deep learning systems and runs well on both Linux Ubuntu 16.04 and on a Raspberry Pi. I wrote the deployment app in bash so there should not be any additional software to install. Before getting any deeper into the ML side of things, I highly advise installing XGB, running the app, and having a bit of a play with it.

Training and testing is carried out by running bash runexp.sh in the terminal and it takes less than one second to process the 8124 lines of fungal data. At the end, bash spits out a set of statistics to represent the accuracy of the training and also attempts to ‘draw’ the decision tree that XGB has devised. If we have a quick look in directory ~/xgboost/demo/binary_classification, there should now be a 0002.model file in it ready for deployment with the questionnaire.

I was interested to explore the decision tree a bit further and look at the way XGB weighted different characteristics of the fungi. I eventually got some rough visualisations working on a Python based Jupyter Notebook script:

 

 

 

 

 

 

 

Obviously this app is not going to win any Kaggle competitions since the various parameters within the software need to be carefully tuned with the help of all the different software tools available. A good place to start is to tweak the maximum depth of the tree and the number or trees used. Depth = 4 and number = 4 seems to work well for this data. Other parameters include the feature importance type, for example: gain, weight, cover, total_gain or total_cover. These can be tuned using tools such as SHAP.

Finally, this app could easily be adapted to other questionnaire based systems such as diagnosing a particular disease, or deciding whether to buy a particular stock or share in the market place.

An even more basic introduction to ML goes into the baseline theory in a bit more detail – well worth a quick look.