As the 2019 mushroom foraging season approaches, it's timely to combine my thirst for knowledge about low-level machine learning (ML) with a popular pastime that we enjoy here where I live. Just for the record, I'm not an expert on ML; I'm simply inviting readers to follow me back down some rabbit holes that I recently explored.
But mushrooms I do know a little bit about, so first, a few words on health and safety:
- The app created here should be used with extreme caution, and its results should always be confirmed by a fungus expert.
- Always test a fungus by initially eating only a very small piece and waiting several hours to check there is no ill effect.
- Always wear gloves – it's surprisingly easy to absorb toxins through your fingers.
Since this is very much an introduction to ML, there won't be too much terminology and the emphasis will be on having fun rather than going on a deep dive. The system that I stumbled upon is called XGBoost (XGB). One of the XGB demos is for binary classification, and the data was drawn from The Audubon Society Field Guide to North American Mushrooms. Binary means that the app answers a yes/no question with a probability, and in this case it tends to give about a 95% probability that a common edible mushroom (Agaricus campestris) is actually edible.
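To make that concrete, here is a minimal Python sketch of that kind of binary classifier. The data paths are an assumption (they point at the demo files inside the xgboost source tree), and newer XGBoost versions may want `?format=libsvm` appended to them:

```python
# Minimal sketch: train a binary classifier on the bundled mushroom
# demo data, assuming we are inside the xgboost source tree.
import xgboost as xgb

dtrain = xgb.DMatrix('demo/data/agaricus.txt.train')
dtest = xgb.DMatrix('demo/data/agaricus.txt.test')

# 'binary:logistic' makes predict() return a probability between 0 and 1.
params = {'objective': 'binary:logistic', 'max_depth': 4, 'eta': 1.0}
bst = xgb.train(params, dtrain, num_boost_round=4)

print(bst.predict(dtest)[:5])  # per-specimen probabilities
```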
The app asks the user 22 questions about their specimen and collates the answers as a series of letters separated by commas. At the end of the questionnaire, this data line is written to a file called `fungusFile.data` for further processing.
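The app itself is written in bash (more on that below), but purely as an illustration of the flow, a Python equivalent of the questionnaire might look like this; the two prompts shown are placeholders for the full set of 22:

```python
# Illustrative only: the real app is a bash questionnaire. Prompts for
# the remaining 20 characteristics are elided.
questions = [
    'cap-shape (b/c/x/f/k/s)? ',
    'cap-surface (f/g/y/s)? ',
    # ...20 more questions from the field guide...
]

answers = [input(q).strip() for q in questions]

# Collate the answers into one comma-separated line and save it.
with open('fungusFile.data', 'a') as f:
    f.write(','.join(answers) + '\n')
```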
XGB cannot accept letters as data, so each letter has to be mapped into 'classic LibSVM format', which looks like this: '3:218'. Next, this XGB-friendly data is split into two parts, one for training a model and one for subsequently testing that model.
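In the demo this mapping is done by a helper script; the sketch below shows the general idea, assuming (as in the demo data) that each line starts with the edibility letter. The output file name `fungus.libsvm` is made up for the example:

```python
# Map each (column, letter) pair to its own binary feature index so the
# categorical letters become 'index:value' pairs that XGB can digest.
fmap = {}  # (column, letter) -> feature index

def to_libsvm(line):
    fields = line.strip().split(',')
    label = '1' if fields[0] == 'p' else '0'  # p(oisonous) / e(dible)
    idxs = sorted(fmap.setdefault((col, letter), len(fmap))
                  for col, letter in enumerate(fields[1:]))
    # LibSVM format expects ascending feature indices.
    return label + ' ' + ' '.join(f'{i}:1' for i in idxs)

with open('fungusFile.data') as src, open('fungus.libsvm', 'w') as dst:
    for line in src:
        dst.write(to_libsvm(line) + '\n')
```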
XGB is relatively easy to install compared with higher-level deep learning systems, and it runs well on both Linux Ubuntu 16.04 and a Raspberry Pi. I wrote the deployment app in bash, so there should not be any additional software to install. Before getting any deeper into the ML side of things, I highly advise installing XGB, running the app, and having a bit of a play with it.
Training and testing is carried out by running `bash runexp.sh` in the terminal, and it takes less than one second to process the 8124 lines of fungal data. At the end, bash spits out a set of statistics representing the accuracy of the training and also attempts to 'draw' the decision tree that XGB has devised. If we have a quick look in the directory `~/xgboost/demo/binary_classification`, there should now be a `0002.model` file in it, ready for deployment with the questionnaire.
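Deployment only needs that saved model file. A minimal Python sketch, where `specimen.libsvm` is a hypothetical file holding one questionnaire line already converted to the same LibSVM features:

```python
import xgboost as xgb

# Load the model produced by the training run.
bst = xgb.Booster()
bst.load_model('0002.model')

# 'specimen.libsvm': hypothetical, one converted questionnaire answer.
specimen = xgb.DMatrix('specimen.libsvm')
print(f'probability: {bst.predict(specimen)[0]:.2f}')
```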
I was interested to explore the decision tree a bit further and look at the way XGB weighted different characteristics of the fungi. I eventually got some rough visualisations working in a Python-based Jupyter Notebook script:
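A script along these lines reproduces that sort of rough plot using XGBoost's built-in plotting helpers (`plot_tree` additionally needs the graphviz package installed):

```python
import matplotlib.pyplot as plt
import xgboost as xgb

bst = xgb.Booster()
bst.load_model('0002.model')

xgb.plot_importance(bst)         # how often each feature is used to split
xgb.plot_tree(bst, num_trees=0)  # draw the first tree of the ensemble
plt.show()
```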
Obviously this app is not going to win any Kaggle competitions, since the various parameters within the software need to be carefully tuned with the help of the different software tools available. A good place to start is to tweak the maximum depth of the trees and the number of trees used; depth = 4 and number = 4 seem to work well for this data. Other parameters include the feature importance type, for example: gain, weight, cover, total_gain or total_cover. These can be explored using tools such as SHAP.
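As a hedged sketch of those knobs (the training file path is an assumption, and shap is a separate pip-installable package):

```python
import shap
import xgboost as xgb
from sklearn.datasets import load_svmlight_file

# Load the LibSVM-format training data into a sparse matrix (path assumed).
X, y = load_svmlight_file('agaricus.txt.train')

# Depth = 4 and number of trees = 4, as suggested above.
model = xgb.XGBClassifier(max_depth=4, n_estimators=4)
model.fit(X, y)

# The same model ranks features differently under each importance type.
for itype in ('gain', 'weight', 'cover', 'total_gain', 'total_cover'):
    print(itype, model.get_booster().get_score(importance_type=itype))

# SHAP attributes each individual prediction to the features behind it.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.toarray())
shap.summary_plot(shap_values, X.toarray())
```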
Finally, this app could easily be adapted to other questionnaire-based systems, such as diagnosing a particular disease or deciding whether to buy a particular stock or share in the marketplace.
An even more basic introduction to ML goes into the baseline theory in a bit more detail – well worth a quick look.