Mining the wealth of information in the BitCoin blockchain is nothing new, but BitCluster goes a long way toward making sense of the information you'll find there. The tool was released by Mathieu Lavoie and David Decary-Hetu, Ph.D., on Friday following their talk at HOPE XI.
I greatly enjoyed sitting in on the talk, which began with some BitCoin basics. The cryptocurrency uses user-generated "wallets", which are essentially addresses that identify transactions. Each is established using a key pair, and there are roughly 146 million of these wallets in existence now.
If you’re a thrifty person you might think you can get one wallet and use it for years. That might be true of the sweaty alligator-skin nightmare you’ve had in your back pocket for a decade now. It’s not true when it comes to digital bits — they’re cheap (some would say free). People who don’t generate a new wallet for every transaction weaken their BitCoin anonymity and this weakness is the core of BitCluster’s approach.
Every time you transfer BitCoin (BTC), you tell the network which earlier transaction gave you those BTC and sign the transfer with your private key to validate it. Suppose you reuse the same wallet address on subsequent transactions, maybe because you didn't spend all of the wallet's coins in one transaction, or because you overpaid and had the change routed back to your wallet. The uniqueness of that signed address can then be tracked across those multiple transactions. This alone won't dox you, but it does allow a clever piece of software to build a database of nodes by associating transactions together.
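To make the idea concrete, here is a minimal Python sketch of address clustering with a union-find structure. It assumes the widely used common-input-ownership heuristic (addresses spent together as inputs to one transaction are treated as having one owner). The talk did not spell out BitCluster's exact rules, so this is an illustration, not the tool's implementation, and the input format (a list of input-address lists) is hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest keyed by address string."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)  # first sighting: address is its own root
        while self.parent[x] != x:
            # Path halving: point x at its grandparent to flatten the tree.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_addresses(transactions):
    """transactions: iterable of lists of input addresses spent together (hypothetical format)."""
    uf = UnionFind()
    for inputs in transactions:
        if not inputs:
            continue
        first = inputs[0]
        uf.find(first)  # register single-input transactions too
        for addr in inputs[1:]:
            uf.union(first, addr)  # assumption: co-spent inputs share an owner
    clusters = defaultdict(set)
    for addr in list(uf.parent):
        clusters[uf.find(addr)].add(addr)
    return list(clusters.values())


# Toy data: address "B" is reused in two transactions.
txs = [["A", "B"], ["B", "C"], ["D"], ["E", "F"]]
result = sorted(sorted(c) for c in cluster_addresses(txs))
print(result)  # [['A', 'B', 'C'], ['D'], ['E', 'F']]
```

The single reuse of B is enough to merge A, B, and C into one cluster, which is exactly the weakness described above. Union-find keeps each merge cheap, which matters when the input is the entire blockchain.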
Mathieu's description of the first attempts at mapping the blockchain was amusing. The demonstration showed a Python script called from the command line which started off analyzing a little more than a block a second, but by the time it reached the fourth or fifth block the process had slowed to a standstill that would never progress. This reminds me of some of the puzzles from Project Euler.
After a rabbit hole of optimizations, the problem has been solved. All you need to recreate the work is a pair of machines (one for Python, one for MongoDB) with the fastest processors you can afford, a 500 GB SSD, 32 GB of RAM (64 GB would be better), 64-bit Python, and at least a week of time. The good news is that you don't have to recreate this. The 200 GB database is available for download through a torrent, and the code to navigate it is up on GitHub. Like I said, this type of blockchain sleuthing isn't new, but a powerful open source tool like this is.
Both ransomware and illicit markets can be observed using this technique. Successful, yet not-so-cautious, ransomers sometimes use the same BitCoin address for all payments. For example, research into a 2014 data sample turned up a ransomware instance that pulled in $611k (averaging $10k per day, but actually pulling in most of the money during one three-week period). If you're paying attention, you know using the same wallet address is a bad move, and this ransomware was eventually shut down.
Illicit markets like Silk Road are another application for BitCluster. Prior research methods relied on mining comments left by customers to estimate revenue. Imagine if you had to guess at how well Amazon was doing by reading customer reviews and hoping they mentioned the price. The ability to observe BTC payment nodes is a much more powerful method.
A good illicit market won't use just one wallet address. But to protect customers they use escrow addresses, and these do get reused, making cluster analysis possible. Silk Road was doing about $800k per month in revenue at its height. The bulk of purchases were for less than $500, with only a tiny percentage above $1000. But those large purchases were likely to be drug purchases of a kilo or more, and that small sliver of total transactions actually added up to about a third of the total revenue.
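The arithmetic behind "a tiny share of purchases, a third of the revenue" is easy to check for any list of payment amounts. The Python sketch below uses made-up toy numbers chosen only to show the effect; they are not Silk Road data, and the function name is hypothetical.

```python
def revenue_share_above(amounts_usd, threshold_usd):
    """Return (share of purchases above threshold, share of revenue they bring in)."""
    large = [a for a in amounts_usd if a > threshold_usd]
    count_share = len(large) / len(amounts_usd)
    revenue_share = sum(large) / sum(amounts_usd)
    return count_share, revenue_share


# Made-up toy data, NOT real figures: 98 purchases of $150 and 2 of $3,675.
purchases = [150] * 98 + [3675] * 2
count_share, revenue_share = revenue_share_above(purchases, 1000)
print(f"{count_share:.0%} of purchases, {revenue_share:.1%} of revenue")
# prints "2% of purchases, 33.3% of revenue"
```

Two purchases out of a hundred are enough to make up a third of the revenue, because each one is worth roughly 25 ordinary orders.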
It's fascinating to peer into transactions in this manner, and the good news is that there's plenty of interesting stuff just waiting to be discovered. After all, the blockchain is a historical record, so the data isn't going anywhere. BitCluster is intriguing and worth playing with. Currently you can search for a BTC address and see total BTC in and out, then sift through income and expenses sorted by date, amount, etc. But the tool could be truly great with more development. At the top of the wishlist are automated database updates, labeling of nodes (so you can search "Silk Road" instead of a numerical address), visual graphs of flows, and a hosted version of the query tool (though the computing power required becomes prohibitive).
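For a feel of what that address view computes, here is a minimal Python sketch over a hypothetical flattened list of transfers. The `Transfer` record, the example addresses, and the `summarize` function are illustrative assumptions, not BitCluster's MongoDB schema or query code. Amounts are kept in integer satoshis to avoid floating-point drift.

```python
from dataclasses import dataclass
from datetime import date

SATOSHI_PER_BTC = 100_000_000


@dataclass(frozen=True)
class Transfer:
    """Hypothetical record: one address's side of one transaction."""
    address: str
    satoshis: int  # positive = received (income), negative = sent (expense)
    day: date


def summarize(address, transfers, sort_by="day"):
    """Total BTC in/out for one address, plus income and expenses sorted by day or amount."""
    mine = [t for t in transfers if t.address == address]
    income = [t for t in mine if t.satoshis > 0]
    expenses = [t for t in mine if t.satoshis < 0]
    if sort_by == "day":
        key = lambda t: t.day
    else:
        key = lambda t: abs(t.satoshis)  # sort by size of the transfer
    return {
        "total_in_btc": sum(t.satoshis for t in income) / SATOSHI_PER_BTC,
        "total_out_btc": -sum(t.satoshis for t in expenses) / SATOSHI_PER_BTC,
        "income": sorted(income, key=key),
        "expenses": sorted(expenses, key=key),
    }


# Toy data with placeholder addresses (not real BTC addresses).
transfers = [
    Transfer("addr-1", 150_000_000, date(2014, 3, 2)),
    Transfer("addr-1", -50_000_000, date(2014, 3, 5)),
    Transfer("addr-1", 25_000_000, date(2014, 3, 1)),
    Transfer("addr-2", 10_000_000, date(2014, 3, 1)),
]
summary = summarize("addr-1", transfers)
print(summary["total_in_btc"], summary["total_out_btc"])  # 1.75 0.5
```

In practice the same aggregation would run per cluster rather than per single address, which is what makes the clustering step worth the week of computation.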
Mike, there is a typo in the name "David Decarty-Hetu". It should be "David Decary-Hetu".
My apologies to David. Thanks for mentioning this error; fixed!
The author is confused about what a wallet is and what an address is. A wallet is not an address. A wallet contains private keys that are associated with addresses. The statement "People who don't generate a new wallet for every transaction…" and the use of the phrase "wallet address" demonstrate the author's confusion.
I am surprised this is a new idea. Considering how many high-profile ransomware attacks have been reported in the news, I always assumed this was a common technique used by law enforcement to investigate these crimes.
I actually don't understand how these criminals can get away so easily, considering the public nature of the blockchain. We can all see the address the ransom was paid to, and we can track all the transactions in and out of that address. If dealing with a large sum, I imagine it would be too inconvenient to convert the coins to hard cash piecemeal. So I imagine at some point there is a large transaction that converts the coins to cash. If that happens through an exchange, it will leave a record that leads straight to the person who received the proceeds. Why is it so easy for these criminals to evade the police?
The FBI (and equivalent agencies elsewhere) probably could catch some of the criminals, but they mostly choose not to, which isn’t anything new. International investigations are hard, and criminals like to operate out of countries that are particularly difficult to deal with. They have to decide what is a priority. Computer criminals just have to make the tracking process tedious and expensive enough that the FBI simply gives up.
Currently the priority seems to be terrorism, with copyright in a distant second place (maybe I'm being too cynical, but those are the only two I've heard about lately). There are probably more important things than going after ransomware (human trafficking, other organized crime), but IMO they should at least be going after real, concrete theft from ordinary people online instead of perceived "theft" from copyright infringement online, as they often seem to.
Please disagree if I’m wrong. If my perception of the current state of affairs is warped I’d like to hear about it (especially if anybody has statistics to back it up).
If you are a savvy criminal, you can make BTC very difficult to trace by splitting the funds, buying other cryptocurrencies, and trading them back into BTC before cashing out. Another way is to use a BTC tumbler for a percentage fee: your coins are sent to a communal wallet, split and moved across multiple addresses many times, and then returned to you effectively "clean". Just like with physical money, there is always a way to disguise your dodgy dealings.
Along the same lines, years ago [Michele Spagnuolo] released his BitIodine thesis and online tool for clustering BTC addresses: https://bitiodine.net/
https://miki.it/pdf/BitIodine_presentation.pdf
https://www.ifca.ai/fc14/papers/fc14_submission_11.pdf
In 2013-2014 I developed, from the ground up, a C clusterer and analyzer which works mainly in RAM, so the whole BTC blockchain is clustered in a few minutes on my 16 GB laptop, but I never completed the tool with a web UI, and I never released it :-(
Piero
Yeah, but for wallets that generate new addresses automatically with each transaction (like Samourai, Dark Wallet, and even bigger providers like Circle), I can only imagine this technique being so effective… At least, that's the case if the key variable in this software's effectiveness is whether or not the person you want to track was lazy and kept using the same address for the same wallet. Sure, it helps with the Silk Roads of the world, but there is a whole lot of crime that happens without Silk Road as well.