The average person has become depressingly comfortable with the surveillance dystopia we live in. For better or for worse, they’ve come to accept the fact that data about their lives is constantly being collected and analyzed. We’re at the point where a sizable chunk of people believe their smartphone is listening in on their personal conversations and tailoring advertisements to overheard keywords, yet few of them are troubled enough by the idea to actually turn the phone off.
But even the most privacy-conscious among us probably wouldn’t consider our water usage to be any great secret. After all, what could anyone possibly learn from studying how much water you use? Well, as [Jason Bowling] has proven with his fascinating water-meter data research, it turns out you can learn a whole hell of a lot by watching water use patterns. By polling a whole-house water flow meter every second and running the resulting data through various machine learning algorithms, [Jason] found there is a lot of personal information hidden in this seemingly innocuous data stream.
The key is that every water-consuming device in your home has a discernible “fingerprint” that, with enough time, can be identified and tracked. Appliances that always use the same amount of water, like an ice maker or dishwasher, are obvious spikes among the noise. But [Jason] was able to pick up even more subtle differences, such as which individual toilet in the home had been flushed and when.
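To give a rough feel for what a “fingerprint” might look like in practice, here is a minimal Python sketch that splits a once-per-second flow log into individual usage events and summarizes each by duration, total volume, and peak flow. This is our own illustration, not [Jason]’s code: the `WaterEvent` class, the `segment_events` function, and the 0.05 threshold are all hypothetical, and we’re assuming the readings are flow rates in liters per second.

```python
from dataclasses import dataclass


@dataclass
class WaterEvent:
    start: int        # sample index (seconds) at which flow began
    duration_s: int   # seconds of continuous flow
    volume: float     # sum of per-second rates, i.e. liters if flow is in L/s
    peak_flow: float  # highest single reading during the event


def _summarize(flow, start, end):
    """Build the feature summary for the run flow[start:end]."""
    run = flow[start:end]
    return WaterEvent(start, end - start, sum(run), max(run))


def segment_events(flow, threshold=0.05):
    """Split a 1 Hz flow series into contiguous runs above `threshold`.

    The threshold is an assumed noise floor, not a measured value.
    """
    events = []
    start = None
    for i, rate in enumerate(flow):
        if rate > threshold and start is None:
            start = i                      # flow just turned on
        elif rate <= threshold and start is not None:
            events.append(_summarize(flow, start, i))
            start = None                   # flow just turned off
    if start is not None:                  # log ended mid-event
        events.append(_summarize(flow, start, len(flow)))
    return events


# Two made-up bursts of flow: three seconds at 0.2, then two at 0.1.
for event in segment_events([0, 0, 0.2, 0.2, 0.2, 0, 0, 0.1, 0.1, 0]):
    print(event)
```

An appliance that draws the same amount of water every cycle should produce events with nearly identical duration and volume, which is the sort of feature vector a classifier could then learn to label.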
Further, if you watch the data long enough, you can even start to identify information about individuals within the home. Want to know how many kids are in the family? Monitoring for frequent baths that don’t fill the tub all the way would be a good start. Want to know how restful somebody’s sleep was? A count of how many times the toilet was flushed overnight could give you an idea.
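As a hedged illustration of that last point, the sketch below tallies overnight toilet flushes from a list of events that have already been labeled. Everything here is assumed for the sake of example: the `overnight_counts` function, the `"toilet"` label, and the 23:00 to 06:00 window are hypothetical choices of ours rather than anything taken from [Jason]’s write-up.

```python
from collections import Counter
from datetime import datetime, timedelta


def overnight_counts(events, label="toilet", start_hour=23, end_hour=6):
    """Count events matching `label` per night.

    `events` is an iterable of (datetime, label) pairs. Each night is
    keyed by the calendar date on which it began, so a flush at 02:15
    counts toward the previous day's night.
    """
    counts = Counter()
    for when, kind in events:
        if kind != label:
            continue
        if when.hour >= start_hour:
            counts[when.date()] += 1
        elif when.hour < end_hour:
            counts[(when - timedelta(days=1)).date()] += 1
    return counts


# Made-up sample data: two overnight flushes, one daytime flush, one shower.
sample = [
    (datetime(2020, 1, 1, 23, 30), "toilet"),
    (datetime(2020, 1, 2, 2, 15), "toilet"),
    (datetime(2020, 1, 2, 12, 0), "toilet"),
    (datetime(2020, 1, 2, 3, 0), "shower"),
]
for night, flushes in overnight_counts(sample).items():
    print(night, flushes)
```

Once each event carries a label and a timestamp, questions like this reduce to simple counting, which is exactly what makes the data so revealing.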
We’re mildly horrified by the privacy implications of what [Jason] has discovered, especially since we’ve already seen how utility meters can be sniffed with nothing more exotic than an RTL-SDR. But on the other hand, his write-up is a fantastic look at how you can put machine learning to work in even the most unlikely of applications. The information he’s collected on using Python to classify time series data and create visualizations will undoubtedly be of interest to anyone who’s got a big data problem they’re looking to solve.