A month ago, I talked about using computers to hack on our day-to-day existence, specifically, augmenting my sense of time (or rather, lack thereof). Collecting data has been super helpful – and it’s best to automate it as much as possible. Furthermore, an augment can’t be annoying beyond the level you expect, and making it context-sensitive is important – the augment needs to understand whether it’s the right time to activate.
I want to talk about context sensitivity – it’s one of the aspects that brings us closest to the sci-fi future; currently, in some good ways and many bad ways. Your device needs to know what’s happening around it, which means that you need to give it data beyond what the augment itself is able to collect. Let me show you how you can extract fun insights from collecting data, with an example of a data source you can easily tap while on your computer, then talk about the implications of data collection, and why you should do it despite everything.
Started At The Workplace, Now We’re Here
Around 2018-2019, I was doing a fair bit of gig work – electronics, programming, electronics and programming, sometimes even programming and electronics. Of course, for some, I billed per hour, and I was asked to provide estimates. How many hours does it take for me to perform task X?
I decided to collect data on what I do on my computer – to make sure I can bill people as fairly as possible, and also to try and improve my estimate-making skills. Fortunately, I do a lot of my work on a laptop – surely I could monitor it very easily? Indeed, and unlike Microsoft Recall, neither LLMs nor people were harmed during this quest. What could be a proxy for “what I’m currently doing”? For a start, currently focused window names.
Thankfully, my laptop runs Linux, a hacker-friendly OS. I quickly wrote a Python script that polls the currently focused window, writing every change into a logfile, each day a new file. A fair bit of disk activity, but nothing that my SSDs can’t handle. Initially, I just let the script run 24/7, writing its silly little logs every time I Alt-Tabbed or opened a new window, checking them manually when I needed to give a client a retrospective estimate.
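For readers who want to try something similar, here is a minimal sketch of such a poller. It is not the script described above. It assumes an X11 session with the `xdotool` command-line tool installed, and the file naming, line format, and function names are made up for illustration.

```python
import subprocess
import time
from datetime import datetime
from pathlib import Path


def get_focused_title():
    """Return the focused window's title, or None if it can't be read.

    Assumption: an X11 session with the `xdotool` CLI installed.
    """
    try:
        result = subprocess.run(
            ["xdotool", "getactivewindow", "getwindowname"],
            capture_output=True, text=True, timeout=2,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def log_path(log_dir, now):
    """One logfile per day (hypothetical naming scheme)."""
    return Path(log_dir) / f"windows-{now:%Y-%m-%d}.log"


def record_change(log_dir, now, title, last_title):
    """Append a line only when the title changed; return the title to remember."""
    if title is None or title == last_title:
        return last_title
    path = log_path(log_dir, now)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(f"{now.isoformat(timespec='seconds')}\t{title}\n")
    return title


def run(log_dir, interval=1.0):
    """Poll forever, once per `interval` seconds; stop with Ctrl+C."""
    last = None
    while True:
        last = record_change(log_dir, datetime.now(), get_focused_title(), last)
        time.sleep(interval)
```

Calling `run("window-logs")` in a terminal you keep open would start logging. Writing only on change keeps the disk activity low, and the daily file split makes it easy to look at a single day.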
I Alt-Tab a lot more than I expected, while somehow staying on the task course and making progress. Also, as soon as I started trying to sort log entries into types of activity, I was quickly reminded that categorizing data is a whole project in itself – it’s no wonder big companies outsource it to the Global South for pennies. In the end, I can’t tell you a lot about data processing here, but only because I ended up not bothering with it much, thinking that I would do it One Day – and I likely will mention it later on.
Collect Data, And Usecases Will Come
Instead, over time, I came up with other uses for this data. As it ran in an always-open commandline window, I could always scroll up and see the timestamps. Of course, this meant I could keep tabs on things like my gaming habits – at least, after the fact. I fall asleep with my laptop by my side, and usually my laptop is one of the first things I check when I wake up. Quickly, I learned to scroll through the data to figure out when I went to sleep, when I woke up, and check how long I slept.
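The "scroll up and find where I fell asleep" trick can be roughly automated by looking for the longest pause between log entries. The sketch below assumes that no focus changes get logged while you sleep; the function name and the sample timestamps are invented for illustration.

```python
from datetime import datetime


def longest_gap(timestamps):
    """Return (start, end) of the longest pause between consecutive entries.

    Returns None when there are fewer than two entries.
    """
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        return None
    # Pair each entry with the one that follows it, keep the widest pair.
    return max(zip(ordered, ordered[1:]), key=lambda pair: pair[1] - pair[0])


# Made-up entries: last activity before bed, first activity after waking.
entries = [
    datetime(2024, 1, 1, 23, 40),
    datetime(2024, 1, 2, 0, 55),
    datetime(2024, 1, 2, 8, 10),
    datetime(2024, 1, 2, 8, 12),
]
asleep, awake = longest_gap(entries)
print("last entry before the gap:", asleep)
print("first entry after the gap:", awake)
print("gap length:", awake - asleep)
```

This only gives an upper bound on sleep: the gap also includes any time spent away from the laptop before dozing off or after waking up.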
I also started tacking features on the side. One thing I added was monitoring media file playback, logging it alongside window title changes. Linux systems expose this information over Dbus, and there’s a ton of other useful stuff there too! And Dbus is way easier to work with than I’ve heard, especially when you use a GUI explorer like D-Feet to help you learn the ropes.
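Media players on Linux typically publish their state over D-Bus through the MPRIS interface. One low-effort way to read it, shown here as a sketch rather than the method used in the script above, is to shell out to the `playerctl` command-line tool, which queries MPRIS players. That `playerctl` is installed is an assumption, and the helper names and log string format are hypothetical.

```python
import subprocess


def playerctl(*args):
    """Run `playerctl` with the given arguments; return its output or None.

    Assumption: the `playerctl` CLI is installed. It returns a non-zero
    exit code when no MPRIS player is running, which maps to None here.
    """
    try:
        result = subprocess.run(
            ["playerctl", *args], capture_output=True, text=True, timeout=2
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def describe_playback(status, title):
    """Turn a playback status and a track title into one log-friendly string."""
    if not status or status == "Stopped":
        return "media: none"
    return f"media: {status.lower()} | {title or 'unknown title'}"


def current_playback():
    """Ask the active player for its status and the current track title."""
    return describe_playback(playerctl("status"), playerctl("metadata", "title"))
```

Logging the result of `current_playback()` next to each window title change, and whenever it differs from the previous value, would give a combined timeline of what was focused and what was playing.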
The original idea was figuring out how much time I was spending actively watching YouTube videos, as opposed to watching them passively in the background, and trying to notice trends. Another idea was to keep an independent YouTube watch history, since the YouTube-integrated one is notoriously unreliable. I never actually did either of these, but the data is there whenever I feel the need to do so.
Of course, having the main loop modifiable meant that I could add some hardcoded on-window-switch actions, too. For instance, at some point I was participating in a Discord community and I had trouble remembering a particular community rule. No big deal – I programmed the script to show me a notification whenever I switched into that server, reminding me of the rule.
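A hook like this can be sketched as a small table of rules checked on every focus change. The version below is an illustration under assumptions: the window title pattern and the reminder text are placeholders, and notifications go through the `notify-send` command, which may not exist on every desktop.

```python
import subprocess

# Hypothetical rules: substring of the window title -> reminder text.
REMINDERS = {
    "Example Server - Discord": "Placeholder: the community rule to remember.",
}


def reminders_for(title, previous_title, rules=REMINDERS):
    """Return reminders that apply when focus moves *into* a matching window.

    Switching between two windows that both match a rule does not re-trigger it.
    """
    return [
        text
        for needle, text in rules.items()
        if needle in title and needle not in (previous_title or "")
    ]


def notify(text):
    """Show a desktop notification; assumes the `notify-send` CLI is available."""
    try:
        subprocess.run(["notify-send", "Reminder", text], check=False)
    except FileNotFoundError:
        print(f"Reminder: {text}")  # fall back to the terminal


def on_window_switch(title, previous_title):
    """Call this from the polling loop whenever the focused title changes."""
    for text in reminders_for(title, previous_title):
        notify(text)
```

Keeping the matching logic in `reminders_for` separate from `notify` makes it easy to check the rules without popping up notifications.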
There is no shortage of information you can extract even from this simple data source. How much time do I spend talking to friends, and at which points in the day; how does that relate to my level of well-being? When I spend all-nighters on a project, how does the work graph look? Am I crashing by getting distracted into something unrelated, not asleep, but too sleepy to get up and get myself to bed? Can I estimate my focus levels at any point simply by measuring my Alt-Tab-bing frequency, then perhaps, measure my typing speed alongside and plot them together on a graph?
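Two of these questions reduce to simple arithmetic over the log: how often focus changes per hour, and how long each window stayed focused. Here is a rough sketch, assuming the log has already been parsed into a time-ordered list of `(timestamp, title)` pairs; the function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def switches_per_hour(entries):
    """Count focus changes in each clock hour."""
    return Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts, _ in entries
    )


def time_per_title(entries):
    """Credit each title with the time until the next entry.

    The last entry gets no time, since its end is unknown.
    """
    totals = defaultdict(timedelta)
    for (ts, title), (next_ts, _) in zip(entries, entries[1:]):
        totals[title] += next_ts - ts
    return dict(totals)
```

Plotting the hourly counts over a day gives a crude "restlessness" graph. The per-title totals overcount whenever the laptop sits idle, so a real version would want some idle detection before trusting them for billing.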
Window title switches turned out to be a decent proxy for “what I’m currently doing with my computer”. Plus, it gives me a wonderful hook, of the “if I do X, I need to remember to do Y” variety – there can never be enough of those! Moreover, it provides me with sizeable amounts of data about myself, data that I now store. Some of you will be iffy about collecting such data – there are some good reasons for it.
Taking Back Power
We emit information just like we emit heat. As long as we are alive, there’s always something being digitized; even your shed in the woods is being observed by a spy satellite. The Internet revolution has made information emissivity increase exponentially, a widespread phenomenon it now uses to grow itself, since now your data pays for online articles, songs, and YouTube videos. Now there are entire databanks containing various small parts of your personality, way more than you could ever have been theoretically comfortable with, enough to track your moves before you’re aware you’re making them.
Cloning is not yet here, but the Internet already contains your clone – it can sure answer your bank’s security questions, with a fair bit of your voice to impersonate you while doing so, not to mention all the little tidbits used to sway your purchasing decisions and voting preferences alike. When it comes to protections, all we have is pretenses like “privacy policies” and “data anonymization”. The EU is trying to move in the right direction through regulations like the GDPR, with the Snowden revelations having left a deep mark, but it’s barely enough and not a consistent trend.
Just like with heat signatures, not taking care of your information signature gives you zero advantages and a formidable threat profile, but if you are tapped into it, you can protect people – or preserve dictatorships. Now, if anyone deserves to have power over yourself, it’s you, as opposed to an algorithm currently tracking your toilet paper purchases, which might be used tomorrow to catch weed smokers when it notices an increase in late night snack runs. It’s already likely to be used to ramp up prices during an emergency, or just because of increased demand – that’s where all these e-ink pricetags come into play!
Isn’t It Ridiculous?
Your data will be collected by others no matter your preference, and it will not be shared with you, so you have to collect it yourself. Once you have it, you can use your data to understand yourself better, become stronger by compensating for your weaknesses, build healthier relationships with others, and live a more fulfilling and fun life overall. Collecting data also means knowing what others might collect and the power it provides, and this can help you fight and offset the damage you are bound to suffer because of datamining. Why are we not doing more of this, again?
We’ve got a lot to catch up to. Our conversations can get recorded with the ever-present networked microphones and then datamined, but you don’t get a transcript of that one phonecall where you made a doctor’s appointment and forgot to note the appointment time. Your store knows how often you buy toilet paper, what with these loyalty cards we use to get discounts while linking our purchases to our identities, but it is not kind enough to send you a notification saying it might be time to restock. Ever looked back on a roadtrip you did and wished you had a GPS track saved? Your telco operators know your location well enough, now even better with 5G towers, but you won’t get a log. Oh, also, your data can benefit us all, in a non-creepy way.
Unlike police departments, scientists are bound by ethics codes and can’t just buy data without the data owner’s consent – but science and scientific research is where our data could seriously shine. In fact, scientific research thrives when we can provide it with data we collected – just look at Apple Health. In particular, social sciences could really use a boost in available data, as reproducibility crises have no end in sight – research does turn out to skew a certain way when your survey respondents are other social science students.
Grab the power that you’re owed, collect your own data, store it safely, and see where it gets you – you will find good uses for it, whether it’s self-improvement, scientific research, or just building a motorized rolling chair that brings you to your bed when it notices you’ve become too tired after hacking all night. Speaking of which, my clock tells me it’s 5 AM.
Works, Helps, Grows
The code is on GitHub, for whatever purposes. This kind of program is a useful data source, and you could add it into other things you might want to build. This year, I slapped some websocket server code over the window monitoring code – now, other programs on my computer can connect to the websocket server, listen to messages, and make decisions based on my currently open windows and currently playing media. If you want to start tracking your computer activity right now, there are some promising programs you should consider – ActivityWatch looks really nice in particular.
I have plans for computer activity tracking beyond today – from tracking typing on the keyboard, to condensing this data into ongoing activity summaries. When storing data you collect, make sure you include a version number from the start and increment it on every data format change. You will improve upon your data formats and you will want to parse them all, and you’ll be thankful for having a version number to refer to.
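As an illustration of the version-number advice, here is a small sketch that stores each record as one JSON line with a `v` field and upgrades older records while reading. The field names and the "version 1" layout are invented for the example and are not the format used by the code discussed here.

```python
import json

FORMAT_VERSION = 2  # bump on every change to the record layout


def dump_record(ts, title, media=None):
    """Serialize one record as a JSON line in the current format."""
    return json.dumps(
        {"v": FORMAT_VERSION, "ts": ts, "title": title, "media": media}
    )


def load_record(line):
    """Parse a record of any known version into the current layout."""
    record = json.loads(line)
    version = record.get("v")
    if version == 1:
        # Hypothetical older layout: {"v": 1, "time": ..., "window": ...}
        return {"ts": record["time"], "title": record["window"], "media": None}
    if version == 2:
        return {
            "ts": record["ts"],
            "title": record["title"],
            "media": record["media"],
        }
    raise ValueError(f"unknown format version: {version!r}")
```

With the version stored in every line, old and new logs can sit in the same directory and go through the same loader, and an unknown version fails loudly instead of being silently misread.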
The GitHub-published portion is currently being used for a bigger project, where the window monitoring code plays a crucial part. Specifically, I wanted to write a companion program that would help me stay on track when working on specific projects on my laptop. In a week’s time, I will show you that program, talk about how I’ve come to create it and how it hooks into my brain, how much it helps me in the end, share the code, and give you yet another heap of cool things I’ve learned.
What other kinds of data could one collect?
Start measuring life and it becomes a chore.
Measure what matters.
If your life matters, measure it.
John Walker, the founder of Autodesk, wrote “The Hacker’s Diet”. It’s a fairly basic low-calorie and increased-exercise regimen which takes willpower and discipline. Nothing new there. However, he takes the viewpoint of a geek/nerd dealing with a machine which has a broken feedback mechanism. The key to compensating for the broken feedback is external data collection and analysis. Your body takes time to respond to changes. The day-to-day numbers can be misleading, so you need data over a long enough period to filter out the noise and see the long-term effects of those changes.
In these articles, I specifically argue against data collection becoming a chore, in large part because I can’t keep up with my chores already =D Ain’t nothing chore-like about a script running in the background giving me cool data, and I firmly aim to uphold the same ease of use for every new concept I try out.
Years ago my Samsung Galaxy 4(?) had a Health app that allowed me to record my steps, heartbeat, time on the elliptical, and allowed me to enter my weight and BP.
Then they “updated” the app during an Android update. The new app wanted to connect all information to “the cloud”, so I never used it again.
We launched an app to make personal data collection a little easier. Ties into as many iOS sensor APIs as we could and lets you serve little surveys that are randomly presented to you. Still works great: http://reporter-app.com
Here in my area (S.E. USA) recently rent prices have been rapidly increasing, more so than expected statistically due to supply and demand. It turns out landlords have been subscribing to data mining apps about rent prices segregated by geospatial regions, then using the data to see how much they can raise rents before getting diminishing returns due to attrition (renters being priced out of the market). Data: The more you collect, the more you attract pure EVIL!
Yeah, one of the big scandals in the USA right now. Thing is, I don’t even know that we can stop this from happening long-term, I don’t see how. I do see that collecting data on yourself is a huge benefit, however.
Whenever I’m asked if I have the fidelity card, my answer’s always: « No, I only have fidelity for my wife ;^) », and the cashier always smiles at me…
Is the data collecting code still available anywhere? The article says “The code is on GitHub,” but I don’t see an obvious link to that or a repository.
Indeed, I missed that as well.
A quick question to the “all-seeing, all-knowing garbage heap” for Arya Voronova’s GitHub found https://github.com/CRImier?tab=repositories, but there I’m lost; I didn’t find it.
Like @recook I couldn’t find Arya’s code on GitHub, but this article did inspire me to try ActivityWatch. Looks great so far!