It’s often said that necessity is the mother of invention, but as a large portion of the projects we cover here at Hackaday can attest, curiosity has to at least be its step-mother. Not every project starts with a need; sometimes it’s just about understanding how something works. It’s that desire we’ve all felt from time to time, when we’ve looked at some obscure piece of hardware or technology and decided that the world would be a slightly better place if we cracked it open and looked at what spilled out.
That’s precisely the feeling Eric O’Callaghan had when he looked out the window of his Philadelphia apartment a few years back and saw something unusual. Seemingly overnight, an automated Indego bike sharing station had been built right across the street. Seeing the row of light blue bicycles sitting in their electronic docks, he wondered how the system worked, and what kind of data it might be collecting. He didn’t need to rent a bike, and hadn’t even ridden one in years, but he suddenly had a strong urge to go across the street and learn as much as he could about this system.
He recently presented those findings during FOSSCON 2018 at the International House in Philadelphia, in the hopes that others might be interested in getting involved. Currently Eric is one of the few people investigating the public data Indego offers, and as his personal MySQL database has now surpassed 15 million rows, he’s hoping to get some developers with big data experience into the fray. His approach to making this data useful is an interesting one which I’ll dive into after the break.
The Indego API
Eric started the presentation by explaining that the official “API” offered by Indego isn’t really much of an API at all, at least not in the way you’d expect. You can’t request data for a certain time, or even a particular location. When you send a request you simply receive a JSON file that includes a snapshot of all available data in the system, which currently comprises over 120 stations scattered throughout the city.
So the first step was to create a tool that takes this data and breaks it down into a more useful format. He created a PHP library for manipulating this data, and then followed up with a Python version later on. With these libraries, the user is able to filter out extraneous information and see the number of bikes available at a single location. Eric then went on to create a website which allows visitors to see, in real-time, the number of bikes available at every Indego station in the city.
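To give a feel for what such a library has to do, here is a minimal Python sketch that pulls one station’s bike count out of a full snapshot. It is not Eric’s code: the sample data is made up, and the GeoJSON-style layout and the field names (`features`, `properties`, `kioskId`, `bikesAvailable`, `docksAvailable`) are assumptions about the feed rather than anything confirmed in the talk.

```python
import json

# Made-up snapshot; the layout and field names are assumptions for illustration.
SAMPLE_SNAPSHOT = json.loads("""
{
  "features": [
    {"properties": {"kioskId": 3004, "name": "Station A",
                    "bikesAvailable": 7, "docksAvailable": 12}},
    {"properties": {"kioskId": 3005, "name": "Station B",
                    "bikesAvailable": 0, "docksAvailable": 15}}
  ]
}
""")


def stations(snapshot):
    """Flatten the snapshot into one dict per station."""
    return [feature["properties"] for feature in snapshot.get("features", [])]


def bikes_at(snapshot, kiosk_id):
    """Return the bike count for one station, or None if the ID is unknown."""
    for station in stations(snapshot):
        if station["kioskId"] == kiosk_id:
            return station["bikesAvailable"]
    return None


print(bikes_at(SAMPLE_SNAPSHOT, 3004))
```

Since the feed can’t be queried by station, the filtering has to happen on the client side like this, after the whole snapshot has been downloaded.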
In another case of curiosity driving the mind of the hacker, Eric decided he might as well start storing snapshots of this JSON data if they’re just going to be handing it out. Who knows what kind of interesting trends might show up? So he created a script on his personal server that sends a request to the Indego API every 10 minutes and adds the resulting data to a database. He can now see how many bikes were available at a certain station at any time going all the way back to 2015, a capability that he believes Indego themselves might not even have.
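The storage side of that idea can be sketched in a few lines of Python. This is only an illustration, not Eric’s script: it uses the standard library’s `sqlite3` in place of his MySQL database, the table layout and the `kioskId`/`bikesAvailable` field names are assumptions, and the scheduling (a cron job firing every 10 minutes, say) and the HTTP request itself are left out.

```python
import sqlite3
from datetime import datetime, timezone


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) readings table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS readings (
               taken_at        TEXT    NOT NULL,
               kiosk_id        INTEGER NOT NULL,
               bikes_available INTEGER NOT NULL,
               PRIMARY KEY (taken_at, kiosk_id))"""
    )
    return conn


def store_snapshot(conn, station_rows, taken_at=None):
    """Insert one row per station for a single poll of the feed."""
    if taken_at is None:
        taken_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO readings VALUES (?, ?, ?)",
            [(taken_at, s["kioskId"], s["bikesAvailable"]) for s in station_rows],
        )
    return taken_at


def bikes_at_time(conn, kiosk_id, when):
    """Return the most recent reading at or before `when` (ISO 8601 string)."""
    row = conn.execute(
        "SELECT bikes_available FROM readings "
        "WHERE kiosk_id = ? AND taken_at <= ? "
        "ORDER BY taken_at DESC LIMIT 1",
        (kiosk_id, when),
    ).fetchone()
    return row[0] if row else None


conn = open_db()
store_snapshot(conn, [{"kioskId": 3004, "bikesAvailable": 7}], "2018-01-01T12:00:00+00:00")
store_snapshot(conn, [{"kioskId": 3004, "bikesAvailable": 5}], "2018-01-01T12:10:00+00:00")
print(bikes_at_time(conn, 3004, "2018-01-01T12:05:00+00:00"))
```

Timestamps are stored as ISO 8601 strings in a single timezone so that plain string comparison sorts them chronologically. With one row per station per poll, the table grows quickly, which is how a database like Eric’s gets into the millions of rows.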
Historical Hacking
Whether or not Indego has historical data on bike usage is perhaps debatable, but surely there’s no other public source for much of the information Eric has collected. This revelation got a few people in attendance brainstorming exploits that this trove of information makes possible.
One person wondered if it wouldn’t be possible to compare daily data and attempt to find individual travelers. In other words, if station A always has a bike checkout at 8:45 AM, and station B always checks one in at 9:00 AM, could you assume that you’ve found the commute to work for a particular individual? This wouldn’t tell you their identity of course, but knowing someone’s schedule could be used as part of a larger social engineering attack.
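As a thought experiment, that comparison might look something like the Python sketch below. Everything here is hypothetical: the `history` structure (bike counts per polling slot, per day, per station), the function names, and the thresholds are invented for illustration. It also only finds correlated changes in bike counts, which on a busy system would throw up plenty of false positives.

```python
from collections import defaultdict


def events(counts, sign):
    """Slot indices where the count moved in the given direction.

    sign=-1 finds checkouts (count dropped), sign=+1 finds returns (count rose).
    """
    return {i for i in range(1, len(counts))
            if (counts[i] - counts[i - 1]) * sign > 0}


def recurring_trips(history, max_gap=2, min_days=3):
    """Find station pairs with a checkout followed by a return on many days.

    history: {station: {day: [bike count at each polling slot]}}
    max_gap: how many slots after the checkout the return may show up
    min_days: how many days the pattern must repeat to be reported
    """
    tally = defaultdict(set)  # (origin, dest, slot_out, slot_in) -> days seen
    for origin, origin_days in history.items():
        for dest, dest_days in history.items():
            if origin == dest:
                continue
            for day, counts in origin_days.items():
                if day not in dest_days:
                    continue
                returns = events(dest_days[day], +1)
                for slot_out in events(counts, -1):
                    for gap in range(1, max_gap + 1):
                        if slot_out + gap in returns:
                            tally[(origin, dest, slot_out, slot_out + gap)].add(day)
    return {key: sorted(days) for key, days in tally.items()
            if len(days) >= min_days}


history = {
    "A": {"mon": [5, 4, 4, 4], "tue": [6, 5, 5, 5], "wed": [3, 2, 2, 2]},
    "B": {"mon": [1, 1, 2, 2], "tue": [0, 0, 1, 1], "wed": [2, 2, 3, 3]},
}
print(recurring_trips(history))
```

With snapshots taken every 10 minutes, each slot covers a 10-minute window, so the 8:45 to 9:00 example above would appear as a drop at one station followed by a rise at another one or two slots later.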
Another individual in attendance pointed out that Indego has a policy where if you attempt to return a bike to a station that currently has no open spots, your ride is extended for free. Using Eric’s library, one could conceivably plot a route through the city that would bounce between full stations, continually extending your ride time. You probably couldn’t get away with it for long, but it would be an interesting experiment.
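A rough Python sketch of that route planner is below. It is purely illustrative: the field names (`docksAvailable`, `lat`, `lon`), the sample coordinates, and the greedy nearest-neighbor strategy are all assumptions, and it works on a single snapshot that could be stale by the time you pedal up to the next station.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def full_stations(station_rows):
    """Stations with no open docks, according to this snapshot."""
    return [s for s in station_rows if s["docksAvailable"] == 0]


def full_station_route(station_rows, start):
    """Greedy route: from `start`, keep hopping to the nearest unvisited full station."""
    remaining = full_stations(station_rows)
    position, route = start, []
    while remaining:
        nearest = min(remaining,
                      key=lambda s: haversine_km(position, (s["lat"], s["lon"])))
        remaining.remove(nearest)
        route.append(nearest["name"])
        position = (nearest["lat"], nearest["lon"])
    return route


# Made-up stations laid out on a north-south line.
sample = [
    {"name": "S1", "lat": 39.950, "lon": -75.160, "docksAvailable": 0},
    {"name": "S2", "lat": 39.960, "lon": -75.160, "docksAvailable": 0},
    {"name": "S3", "lat": 39.990, "lon": -75.160, "docksAvailable": 0},
    {"name": "S4", "lat": 39.955, "lon": -75.160, "docksAvailable": 4},
]
print(full_station_route(sample, start=(39.949, -75.160)))
```

A greedy hop to the nearest full station is the simplest possible strategy; it makes no attempt to find the longest or most efficient tour.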
Future Work
Looking ahead, work needs to be done on documenting all the fields in the JSON response, as Eric limited his own code to the parts of the data that interest him the most (the location of each station and how many bikes are available). There are also issues when new stations are added, or worse, when a station has its ID number changed. Finally, new and better visualizations could be developed that help put this data to work for those who use the system.
Eric admits he’s no programmer, nor an expert on data analysis. He’s just a guy who saw something interesting outside of his apartment one day. His hope is that this sampling of what’s possible with the Indego data will inspire others with more experience to take the reins and realize the project’s full potential.
I’m thinking of starting a gofundme to build a Landmaster II.
B^)
That awesome 12 wheeled APC from Damnation Alley?
https://en.wikipedia.org/wiki/Landmaster
With the data coming from the company in such a poorly formatted way, I wonder if it’s a case that they were being forced to give the data out through some city/state transparency rules, and just wanted to do it in the easiest way possible. Or else they’re just lazy.
Could be that second one…
Good point, I’d love to know if it was for a transparency rule too as it could lead to finding similar transportation services with interesting data to glean.
Related — https://github.com/ubahnverleih/WoBike