[Kevin Norman] got himself a smart body scale with the intention of logging data for his own analysis, but discovered that extracting data from the device was anything but easy. It turns out that the only way to access data from his scale is by viewing it in a mobile app. Screen-scraping is a time-honored method of pulling data from uncooperative systems, so [Kevin] committed to regularly taking a full-height screenshot from the app and using optical character recognition (OCR) to get the numbers, but making that work was a surprisingly long process full of dead ends.
First of all, while OCR can be reliable, it needs the right conditions. One thing that ended up being a big problem was the way the app appends units (kg, %) after the numbers. Not only are they tucked in very close, but they’re also about half the height of the numbers themselves. It turns out that mixing character heights and snugging characters up against one another is practically tailor-made to give OCR reliability problems.
The solution for this particular issue came from an unexpected angle. [Kevin] was using an open-source OCR program called Tesseract, and after exhausting his own options he joined the IRC channel #tesseract to ask for advice. The bemused members informed [Kevin] that they had nothing to do with OCR; #tesseract was actually the community for an open-source 3D first-person shooter of the same name. But as luck would have it, one of the members actually had OCR experience and suggested the winning approach: pre-process the image with OpenCV, using cv2.findContours() to detect each element and put a bounding box around it. If an element is taller than a decimal point but shorter than everything else, throw it out. With that done, there were still a few more tweaks required, but the finish line was finally in sight.
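The filtering rule suggested on #tesseract can be sketched in plain Python. This is a minimal sketch, not [Kevin]'s actual code: it assumes the bounding boxes have already been extracted (in OpenCV that would be cv2.findContours() followed by cv2.boundingRect() on each contour), and the names Box and drop_unit_glyphs, along with the max_point_h and min_digit_ratio thresholds, are hypothetical and would need tuning against real screenshots.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Bounding box in the (x, y, w, h) form that cv2.boundingRect() returns."""
    x: int
    y: int
    w: int
    h: int


def drop_unit_glyphs(boxes: List[Box], max_point_h: int,
                     min_digit_ratio: float = 0.75) -> List[Box]:
    """Keep decimal points and full-height digits, discard in-between glyphs.

    max_point_h: tallest box still treated as a decimal point (assumed value).
    min_digit_ratio: fraction of the tallest box a glyph must reach to count
        as a digit (hypothetical; the unit letters are roughly half height).
    """
    if not boxes:
        return []
    tallest = max(b.h for b in boxes)
    kept = [
        b for b in boxes
        if b.h <= max_point_h or b.h >= min_digit_ratio * tallest
    ]
    # Sort left to right so the surviving glyphs read in order.
    return sorted(kept, key=lambda b: b.x)


# With OpenCV 4.x the boxes could be produced roughly like this:
#   contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
#   boxes = [Box(*cv2.boundingRect(c)) for c in contours]
```

Here the "everything else" in the rule is read as "glyphs close to the tallest height"; a ratio rather than a fixed pixel height keeps the rule working if the screenshot resolution changes.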
Now [Kevin] can use the scale in the morning, take a screenshot, and in less than half a minute the results are imported into a database and visualizations generated. The resulting workflow might look like something Rube Goldberg would approve of, but it works!
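The final step, getting the OCR text into a database, might look something like the sketch below. The post doesn't say which database or text format [Kevin] uses, so this assumes SQLite (Python's built-in sqlite3 module) and a hypothetical "label value unit" line format; the table layout and the helper names parse_ocr_text and store_readings are illustrative only. The regular expression also drops the kg and % suffixes, which only helps once OCR is already reading the digits correctly.

```python
import re
import sqlite3
from typing import Dict

# Hypothetical OCR output format: one "label value [unit]" reading per line,
# e.g. "Weight 72.4 kg" or "Body fat 18.2%". Adjust to the real screenshot.
LINE_RE = re.compile(
    r"^\s*(?P<label>[A-Za-z ]+?)\s+(?P<value>\d+(?:\.\d+)?)\s*(?:kg|%)?\s*$"
)


def parse_ocr_text(text: str) -> Dict[str, float]:
    """Turn OCR text into {metric: value}, ignoring lines that don't match."""
    readings: Dict[str, float] = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            readings[m.group("label").strip().lower()] = float(m.group("value"))
    return readings


def store_readings(conn: sqlite3.Connection, day: str,
                   readings: Dict[str, float]) -> None:
    """Upsert one day's readings into a simple (day, metric, value) table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        " day TEXT, metric TEXT, value REAL, PRIMARY KEY (day, metric))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO readings (day, metric, value) VALUES (?, ?, ?)",
        [(day, metric, value) for metric, value in readings.items()],
    )
    conn.commit()
```

For example, parse_ocr_text("Weight 72.4 kg\nBody fat 18.2%") returns {"weight": 72.4, "body fat": 18.2}, and because (day, metric) is the primary key, re-importing the same day's screenshot overwrites rows instead of duplicating them.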
Why wouldn’t he use Bluetooth and an RPi or something? It is easy to get data from BTLE devices.
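If the scale happens to implement the standard Bluetooth SIG Weight Scale Service (an assumption; many consumer scales use proprietary protocols instead), the raw bytes of its Weight Measurement characteristic (UUID 0x2A9D) can be decoded with a few lines of Python. This sketch only parses the bytes; reading them off the scale with a BLE library on a Raspberry Pi is left out, and the names WeightMeasurement and decode_weight_measurement are hypothetical.

```python
import struct
from typing import NamedTuple, Optional


class WeightMeasurement(NamedTuple):
    weight: float
    unit: str                 # "kg" or "lb"
    user_id: Optional[int]
    bmi: Optional[float]
    height: Optional[float]   # metres (SI) or inches (imperial)


def decode_weight_measurement(data: bytes) -> WeightMeasurement:
    """Decode a Weight Measurement (0x2A9D) value, assuming the standard layout."""
    flags = data[0]
    imperial = bool(flags & 0x01)          # bit 0: 0 = SI (kg, m), 1 = imperial (lb, in)
    (raw_weight,) = struct.unpack_from("<H", data, 1)
    weight = round(raw_weight * (0.01 if imperial else 0.005), 3)
    offset = 3
    if flags & 0x02:                       # bit 1: 7-byte timestamp present (skipped here)
        offset += 7
    user_id = None
    if flags & 0x04:                       # bit 2: 1-byte user ID present
        user_id = data[offset]
        offset += 1
    bmi = height = None
    if flags & 0x08:                       # bit 3: BMI and height present
        raw_bmi, raw_height = struct.unpack_from("<HH", data, offset)
        bmi = round(raw_bmi * 0.1, 1)
        height = round(raw_height * (0.1 if imperial else 0.001), 3)
    return WeightMeasurement(weight, "lb" if imperial else "kg", user_id, bmi, height)
```

If the scale turns out to use a vendor-specific characteristic instead, projects like openScale are a better starting point than this decoder.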
I’d just run all this stuff in Python on the home server. That’d at least bring it all in-house and thus improve reliability.
If you have to get data off the phone, just run the OCR locally there and send off the text…
Secondly, I’d talk to the guys from openScale and see if that data can’t be decoded. If an app can access it, so can you. I sincerely doubt that the data is encrypted, but even if it were, the decryption key is stored in the app.
Though some craziness could be happening here. Maybe the scale is internet-connected and encrypts the connection. Or the scale sends data to the app, the app sends encrypted data to a server and then receives decrypted data back… People do crazy shit nowadays…
Problem is, it might be doing the “smart” processing on the device. If you wanted to do the BMI calculations or stuff like burned calories (don’t know what’s so special about a scale that you’d need a whole app for it), you could just get a cheaper digital scale, send that data directly into a Pi to record it and run the calculations, and even connect that back up to a local host to show off your weight on your phone for Twitter.
If it’s a Bluetooth scale, you could also use the existing system, which I agree is probably unencrypted or easy to spoof, and collect the data wirelessly. Running the OCR locally or getting rid of it entirely would save a lot of data and points of failure.
“But as luck would have it, one of the members actually had OCR experience and suggested the winning approach: pre-process the image with OpenCV”
Serendipity!
The problem is that he is using kg instead of lbs.
By using lbs, he has larger numbers with smaller divisions (oz).
That makes even tiny changes look bigger, giving more positive feedback on the diet!
B^)
Nicely done!
You can do a few things like convert it to B&W, threshold on black, then “erode” the image so most of the non-numbers disappear, then “dilate” it again and then do OCR.
I’ve used this same approach to read values from a 7-segment LED display at pretty high FPS
https://www.linkedin.com/pulse/when-88mph-really-koos-du-preez/
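The threshold, erode, dilate sequence described above is a morphological "opening". A minimal pure-Python sketch on a grid of pixels is below; the helper names and the square kernel radius are assumptions, and whether the unit glyphs actually vanish while the digits survive depends on their stroke widths relative to the kernel. OpenCV provides the same operations as cv2.erode() and cv2.dilate() (or cv2.morphologyEx() with cv2.MORPH_OPEN), which treat white as foreground, so dark text would normally be inverted first.

```python
from typing import List

Grid = List[List[int]]  # 1 = ink (dark text), 0 = background


def threshold(gray: List[List[int]], cutoff: int = 128) -> Grid:
    """Binarize a grayscale image: pixels darker than cutoff become ink."""
    return [[1 if px < cutoff else 0 for px in row] for row in gray]


def _morph(grid: Grid, radius: int, erode: bool) -> Grid:
    """Square-kernel erosion (all neighbours ink) or dilation (any neighbour ink).

    Out-of-bounds neighbours are simply ignored.
    """
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                grid[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            out[y][x] = int(all(window) if erode else any(window))
    return out


def open_image(grid: Grid, radius: int = 1) -> Grid:
    """Erode then dilate: strokes thinner than the kernel vanish, thick ones return."""
    return _morph(_morph(grid, radius, erode=True), radius, erode=False)
```

In this sketch a thick 3×3 blob survives an opening with radius 1, while a one-pixel-wide line disappears entirely.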
Or, since the screenshot probably has the exact same layout every time, apply a mask which isolates each field.
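Assuming the layout really is fixed, the masking idea reduces to a table of pixel regions measured once from a real screenshot. The sketch below works on a plain list-of-rows image so it runs without extra libraries (with NumPy the crop would just be image[y:y + h, x:x + w]); the region coordinates and function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Image = List[List[int]]              # grayscale rows, 0 = black, 255 = white
Region = Tuple[int, int, int, int]   # (x, y, width, height)

# Hypothetical field positions; measure these from an actual screenshot.
FIELD_REGIONS: Dict[str, Region] = {
    "weight": (40, 300, 220, 60),
    "body_fat": (40, 420, 220, 60),
}


def crop(image: Image, region: Region) -> Image:
    """Cut one rectangular region out of the image."""
    x, y, w, h = region
    return [row[x:x + w] for row in image[y:y + h]]


def mask_outside(image: Image, regions: Sequence[Region], fill: int = 255) -> Image:
    """Return a copy where every pixel outside all regions is painted `fill`."""
    out = [[fill] * len(row) for row in image]
    for x, y, w, h in regions:
        for yy in range(y, min(y + h, len(image))):
            row = image[yy]
            for xx in range(x, min(x + w, len(row))):
                out[yy][xx] = row[xx]
    return out


def isolate_fields(image: Image, regions: Dict[str, Region]) -> Dict[str, Image]:
    """Cut each named field out so it can be OCR'd on its own."""
    return {name: crop(image, r) for name, r in regions.items()}
```

Cropping each field separately also tells the OCR step which value it is reading, so no label parsing is needed afterwards.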
What I am seeing is mostly black text on a white background. If one can box in the text, the units (kg, %) are always a fixed size, so they can be trimmed off using standard tools.
Having said that, I would first try to intercept the communication between the app and the cloud and see if there is anything I could replay. It is most likely some REST API. If I could set up a MITM attack to decrypt the API traffic, the rest would be easy, and the beauty is doing away with the phone completely.
You can’t say “Rube Goldberg” without mentioning “Heath Robinson”
I can say “Baby Ruth” without saying “Heath Bar”!
Wow, that Tesseract game looks quite nice! The FPS part of it looks quite standard and maybe even boring, but the in-game level editing is an awesome idea. I’m going to have to give it a try.
On the one hand, this seems like absolutely the wrong solution. On the other hand, it works.
If the screenshot always has the digits aligned in the same place, it seems like even easier preprocessing would do: just crop out digits or paint out everything else using ImageMagick or something similar.
bad hack, use nRF
I probably would have just hacked the protocol, but as others have mentioned, there’s probably some computation happening in the app that has some value, and hey: there’s nothing more permanent than a temporary solution that works 😁 Quite a creative hack to own your own data!
How can a body scale tell that a BMI of 25.4 is high without knowing bone density and other parameters?
I’d rather give that electronic “health” terrorist to someone I don’t like and get a cheap offline scale.
Some scales have electrical contacts for the feet which help to determine % of body fat.
Also, the app may require that one enters their height manually to determine BMI.
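BMI itself needs nothing beyond the measured weight and the height entered in the app: BMI = weight (kg) / height (m)². A reading of 25.4 is presumably flagged as high because 25 is the WHO adult cutoff for "overweight", and that cutoff ignores bone density and body composition entirely, which is exactly the limitation raised earlier in the thread. A minimal sketch with hypothetical function names:

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def who_adult_category(value: float) -> str:
    """WHO adult BMI categories (weight and height only, no body composition)."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal"
    if value < 30:
        return "overweight"
    return "obese"
```

With made-up inputs of 72.4 kg and 1.75 m, bmi() gives about 23.6, which lands in the "normal" band, while 25.4 falls just inside "overweight".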
I didn’t see any brand mentioned in post, but would recommend checking GitHub for anyone who may have reverse engineered the API.
For example, here is information for WeightGurus: https://gist.github.com/MarkWalters-pw/08ea0e8737e3e4d11f70427ef8fdc7df