TapType: AI-Assisted Hand Motion Tracking Using Only Accelerometers

The team from the Sensing, Interaction & Perception Lab at ETH Zürich, Switzerland have come up with TapType, an interesting text input method that relies purely on a pair of wrist-worn devices that sense acceleration values as the wearer types on any old surface. By feeding the acceleration values from the sensors on each wrist into a neural-network classifier based on Bayesian inference, which in turn feeds a traditional probabilistic language model (predictive text, to you and me), text can be entered at up to 19 WPM with 0.6% average error. Expert TapTypers report speeds of up to 25 WPM, which could be quite usable.
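
To get a feel for how the two stages might fit together, here is a minimal sketch in Python that combines per-tap classifier probabilities with a character-level language model via a simple beam search. The finger-to-key mapping and the placeholder language model are illustrative guesses on our part, not details taken from the paper:

```python
import math

# Hypothetical mapping from each classified finger to the keyboard characters
# it could plausibly have tapped; not the paper's actual layout.
FINGER_TO_CHARS = {
    "l_pinky": "qaz", "l_ring": "wsx", "l_middle": "edc", "l_index": "rfvtgb",
    "r_index": "yhnujm", "r_middle": "ik", "r_ring": "ol", "r_pinky": "p",
}

def lm_logprob(prev_text, char):
    """Placeholder character-level language model: uniform over letters.
    A real system would query an n-gram or neural model here."""
    return -math.log(26)

def decode(tap_probs, beam_width=8):
    """tap_probs: one dict per detected tap, mapping finger label to the
    classifier's probability for that finger. Returns the text with the
    highest combined classifier + language-model score."""
    beams = [("", 0.0)]                      # (text so far, cumulative log-prob)
    for probs in tap_probs:
        candidates = []
        for text, score in beams:
            for finger, p in probs.items():
                for ch in FINGER_TO_CHARS.get(finger, ""):
                    s = score + math.log(max(p, 1e-9)) + lm_logprob(text, ch)
                    candidates.append((text + ch, s))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
    return beams[0][0]

# Example: two taps where the classifier is fairly sure about the fingers
print(decode([{"l_index": 0.7, "l_middle": 0.3},
              {"r_middle": 0.9, "r_index": 0.1}]))
```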

Details are a little scarce (it is a research project, after all) but the actual hardware seems simple enough, based around the Dialog DA14695, a nice Cortex-M33-based Bluetooth Low Energy SoC. This is an interesting device in its own right, containing a “sensor node controller” block that can handle sensor devices connected to its interfaces independently of the main CPU. The sensor device used is the Bosch BMA456 3-axis accelerometer, which is notable for its low power consumption of a mere 150 μA.

Users can “type” on any convenient surface.

The wristband units themselves appear to be a combination of a main PCB hosting the BLE chip and supporting circuitry, connected to a flex PCB with an accelerometer at each end. The assembly was then slipped into a flexible wristband, likely constructed from 3D-printed TPU, but we’re just guessing really, as the progression from the first embedded platform to the wearable prototype is unclear.

What is clear is that the wristband itself is just a dumb data-streaming device, and all the clever processing is performed on the connected device. Training of the system (and subsequent selection of the most accurate classifier architecture) was performed by recording volunteers “typing” on an A3-sized keyboard image, tracking their finger movements with a motion-tracking camera while recording the acceleration data streams from both wrists. There are a few more details in the published paper for those interested in digging into this research a little deeper.

The eagle-eyed may remember something similar from last year, from the same team, which correlated bone-conduction sensing with VR type hand tracking to generate input events inside a VR environment.

Continue reading “TapType: AI-Assisted Hand Motion Tracking Using Only Accelerometers”

Learn Sign Language Using Machine Vision

Learning a new language is a great way to exercise the mind and learn about different cultures, and it helps to have a native speaker around to improve the learning experience. Without one, it’s still possible to learn via videos, books, and software. The task gets much more complicated when the language in question isn’t spoken at all, though, as with American Sign Language. This project allows users to learn the ASL alphabet with the help of computer vision and some machine learning algorithms.

The build uses a computer vision model based on MobileNetV2, trained to recognize each sign in the ASL alphabet. A sign is shown to the user on a screen, and the user needs to demonstrate that sign back to the computer in order to progress. To do this, OpenCV running on a Raspberry Pi with a PiCamera analyzes frames of the user in real time. The user is shown pictures of the correct sign, and is rewarded when the correct sign is made.
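
As a rough idea of what such a recognition loop could look like, here is a hedged sketch using OpenCV and a Keras MobileNetV2 classifier. The model file name, letter set, and on-screen overlay are our own assumptions rather than details from the project:

```python
import cv2
import numpy as np
import tensorflow as tf

# Hypothetical model file: a MobileNetV2-based classifier fine-tuned on
# ASL alphabet images, one output class per letter.
model = tf.keras.models.load_model("asl_mobilenetv2.h5")
LETTERS = "ABCDEFGHIKLMNOPQRSTUVWXY"  # J and Z involve motion and are often omitted

cap = cv2.VideoCapture(0)             # PiCamera via the V4L2 driver, or any webcam
target = "A"                          # letter currently being taught

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Preprocess the frame the way MobileNetV2 expects: 224x224, scaled to [-1, 1]
    img = cv2.resize(frame, (224, 224)).astype(np.float32)
    img = tf.keras.applications.mobilenet_v2.preprocess_input(img)
    probs = model.predict(img[np.newaxis], verbose=0)[0]
    guess = LETTERS[int(np.argmax(probs))]
    colour = (0, 255, 0) if guess == target else (0, 0, 255)
    cv2.putText(frame, f"Show: {target}  Seen: {guess}", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, colour, 2)
    cv2.imshow("ASL tutor", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```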

While this only works for alphabet signs in ASL currently, the team at the University of Glasgow that built this project is planning on expanding it to include other signs as well. We have seen other machines built to teach ASL in the past, like this one which relies on a specialized glove rather than computer vision.

Continue reading “Learn Sign Language Using Machine Vision”

Training Doppler Radar With Smartwatch IMU Data For Activity Recognition

When it comes to interpreting sensor data automatically, it helps to have a large data set for validation, and for training when machine learning (ML) is involved. Creating such a data set of carefully tagged and categorized information is a long and tedious process, which is where the idea of cross-domain translation comes into play. That is the approach of the IMU2Doppler project at the Smash Lab of Carnegie Mellon University, which uses millimeter-wave (mmWave) radar sensors to recognize the activity of e.g. building occupants.

The most commonly used sensors for classifying human motion in particular are inertial measurement units (IMUs) such as accelerometers and gyroscopes, which are found in everything from smartphones to smartwatches and fitness bands. For these devices it’s common to classify measurement patterns as matching a particular activity, such as walking, jogging, or brushing one’s teeth, which makes them both well-defined and very accessible.
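
As a simple illustration of that kind of IMU-based activity classification (and emphatically not the deep-learning domain-adaptation pipeline IMU2Doppler actually uses), here is a sketch that windows an accelerometer stream, pulls out basic statistics, and feeds them to an off-the-shelf classifier. The data here is synthetic:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(accel, window=128, step=64):
    """Split an (N, 3) accelerometer stream into overlapping windows and
    compute simple per-axis statistics as features."""
    feats = []
    for start in range(0, len(accel) - window + 1, step):
        w = accel[start:start + window]
        feats.append(np.concatenate([w.mean(0), w.std(0), w.min(0), w.max(0)]))
    return np.array(feats)

# Synthetic stand-in data: a quiet "still" stream and a noisier "walking" stream.
rng = np.random.default_rng(0)
f_still = window_features(rng.normal(0.0, 0.05, (4096, 3)))
f_walk = window_features(rng.normal(0.0, 0.6, (4096, 3)))

X = np.vstack([f_still, f_walk])
y = ["still"] * len(f_still) + ["walking"] * len(f_walk)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict(window_features(rng.normal(0.0, 0.6, (512, 3)))))
```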

As for why an mmWave-based Doppler radar would be preferred for monitoring e.g. building occupants: there’s the privacy aspect compared to using cameras, and the inconvenience of equipping people with a body-worn IMU. Using Doppler radar, it would theoretically be possible for people to track activities within their own home, for a medical setting to ensure patients are safe, or for a gym to track performance or equipment usage, all without cameras or personal sensors. In the past, we’ve seen a similar approach that used targeted laser beams.

As promising as this sounds, at this point in time the number of activities that are recognized with reasonable accuracy (~70%) is limited to ten types. Depending on the intended application this may already be sufficient, though as the published paper notes, there is still a lot of room for growth.

Continue reading “Training Doppler Radar With Smartwatch IMU Data For Activity Recognition”

A man performing push-ups in front of a PC

Machine Learning Helps You Get In Shape While Working A Desk Job

Humans weren’t made to sit in front of a computer all day, yet for many of us that’s how we spend a large part of our lives. Of course we all know that it’s important to get up and move around every now and then to stretch our muscles and get our blood flowing, but it’s easy to forget if you’re working towards a deadline. [Victor Sonck] thought he needed some reminders — as well as some not-so-gentle nudging — to get into the habit of doing a quick workout a few times a day.

To this end, he designed a piece of software that would lock his computer’s screen and only unlock it if he performed five push-ups. Locking the screen on his Linux box was as easy as sending a command through the network, but recognizing push-ups was a harder task for which [Victor] decided to employ machine learning. A Raspberry Pi with a webcam attached could do the trick, but the limited processing power of the Pi’s CPU might prove insufficient for processing lots of raw image data.
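
The write-up doesn’t spell out the exact locking command, but one common way to lock a remote Linux desktop over the network looks something like the sketch below; the host name and display number are placeholders:

```python
import subprocess

# One possible way to lock a remote Linux desktop (not necessarily the command
# [Victor] uses): SSH in and ask the screensaver on the local display to lock.
# "desktop.local" and DISPLAY=:0 are placeholders for the target machine.
subprocess.run(
    ["ssh", "desktop.local", "DISPLAY=:0 xdg-screensaver lock"],
    check=True,
)
```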

[Victor] therefore decided on using a Luxonis OAK-1, which is a 4K camera with a built-in machine-learning processor. It can run various kinds of image recognition systems, including BlazePose, a pre-trained model that can recognize a person’s pose from an image. The OAK-1 uses this to send a set of coordinates describing the position of a person’s head, torso, and limbs to the Raspberry Pi over a USB interface. A second machine-learning model running on the Pi then analyzes this dataset to recognize push-ups.
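
As a toy stand-in for that second stage (the real project trains a model on the keypoint data rather than hard-coding rules), a push-up counter can be as simple as a little state machine watching how far the shoulders drop and rise from frame to frame:

```python
def count_pushups(shoulder_ys, down_thresh=0.7, up_thresh=0.5):
    """Toy stand-in for the second recognition stage: count push-ups from a
    stream of normalized shoulder heights (0 = top of frame, 1 = bottom),
    e.g. the average y of the two shoulder keypoints per frame. A repetition
    is counted on each down-then-up transition. Thresholds are illustrative
    and would need tuning for a real camera setup."""
    reps, phase = 0, "up"
    for y in shoulder_ys:
        if phase == "up" and y > down_thresh:      # body has dropped low enough
            phase = "down"
        elif phase == "down" and y < up_thresh:    # body has pushed back up
            reps += 1
            phase = "up"
    return reps

# Example: synthetic trace of three push-ups
trace = [0.4, 0.5, 0.75, 0.8, 0.45, 0.4, 0.78, 0.42, 0.76, 0.41]
print(count_pushups(trace))  # -> 3
```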

[Victor]’s video (embedded below) is an entertaining introduction to the world of machine-learning systems for video processing, as well as a good hands-on example of a project that results in a useful tool. If you’re interested in learning more about machine learning on small platforms, check out this 2020 Remoticon talk on machine learning on microcontrollers, or this 2019 Supercon talk about implementing machine vision on a Raspberry Pi.

Continue reading “Machine Learning Helps You Get In Shape While Working A Desk Job”

Using Statistics Instead Of Sensors

Statistics often gets a bad rap in mathematics circles for being less than concrete at best, and downright misleading at worst. While these sentiments might ring true for things like political polling, they hide the fact that statistical methods can be put to good use in engineering systems with fantastic results. [Mark Smith], for example, has been working on an espresso machine that can make the perfect shot of coffee, and turned to one of the tools in the statistics toolbox to solve a problem rather than adding yet another sensor to his already complex coffee-brewing machine.

To make espresso, hot water is forced through finely ground coffee under high pressure. [Mark] found that his espresso machine was often pouring too much or too little coffee, and in order to improve its accuracy in this area he turned to the linear-regression statistic R², also known as the coefficient of determination, which measures how much of the variation in a data set is predictable. By using a machine-learning algorithm tuned to this value, a computer can more easily tell when the coffee begins pouring out of the portafilter and into the espresso cup based on the pressure and water flow in the machine itself, rather than on some other input such as the weight of the cup.
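
To make the idea concrete, here is a hedged sketch that computes R² over a sliding window of cumulative flow data and flags the first moment the flow becomes cleanly linear. The window size and threshold are illustrative, not [Mark]’s tuned values:

```python
import numpy as np

def rolling_r2(signal, window=20):
    """R^2 (coefficient of determination) of a straight-line fit to each
    sliding window of the signal. Values near 1 mean the window grows
    almost perfectly linearly with time."""
    t = np.arange(window)
    out = []
    for i in range(len(signal) - window + 1):
        y = np.asarray(signal[i:i + window], dtype=float)
        slope, intercept = np.polyfit(t, y, 1)
        ss_res = np.sum((y - (slope * t + intercept)) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        out.append(1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0)
    return np.array(out)

def detect_pour_start(cumulative_flow, r2_threshold=0.99, window=20):
    """Index at which the cumulative flow first rises in a cleanly linear
    fashion, taken here as the moment espresso starts flowing into the cup.
    Threshold and window size are illustrative guesses."""
    r2 = rolling_r2(cumulative_flow, window)
    hits = np.where(r2 > r2_threshold)[0]
    return int(hits[0]) if len(hits) else None
```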

We have seen in the past how seriously [Mark] takes his coffee-making, and this is another step in a series of improvements he has made to his equipment. In this iteration, he has additionally produced a simulation in JupyterLab to better assist him in modeling the system and making even more accurate predictions. It’s quite a bit more effort than adding sensors, but since his espresso machine already included quite a bit of computing power it’s not too big a leap for him to make.

AI-Generated Sleep Podcast Urges You To Imagine Pleasant Nonsense

[Stavros Korokithakis] finds the experience of falling asleep to fairy tales soothing, and this has resulted in a fascinating project that indulges this desire by using machine learning to generate mildly incoherent fairy tales and read them aloud. The result is a fantastic sort of automated, machine-generated audible sleep aid. Even the logo is machine-generated!

The Deep Dreams Podcast is entirely machine-generated, including the logo.

The project leverages the natural language generation abilities of OpenAI’s GPT-3 to create fairytale-style content that is just coherent enough to sound natural, but not quite coherent enough to make a sensible plotline. The quasi-lucid, dreamlike result is perfect for urging listeners to imagine pleasant nonsense (thanks to Nathan W Pyle for that term) as they drift off to sleep.

We especially loved reading about the methods and challenges [Stavros] encountered while creating this project. For example, he talks about how there is more to a good-sounding narration than just pointing a text-to-speech engine at a wall of text and mashing “GO”. A good episode has things like strategic pauses, background music, and audio fades. That’s where pydub — a Python library for manipulating audio — came in handy. As for the speech, text-to-speech quality is beyond what it was even just a few years ago (and certainly leaps beyond machine-generated speech in the 80s) but it still took some work to settle on a voice that best suited the content, and the project gradually saw improvement.
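
For anyone curious what that kind of pydub assembly looks like in practice, here is a small illustrative example; the file names, gain values, and timings are made up rather than the podcast’s actual pipeline:

```python
from pydub import AudioSegment

# Illustrative assets; the real episodes use different files and timings.
narration = AudioSegment.from_file("narration.wav")
music = AudioSegment.from_file("background_music.mp3") - 18   # duck the music by 18 dB

# Insert a strategic pause between two halves of the narration
pause = AudioSegment.silent(duration=1500)                    # 1.5 s of silence
midpoint = len(narration) // 2
voice = narration[:midpoint] + pause + narration[midpoint:]

# Loop the music bed under the whole voice track, then fade the edges
bed = (music * (len(voice) // len(music) + 1))[:len(voice)]
episode = bed.overlay(voice).fade_in(3000).fade_out(5000)

episode.export("episode.mp3", format="mp3")
```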

Deep Dreams Podcast has a GitLab repository if you want to see the code that drives it all, and you can go to the podcast itself to give it a listen.

OpenCV Brings Pinch To Zoom Into The Real World

Gesture controls arrived in the public consciousness a little over a decade ago as touchpads and touchscreens became more popular. The main limitation of gesture controls, at least as far as [Norbert] is concerned, is that they can only control objects in a virtual space. He was hoping to use gestures to control a real-world object instead, and created this device which uses gestures to control an actual picture.

In this unique augmented reality device, not only is the object being controlled in the real world, but the gestures are monitored there as well, thanks to an OpenCV-based computer vision system watching his hand. The position data is fed into an algorithm which controls a physical picture mounted on a slender robotic arm. Now, when [Norbert] “pinches to zoom”, the servo attached to the picture physically brings it closer to or further from his field of view. He can also use other gestures to move the picture around.
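
The write-up doesn’t detail the exact hand-tracking pipeline, so as a sketch of the general idea, here is a version using MediaPipe Hands alongside OpenCV to measure the thumb-to-index “pinch” distance and map it onto a hypothetical serial command for the arm’s servo; the serial port and one-angle-per-line protocol are our own assumptions:

```python
import math
import cv2
import mediapipe as mp
import serial

# Hypothetical serial link to the microcontroller driving the picture's arm
arm = serial.Serial("/dev/ttyUSB0", 115200)

hands = mp.solutions.hands.Hands(max_num_hands=1)
cap = cv2.VideoCapture(0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        lm = results.multi_hand_landmarks[0].landmark
        thumb, index = lm[4], lm[8]          # thumb tip and index fingertip
        pinch = math.hypot(thumb.x - index.x, thumb.y - index.y)
        # Map the normalized pinch distance (roughly 0..0.4) to a 0..180 degree angle
        angle = int(max(0, min(180, pinch / 0.4 * 180)))
        arm.write(f"{angle}\n".encode())     # hypothetical one-angle-per-line protocol
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
```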

While this gesture-controlled machine is certainly a proof-of-concept, there are plenty of other uses for gesture controls of real-world objects. Any robotics platform could benefit from an interface like this, or even something slightly more mundane like an office PowerPoint presentation. Opportunity abounds, but if you need a primer for OpenCV take a look at this build which tracks a hand in minute detail.

Continue reading “OpenCV Brings Pinch To Zoom Into The Real World”