Audio Eavesdropping Exploit Might Make That Clicky Keyboard Less Cool

Despite their claims of innocence, we all know that the big tech firms are listening to us. How else to explain the sudden appearance of ads related to something we’ve only ever spoken about, seemingly in private but always in range of a phone or smart speaker? And don’t give us any of that fancy “confirmation bias” talk — we all know what’s really going on.

And now, to make matters worse, it turns out that just listening to your keyboard clicks could be enough to decode what’s being typed. To be clear, [Georgi Gerganov]’s “KeyTap3” exploit does not use any of the usual RF-based methods we’ve seen for exfiltrating data from keyboards on air-gapped machines. Rather, it uses just a standard microphone to capture audio while typing, building a cluster map of the clicks with similar sounds. By analyzing the clusters against the statistical likelihood of certain sequences of characters appearing together — the algorithm currently assumes standard English, and works best on clicky mechanical keyboards — a reasonable approximation of the original keypresses can be reconstructed.
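The "cluster map" step can be pictured with a minimal sketch. This is not KeyTap3's actual code: it assumes each click has already been reduced to a small numeric feature vector (how that is done is not covered here), and the function name `cluster_clicks` and the `threshold` parameter are hypothetical. It simply groups clicks that sound alike, so each cluster stands in for one as-yet-unknown key.

```python
from math import dist

def cluster_clicks(features, threshold):
    """Greedy clustering sketch (an illustrative assumption, not KeyTap3's method).

    Each click joins the nearest existing cluster if its centroid is within
    `threshold`; otherwise it starts a new cluster. Returns one label per click.
    """
    centroids = []  # running mean feature vector of each cluster
    counts = []     # number of clicks assigned to each cluster
    labels = []
    for f in features:
        best, best_d = None, threshold
        for i, c in enumerate(centroids):
            d = dist(f, c)
            if d <= best_d:
                best, best_d = i, d
        if best is None:
            # Nothing close enough: treat this click as a new, unknown key.
            centroids.append(list(f))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Update the running mean of the matched cluster.
            counts[best] += 1
            n = counts[best]
            centroids[best] = [c + (x - c) / n for c, x in zip(centroids[best], f)]
            labels.append(best)
    return labels

# Five made-up clicks from two distinct-sounding keys.
clicks = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.1, 4.9), (0.0, 0.1)]
print(cluster_clicks(clicks, threshold=1.0))
```

The output is a sequence of anonymous cluster labels; working out which label is which letter is the statistical part described above.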

If you’d like to see it in action, check out the video below, which shows the algorithm doing a pretty good job decoding text typed on an unplugged keyboard. Or, try it yourself — the link above implements KeyTap3 in-browser. We gave it a shot, but as a member of the non-mechanical keyboard underclass, it couldn’t make sense of the mushy sounds it heard. Then again, our keyboard inferiority affords us some level of protection from the exploit, so there’s that.

Editor’s Note: Just tried it on a mechanical keyboard with Cherry MX Blue switches and it couldn’t make heads or tails of what was typed, so your mileage may vary. Let us know in the comments if it worked for you.

What strikes us about this is that it would be super simple to deploy an exploit like this. Most side-channel attacks require such a contrived scenario for installing the exploit that just breaking in and stealing the computer would be easier. All KeyTap needs is a covert audio recording, and the deed is done.

45 thoughts on “Audio Eavesdropping Exploit Might Make That Clicky Keyboard Less Cool”

      1. No, no, this is not my paper. The authors also provided the code, but after 17 years the website is long gone. And I’m pretty sure there is even older literature about this niche research domain. It is just hard to find a new research domain: even the idea of “cow speech recognition” has been covered for a long time :-D

        1. It has been 23 years since I moved to this location, I recall hearing about this tactic years before.
          It might have been a warning by a TLA agency to their employees when typing while on the phone.

  1. Not a surprise somebody has tried it – for a very, very long time it’s been known that keyboard keys can sound distinct enough from each other that it’s possible to reconstruct what was typed, so adding in some standard frequency analysis so you don’t need to know the keyboard in advance isn’t a shocker. But like any statistical analysis, it’s only going to work on a big enough dataset that conforms to the norm you expect – and ‘standard’ English is a bit of a misnomer really, no such thing – standard to which geographical area?

    (While English is relatively universal so some keys should fit the pattern well even in smaller dataset there is more than enough variation for skewing similarly common letters around, and variations in spelling and grammar that mean when it looks at what it thinks against a dictionary it may not notice any words at all)

    Not tried it myself though.

      1. hehehe, no thanks, anonymous keyboards are going to be horrible to use, they didn’t even have enough pride in the work to put their name to it!

        And there are only so many cornets you can take at a time, a whole office full of folks with automatic ones would be hideous….

    1. This reminds me of another project I heard of a few years ago. If I recall correctly, that one used a phone’s tactile sensors to record vibrations created by tapping on a keyboard. The idea was the same in the sense that you could supposedly just put the phone down on a counter near a keyboard and reconstruct what was typed.

  2. Speaking of language and not layout is dumb, IMHO. The keyboard layout is what is important, the language is just a “fix” to correct the incorrect assumption of the layout guesser.
    I wonder if he’s using a stereo input to estimate the location of the sound source (it might be possible), since on most laptops, the distance to each key and its relative position are completely static.

    1. I would argue language is the thing that is key, not layout. This analysis relies on identifying a unique sound for each individual key and then determining how often each key is pressed with the end goal of solving the “hidden message” through frequency analysis. The method is not spatial by nature so where the key is located in your keyboard layout has no effect and, all other things being equal, pushing the “E” key on a QWERTY layout would sound identical to pushing the same key on a Dvorak keyboard. Since at its core all you are doing is counting how many times a specific key (tied to an unknown letter) is pressed the general problem is the same as any basic cipher which subs a letter with another letter and therefore relies on language not layout.
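The substitution-cipher view in the comment above can be illustrated with a rough sketch. It assumes the clicks have already been grouped into anonymous labels, one per distinct-sounding key, and that only letters were typed; the function name and the letter-frequency ordering are illustrative assumptions, not taken from KeyTap3.

```python
from collections import Counter

# Approximate ordering of English letters from most to least common.
# Published orderings differ slightly; treat this one as an assumption.
ENGLISH_BY_FREQ = "etaoinshrdlcumwfgypbvkjxqz"

def guess_by_frequency(labels):
    """Map the most common sound label to 'e', the next to 't', and so on."""
    ranked = [label for label, _ in Counter(labels).most_common()]
    mapping = {label: ENGLISH_BY_FREQ[i] for i, label in enumerate(ranked[:26])}
    # Labels beyond the 26 most common have no letter left to take.
    return "".join(mapping.get(label, "?") for label in labels)

# Labels are arbitrary IDs; nothing about key position is used.
print(guess_by_frequency([3, 3, 3, 7, 7, 1]))  # prints "eeetta"
```

Note that nothing in the sketch refers to where a key sits on the board. A first guess like this is usually wrong for many letters on short texts, which is why practical attacks lean on statistics over letter sequences rather than single-letter counts alone.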

    2. Try reading the article before commenting.

      Layout is irrelevant, since the electrics and electronics are entirely disregarded.

      This technique records the sound of each key being pressed and tries to match similar sounds to particular switches, and then the switches are assumed to be distinct letters of the English alphabet. From that they can attempt to decode it.

      You seem to think that there’s a particular sound signature that would be shared by “W” in a QWERTY layout, but that’s ridiculous. Different keyboards can have wildly different constructions. Any assumptions made in that regard will immediately make the software useless, since those assumptions would never hold true in real life.

    3. Stereo or microphone array recordings (to leverage the fixed key layout via some implicit sound Direction Of Arrival estimation) are a nice and interesting idea!

      However, saying that knowing (and leveraging) the language used to type the text is unnecessary is completely wrong!

      Given the audio recording, the most likely typed text is estimated as the sequence of keys that jointly maximizes the sound-to-key probability (acoustic model probability Pam) and the language model probability (i.e. the probability Plm of observing a sequence of letters, which constitute words, which form reasonable sentences, etc.). As you can see, the language model constrains the search among every possible key combination and makes sure that some reasonable text is returned (note that if the audio or the acoustic model is bad, the system will just invent some random text … e.g. try feeding just white noise to Alexa/Siri/Cortana/automated-call-center/etc.). Of course it is possible to train multilanguage models or to jointly train an end-to-end model that incorporates both Pam and Plm in a single DNN. Whatever implementation you choose, the language prior information is a strong prerequisite.

      That’s also why recognizing typed passwords is way less accurate (they usually do not fully conform to English or any other language: the language model will be just the letter frequencies estimated on large collections of passwords) and needs adaptation data…
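The joint maximization of Pam and Plm described above can be sketched with a small Viterbi decoder. This is one standard way to combine the two models, not necessarily what KeyTap3 does; the language model here is reduced to letter bigrams, and the names `viterbi_decode`, `acoustic`, and `bigram`, as well as all the probabilities, are made up for illustration.

```python
import math

def viterbi_decode(acoustic, bigram, alphabet):
    """Find the key sequence maximizing the product of Pam and Plm.

    acoustic[t][k]    : assumed Pam, probability of sound t given key k
    bigram[(prev, k)] : assumed Plm, probability of key k following key prev
    Missing entries are treated as a tiny floor probability.
    """
    floor = 1e-9

    def lg(p):
        return math.log(max(p, floor))

    # best[k] = (log score, path) of the best sequence ending in key k
    best = {k: (lg(acoustic[0].get(k, 0.0)), [k]) for k in alphabet}
    for obs in acoustic[1:]:
        new_best = {}
        for k in alphabet:
            # Pick the predecessor that makes "... prev, k" most likely.
            prev = max(alphabet, key=lambda p: best[p][0] + lg(bigram.get((p, k), 0.0)))
            score = best[prev][0] + lg(bigram.get((prev, k), 0.0)) + lg(obs.get(k, 0.0))
            new_best[k] = (score, best[prev][1] + [k])
        best = new_best
    return "".join(max(best.values(), key=lambda v: v[0])[1])

# Toy example: the second click sounds slightly more like "x" than "h",
# but "th" is far more plausible English than "tx".
acoustic = [{"t": 0.9, "h": 0.05, "x": 0.05}, {"h": 0.45, "x": 0.55}]
bigram = {("t", "h"): 0.8, ("t", "x"): 0.01}
print(viterbi_decode(acoustic, bigram, "thx"))  # prints "th"
```

In the toy example the acoustic evidence alone would pick "tx"; the language prior is what tips the result to "th". It also shows the failure mode mentioned above: with poor acoustic probabilities, the decoder still returns whatever text the language model finds most plausible.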

        1. Instead of a conventional mic array used in far-field condition (all available mics a few inches apart), please think about 4-8 omni mics scattered across a small room that are recording at high sampling rates while synced at sample level. Sub inch resolutions are certainly possible. Moreover this sound source localization capability is not meant to replace the single channel acoustic model but rather to complement it.

          At the end probably you’re right and the improvements may be marginal (and for sure not cost effective).
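For a rough sense of the resolution being claimed, here is a minimal time-difference-of-arrival sketch for one pair of sample-synced microphones. The cross-correlation approach, the 192 kHz sample rate, and the function names are assumptions chosen for illustration; nothing here comes from KeyTap3.

```python
SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature

def best_lag(a, b, max_lag):
    """Return the lag in samples at which b best matches a delayed copy of a.

    A positive result means the sound reached microphone b later than a.
    """
    def corr(lag):
        return sum(a[n - lag] * b[n] for n in range(len(b)) if 0 <= n - lag < len(a))
    return max(range(-max_lag, max_lag + 1), key=corr)

def path_difference_m(lag, sample_rate):
    """Convert a lag in samples into a difference in path length (metres)."""
    return lag * SPEED_OF_SOUND / sample_rate

# The same click arrives three samples later at the second microphone.
mic_a = [0, 0, 1, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 0, 1, 0, 0]
lag = best_lag(mic_a, mic_b, max_lag=5)
print(lag, path_difference_m(lag, sample_rate=192_000))
```

At the assumed 192 kHz, one sample of delay corresponds to about 1.8 mm of path difference, which is consistent with the sub-inch figure above, at least before noise and room reflections are taken into account. Locating a key in space would need several such pairs combined.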

    4. No, it’s language. If you’re listening to the key sounds, you don’t care where on the keyboard the key is located (layout), you care what letters are most common in that language. Layout is completely irrelevant. This approach would work even on a chorded keyboard, though the sounds might not be as reliable.

      1. I don’t know, I think maybe the common qwerty layout might make it a bit easier. After all it was supposedly designed to separate common letters a bit so mechanical typewriters jammed less… hence the tonality of each influenced by where in the case and how much pcb is between it and edges and mountings is less similar between common letters than it would be if they were all grouped in the middle for the “quick and strong” two fingers.

  3. I have never understood why people are so up in arms about targeted ads. Targeted ads don’t bother me, because I don’t buy stuff based on advertisements. Advertisers fund the web, so I say “advertise away.”

    1. There’s certain lines of products in the personal grooming realm that you can’t buy online as a surprise for your partner any more, because if you do, every web browser in the house is gonna be plagued by ads for the thing you ALREADY BOUGHT for the next 2 weeks… it’s like if you wanna surprise gift anyone any more you gotta go to a small physical store, or only go to a big chain with all your radio off on your cell, because they can tell what aisle you’re lingering in.

      1. “be plagued by ads for the thing you ALREADY BOUGHT for the next 2 weeks…”

        Yeah, I just loved getting Subaru ads just after we bought one!
        B^)

        1. A lot of people either love it or hate it, two weeks after they bought it.

          So, drowning you in ads to “remind” you of “how wonderful” your purchase was is actually probably very deliberate.

          1. Unlikely.

            Cars aren’t a great example though. I’ve been getting ads for the last 2 months for windshields, because I replaced a windshield 2 months ago.
            Two months.
            Never ascribe to malicious what can be attributed to incompetent.

            However, there is something called a ‘chump list’.
            Have you ever been a Facebook user? Bought a Subaru? That will NEVER wash off.

      1. Paywalls…oh wait.
        Well there’s always cryptocurrency…oh wait again.
        No, there’s no acceptable to the masses way to fund things.
        Currently ads, or giving them free things, and data-mining them behind their backs.

    1. Funny, that’s how I got my last one. :-D

      But seriously…

      ” How else to explain the sudden appearance of ads related to something we’ve only ever spoken about, seemingly in private but always in range of a phone or smart speaker? “.

      Never had that happen and my phone usually isn’t that far away from me.

      1. I think that was at least partly in jest. Feels like it would be easy to test – just walk around your house saying “speedboat speedboat speedboat” all the time and see if any ads from West Marine show up (Unless of course you own a boat)

      2. Data mining goes further than just what you say. It also bases targeting on who you know and their search habits and relationship to you, so if you speak to someone about an item and they search for it you may see a targeted ad.

        I know this sounds outlandish but I’ve witnessed it multiple times from someone searching for a birthday gift to me I spoke about but never searched for.

      3. ” How else to explain the sudden appearance of ads related to something we’ve only ever spoken about, seemingly in private but always in range of a phone or smart speaker? “.

        Usually because humans are often more predictable than we like to believe, and statistics only need to work occasionally. Whatever set of circumstances led you to start thinking about (and later talking about) X in the first place, if that chain of connections had an online component that could be tracked, the eventual endpoint of your thought process could be inferred from the actions of others before you following the same cognitive chain.
        For example: if people browsing an Article Y followed by searching for Search Term Z had a greater than random chance of later purchasing Widget Q, then people who trigger the first two criteria will end up being shown ads for Widget Q. That the article piqued your interest enough to search for whatever the term was, then talk with your friend about it who mentioned said widget, is completely immaterial to the process.
        The kicker is that the hundreds of thousands of times this blind algorithm misses in its targeting are ignored, and the small handful of times it is effective stand out as prescient, despite the overall level of prescience being astonishingly poor. Like computers in general, ad targeting algorithms are very VERY stupid, but modern computers are extremely fast and massively parallel, which makes up for that stupidity just enough to pay for their own power usage.

      4. I have seen this happen, it’s not just paranoia. A friend of mine was getting weird ads related to an odd subject he was talking about with his wife, which just seemed too unlikely, even as confirmation bias. At first he thought it was his phone, so we tried feeding it some bogus info, talking and texting about tennis, which neither of us plays, to try and see if we’d get ads. It didn’t seem to do much, at least in the short term. But later on he saw some weird traffic on his network, and figured out that it was his Samsung smart TV, which contains a hidden microphone, not advertised as a feature, or even mentioned as existing in the manual.

  4. Targeted advertising? What’s that?

    Oh wait, you mean not everyone uses Ad Block, Privacy Badger, Disconnect, and website fences? Not to mention turning off all possible advertising tracking and sharing options with online sites.

    And there are people who let others peer into their lives by installing devices with microphones and enabling their phones to constantly listen to what they are doing?

    Anyhow, I get effectively no targeted advertising. It took some time to set up and a bit of vigilance to maintain, but it’s far more pleasant to use computers and the internet this way.

    As to catching passwords from keyboard sounds, it just goes to (yet again) show the importance of 2FA.

    1. Actually, whilst I’ve turned off targeted ads, I’m borderline thinking of turning them back on. I’d sooner get ads for stuff I might buy than the gambling and dating ads I get as an untargeted user.

      1. Run a virtual rooted Android and have it continuously report your location as snob shops in neighborhoods like Rodeo Drive, Manhattan, Paris, Milan, Tokyo etc.

        Feed it looped audio recorded in insanely overpriced stores for fashion fools.

        Watch the rich f&$*er freebies roll in.

  5. Hah, I dimly remember a gag in the show Due South (I think?) where the Mountie catches a spelling mistake the American cop makes purely by listening to him type.

  6. Just yesterday I was messaging with my girlfriend about fiber optic testers, and today Amazon tries to sell them to me because I’ve ‘searched for network and cable testers’…

    Yeah. I did. But it was over a year ago that I bought a new one to replace my beat-up one.

  7. I’m sorry. Did you say wireless keyboard without batteries, a circuit board or any internal electronics whatsoever? That sounds amazing. High latency though, but fine for people who only use it for typing.
