Walkie-talkies are great fun, and [RealCorebb]’s bbTalkie project takes the concept a step further by adding some extremely cool features to make a highly refined, self-contained ESP32-based communicator. bbTalkie completely does away with a push-to-talk button by implementing robust voice detection that works reliably even in noisy environments. It was all designed with cycling in mind, so hands-free operation that stands up to noise is a big plus.

The core of communication is done over ESP-NOW, which is Espressif’s own protocol for direct device-to-device broadcasting. This removes the need to involve any sort of external service like SIM cards or internet access to transmit voice. Performance is best with an external antenna, naturally, but ESP-NOW doesn’t actually require anything other than the existing on-board hardware.
We’ve seen ESP-NOW used to make digital walkie-talkies before, but bbTalkie is a really evolved take on the concept, not least because of its hands-free operation.
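To give a feel for what sending voice over ESP-NOW involves, here is a minimal Python sketch of splitting an audio stream into packets. It assumes the classic ESP-NOW payload limit of 250 bytes per packet; the three-byte header layout, the `packetize` and `reassemble` names, and the flag values are hypothetical and not taken from bbTalkie’s firmware.

```python
import struct

ESPNOW_MAX_PAYLOAD = 250          # classic ESP-NOW per-packet payload limit, in bytes
HEADER = struct.Struct("<HB")     # hypothetical header: 16-bit sequence number, 8-bit flags
CHUNK = ESPNOW_MAX_PAYLOAD - HEADER.size   # audio bytes that fit next to the header


def packetize(audio, first_seq=0):
    """Split encoded audio bytes into packets that each fit in one ESP-NOW payload."""
    packets = []
    for i, offset in enumerate(range(0, len(audio), CHUNK)):
        is_last = offset + CHUNK >= len(audio)
        seq = (first_seq + i) & 0xFFFF            # sequence number wraps at 16 bits
        flags = 1 if is_last else 0               # bit 0 marks the final packet
        packets.append(HEADER.pack(seq, flags) + audio[offset:offset + CHUNK])
    return packets


def reassemble(packets):
    """Sort packets by sequence number and join their payloads.

    Sequence wrap-around is ignored here to keep the sketch short.
    """
    decoded = [(HEADER.unpack_from(p)[0], p[HEADER.size:]) for p in packets]
    return b"".join(payload for _, payload in sorted(decoded))
```

A real-time voice link would more likely play packets as they arrive and simply drop late ones, rather than wait to reassemble a whole message, but the size budget per packet is the same either way.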
Because volume-based automatic triggers are highly susceptible to noise, voice detection is done with the help of VADNet, a neural network-based model implemented locally on the device. This system can reliably detect human speech, even in noisy environments. This lets bbTalkie switch between transmit and listen modes automatically and hands-free, without false triggers.
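The switching logic that sits on top of a speech detector can be quite small. The Python sketch below assumes the detector hands over one true/false decision per audio frame; the `transmit_states` name and the hangover length are hypothetical, and this is an illustration of the general idea rather than how bbTalkie actually does it.

```python
def transmit_states(speech_flags, hangover_frames=3):
    """Yield True (transmit) or False (listen) for each per-frame speech decision.

    speech_flags: iterable of booleans, one per audio frame. This is a
        hypothetical stand-in for the output of a speech detection model.
    hangover_frames: number of non-speech frames to keep transmitting for,
        so that short pauses between words do not cut the link.
    """
    remaining = 0
    for is_speech in speech_flags:
        if is_speech:
            # Cover this frame plus the hangover period that follows it.
            remaining = hangover_frames + 1
        if remaining > 0:
            remaining -= 1
            yield True
        else:
            yield False
```

With `hangover_frames=2`, a single speech frame keeps the unit transmitting for that frame and the two after it, then it drops back to listening.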
Even when doing all that, there’s still spare capability to play with. Further to the goal of making bbTalkie useful for cyclists in a group, [RealCorebb] added a system that can recognize specific voice commands (like “turn left” for example, or “wait for me!”) which trigger synchronized animations to play on the displays of all connected units. There’s even some experimental support for controlling a camera over Bluetooth, though currently it only supports hardware from Sony.
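One simple way to get every unit to play the same animation is to broadcast a tiny control message alongside the audio. The Python sketch below is only a guess at the shape of such a message: the command IDs, animation names, type tag, and field layout are all hypothetical and not taken from the project.

```python
import struct

# Hypothetical command IDs and animation names, for illustration only.
ANIMATIONS = {1: "turn_left", 2: "turn_right", 3: "wait_for_me"}

MESSAGE = struct.Struct("<BBH")   # message type, command ID, start delay in ms
MSG_ANIMATION = 0xA1              # hypothetical tag that separates this from audio packets


def encode_command(command_id, delay_ms=100):
    """Build the broadcast message for a recognized voice command."""
    if command_id not in ANIMATIONS:
        raise ValueError("unknown command ID")
    return MESSAGE.pack(MSG_ANIMATION, command_id, delay_ms)


def handle_message(data):
    """Return (animation_name, delay_ms), or None if this is not an animation message."""
    if len(data) != MESSAGE.size:
        return None
    msg_type, command_id, delay_ms = MESSAGE.unpack(data)
    if msg_type != MSG_ANIMATION or command_id not in ANIMATIONS:
        return None
    return ANIMATIONS[command_id], delay_ms
```

The start delay gives every receiver the same short head start before playback, so the units begin the animation at roughly the same moment without needing synchronized clocks.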
Watch a tour of it in the video below (Chinese language, English captions available). The OLED screens and animations are adorable, and provide great visual feedback about what the unit is doing at any given moment.

I live down a rural country lane popular with cyclists, and they have to shout to each other to be heard. If I’m in the garden I overhear whether I like it or not. The psychology of conversations is interesting (generalising):
Man to man: talk about their equipment – bikes, cars etc
Woman to woman: talk about their children, school etc
M/F couples: talk about what to have for dinner tonight etc
Apologies for missing the point of the article; it sounds seriously useful.
plausibly the case before homo sapiens
i would never want this product but its capabilities make me feel like we’re finally approaching ‘comm badge’ kind of functionality, which is pretty exciting. i know there were a few early products that weren’t quite usable but i haven’t seen anything in that space in a few years
I can see a version which contains a local “gateway” device in Engineering which transmits to another remote gateway device on the Bridge which then communicates with bbTalkie units in the vicinity. A single tap only talks to local nodes, a double tap communicates to local and gateway nodes.
Wow, this is exactly what I have been wanting for years but never bothered to make.
Me too! This is by far the most plausible realization of the wifi walkie talkie that we’ve seen. Plus, a whole bunch of other crazy features that you don’t even need!
The on-chip noise rejection etc. is fantastic. And it serves as a reminder that simple voice commands don’t need to be hard.
My only question is what is lost before the speech is detected. Does it keep a buffer and play the buffer after the detection of speech? If it is a longer snippet of speech, does it ‘chop’ blank spaces down until you are hearing the lowest latency signal? Is it possible to integrate with an FRS radio for real “walkie-talkie” action beyond the range an ESP32 can muster?
Excellent idea, implementation is the difficult part, if you make it hands-free the implication is you equip non-technically minded with it.
ESP docs (linked from the article) show 128ms buffer as an example. No idea about buffer depth of the device specifically.
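For anyone curious what that kind of buffering looks like, here is a rough Python sketch of a pre-roll buffer: it holds on to the last few frames of audio so the start of a word is not lost while the detector makes up its mind. The 128 ms figure is the example from the docs mentioned above; the 32 ms frame length (giving 4 frames) and the `PreRollBuffer` name are assumptions for illustration, and none of this is known to match the device.

```python
from collections import deque


class PreRollBuffer:
    """Keep the most recent non-speech frames so speech onsets are not clipped."""

    def __init__(self, preroll_frames=4):
        # 4 frames of an assumed 32 ms each would cover about 128 ms of audio.
        self._history = deque(maxlen=preroll_frames)
        self._talking = False

    def push(self, frame, is_speech):
        """Return the list of frames to transmit for this step (possibly empty)."""
        if is_speech:
            if self._talking:
                return [frame]
            # Speech just started: flush the buffered lead-in, then this frame.
            self._talking = True
            out = list(self._history) + [frame]
            self._history.clear()
            return out
        # No speech: remember the frame in case speech starts next.
        self._talking = False
        self._history.append(frame)
        return []
```

The cost is a little extra latency at the start of each utterance, since the buffered frames go out ahead of the live one.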
I’m hearing impaired and this makes me wonder: could I toss something like this to a friend so I could hear them from further away than normal? Neat idea, certainly worth looking into!
You might want to talk to your hearing aid/cochlear implant provider about an FM/DM system, as this is exactly what you’re looking for.
Some good ideas but trying to hear parakeet speech out of that tiny tiny thing on a bike in any kind of noise like the street seems useless, maybe in a headset. Any kind of visual display on a bike is just as bad as behind 4 wheels. As for cuteness, when I saw the very first “smiley face” in the mid 60’s insurance mailer I thought “that’s ugly”.
It seems like a bluetooth earpiece is the missing piece of the puzzle here. I’d think that should be easy enough to add given the ESP32 already supports it, it should just be a code change.
yes please, exactly!
Most esp32’s (like the esp32-C3 and the S-3 used in the article) don’t handle BT audio (A2DP), and the ones that do (basic esp32) don’t have the 8MB Octal PSRAM required for the ESP-SR speech-recognition used, so you’d need to add a bluetooth classic module, or keep it simple and just turn the whole thing into a headset instead of a walkie. (Could either use a head-mounted display for the visuals, or use one on the handlebars with display and one in a headset without a display.)
I’d love to use something like this combined with a bone conduction bluetooth headset for rock climbing.
With all the noise in the climbing hall it’s sometimes hard to hear the climber giving commands to the person belaying.
I would like a walkie-talkie from car to car, or would that become a shouting match?
I really really need to get learning how to make these devices. That whole esp32 proprietary protocol for communication has been fascinating me for a while now. So cool to see what the future is finally bringing us. Well. Partly. That whole hard right slide the entire world is doing is definitely a damper on LITERALLY everything.