Live Subtitles For Your Life

Personal head-up displays are a technology whose time ought by now to have come, but which, notwithstanding attempts such as Google Glass, has steadfastly refused to catch on. There’s an intriguing possibility in [Basel Saleh]’s CaptionIt project though: a head-up display that provides captions for everyday situations.

The hardware is a tiny I²C OLED screen with a reflector and a 3D-printed mount attached to a pair of glasses, and it’s claimed that it will work with almost any ARMv7 SBC, including more recent Raspberry Pi boards. It uses the Vosk speech recognition toolkit to read audio from a USB audio device, with the resulting text being displayed on the screen.
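To give a feel for the last step of that pipeline, here is a minimal sketch of how recognised text might be fitted onto a very small screen. It is not taken from the CaptionIt code. Vosk's Python recogniser hands back JSON strings, with finished phrases under a "text" key and in-progress guesses under a "partial" key; everything else here is an assumption for illustration, including the helper name `caption_lines` and the 21-column by 4-row text grid.

```python
import json
import textwrap

# Assumed text grid for a small OLED (hypothetical: 21 columns x 4 rows).
COLS = 21
ROWS = 4


def caption_lines(result_json, cols=COLS, rows=ROWS):
    """Return the newest lines of a Vosk-style JSON result that fit the screen."""
    message = json.loads(result_json)
    # Final results carry "text"; partial results carry "partial".
    text = message.get("text") or message.get("partial") or ""
    # Wrap on word boundaries so no line is wider than the display.
    wrapped = textwrap.wrap(text, width=cols)
    # Keep only the last `rows` lines so the most recent words stay visible.
    return wrapped[-rows:]
```

In a real build each returned line would be handed to whatever OLED driver is in use; keeping only the tail of the wrapped text means that long sentences scroll rather than overflow.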

The device is shown in action in the video below the break, and without trying it ourselves we can’t comment on its utility, but aside from the novelty we can see it could have a significant impact as an accessibility aid. But it’s as an electronic Babel fish coupled with translation software that we’d like to see it develop, so that inadvertent but hilarious international misunderstandings can be shared by all.

Regular readers will know that we’ve brought you plenty of HUD tomfoolery in the past.

https://www.youtube.com/watch?v=DIHrooFKeLU

16 thoughts on “Live Subtitles For Your Life”

    1. My father is hard of hearing and hearing aids just do not work that well for him. This would be wonderful for him. I can also see uses for it attached to a motorcycle helmet. It could give you the speed, RPM, and fuel warnings, all without looking down at the gauges. When dealing with technology it is always best to look past your own wants and needs to others’ wants and needs.

      1. I would like it too as a helmet mounted rear view camera display..
        Though the thought of having a jagged piece of acrylic so close to my eye…

  1. I think this is a really cool proof of concept that could have immediate uses for the hard-of-hearing.

    Related, it gives me an idea to use the same basic physical design but with the Google Translate API and another screen worn on the chest.

    Say I’m speaking to someone who only speaks Mandarin: I could have my English text translated to Simplified Chinese on my chest display, and have my head-up display translate their speech to English. In theory it could be as natural as watching a foreign film with subtitles.

      1. There seems to be no lens or other optics, so the virtual screen is at the same distance as the physical screen. I’m skeptical that this is usable at all, and I find it suspicious that the video never actually shows it in operation.

      2. He may be myopic, but the lens of the eyeglasses is between his eye and the screen. As a myopic person, I can tell you that the lens makes it harder to focus on close things.

  2. I wonder if there’s a tool with an algorithm to make the image blurry, so when the eye looks through it normally it’s effectively in focus. A bit like a stereogram but for each pixel.

    1. If only it were that easy. Making an image blurry and putting it outside the plane of focus only makes it blurrier. An image is blurry to the camera or eye because the light rays coming from each point of the object aren’t refracted to the correct angle to converge back into points on the film/sensor/retina, but are instead spread out, making fuzzy spots. To make an object at 1 cm appear in focus when the plane of focus is at 10 cm, you’ll need to make the light coming from the object appear to have the same angles as light from an object at 10 cm. Making it blurry doesn’t do that, but refracting the light with a lens can.

  3. Throughout history humans have resisted change, and it has taken time for new technologies to catch on. It is a disappointment that we haven’t overcome this by now.

    I can’t wait for the day that we have cheap, consumer grade lightweight AR glasses with enough computing to surround ourselves with the data that’s important to our lives.

    I’d love to be able to make a proper AR RPG for people to play outside but alas, the hardware just isn’t there yet/is expensive. Especially multiple/infinite focal plane.

  4. Not sure about this project, but thank you for bringing the vosk speech recognition library to my attention – it looks perfect for a project I was using pocketsphinx for (and getting poor results)!
