Hackaday Prize Entry: The Open Voice Factory

Joe’s little brother Richard has never been able to speak. When Richard turned 19, he received a device not unlike the voice box of Stephen Hawking. Suddenly, Richard was able to communicate using thousands of words, and everyone could understand him. In the UK, there are thousands of people who could benefit from this technology but can’t afford it. This is the inspiration for the Open Voice Factory, a project that lets anyone create pages of touchscreen interfaces and turns them into functioning speech aids.

The basic idea behind the Open Voice Factory is — wait for it — PowerPoint. Hold on, this actually makes sense. The Open Voice Factory is designed so caregivers can create and modify the touchscreen ‘pages’ with different words and actions. PowerPoint is universal, and everybody’s grandmother knows how to use it. In this regard, the software that is the leading cause of death for astronauts isn’t a terrible choice.

That PowerPoint stack is sent off to an online Factory that parses the commands and assembles a web page built for touchscreen interaction. It’s brilliantly simple, relies on a cloud service so it’s highly marketable, and requires only a minimal hardware investment for each user. Consider the fact that computers – especially Macs – have had exceptional text-to-speech capabilities for twenty years now, and you wonder why something like this hasn’t come along sooner. It’s an awesome idea, and a great entry for the Hackaday Prize.
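
The "assemble a web page" half of that pipeline could look roughly like the sketch below. This is not the project's actual code: it skips the PowerPoint parsing entirely and assumes the button labels and spoken phrases have already been pulled out of the slides. The `Button` class and `render_page` function are hypothetical names made up for illustration; the only real API involved is the browser's Web Speech API (`speechSynthesis.speak`), which does the talking.

```python
import html
import json
from dataclasses import dataclass


@dataclass
class Button:
    """One touch target on a page (hypothetical structure, not the project's schema)."""
    label: str      # text shown on the button
    utterance: str  # text spoken when the button is pressed


def render_page(title: str, buttons: list[Button], columns: int = 4) -> str:
    """Return a standalone HTML page with one speaking button per entry."""
    cells = []
    for button in buttons:
        # json.dumps yields a valid JavaScript string literal; html.escape then
        # makes it safe to embed inside a double-quoted HTML attribute.
        spoken = html.escape(json.dumps(button.utterance), quote=True)
        cells.append(
            f'<button onclick="say({spoken})">{html.escape(button.label)}</button>'
        )
    grid = "\n".join(cells)
    return f"""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>{html.escape(title)}</title>
<style>
  .grid {{ display: grid; grid-template-columns: repeat({columns}, 1fr); gap: 8px; }}
  button {{ font-size: 2em; padding: 1em; }}
</style>
<script>
  // Web Speech API: speak the given text with the browser's built-in TTS voice.
  function say(text) {{
    window.speechSynthesis.speak(new SpeechSynthesisUtterance(text));
  }}
</script>
</head>
<body>
<div class="grid">
{grid}
</div>
</body>
</html>"""


# Example: a two-button page, built entirely in memory.
example_page = render_page(
    "Greetings",
    [Button("Hi", "Hello, my name is Richard"), Button("Thanks", "Thank you very much")],
)
```

Saving `example_page` to an `.html` file and opening it on a tablet gives a grid of buttons that speak when tapped. Because the speech is produced by the browser, the generated page itself needs no server once it has been downloaded.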

22 thoughts on “Hackaday Prize Entry: The Open Voice Factory”

  1. Both Android and iOS have TTS engines, so why not make software that creates all the buttons directly on the smartphone/tablet? It’s simple: just add a new button, select an icon, type the text, and whenever the button is pressed, the TTS service is called. Using PowerPoint and then some external server is a rather bizarre way to do custom UI, and it will make the software useless if the server ever goes down. And, as the Internet of Broken Things shows us, it will eventually go down…

    1. Yeah, I know it seems bizarre, but if you think about it, it makes sense. Most people who would use this would be working with some institution. These institutions have people who are probably not overly computer literate, but they would know MS Office, including PP (hee-hee), and how to email. This leverages the common knowledge of people working with the speechless to make the system as easy to implement as possible in the places where it will be used.

      1. Adding a button for this thing on a computer or smartphone would be as easy as adding a new contact to a phone book, or adding an appointment to a calendar. One would only need to select an icon or text name for the button and then type the text to be said by the app. The TTS engine would take that text string and speak it. Using PowerPoint and an external service is actually more complicated both for the user and for the programmer who implemented this stupid solution. There is a reason for things like Microsoft’s SAPI. Whoever developed this software is either a moron or a bastard looking for idiots who will fall victim to his cloud scam. On a side note: PowerPoint isn’t free. Or easy to use…

        1. The best example of this approach is: http://www.assistiveware.com/product/proloquo2go, which is an extremely solid offering and is the market leader for a reason. They also have (I’ll have to check) a good half dozen programmers working on it full time – it turns out that this is one of those things that you think is simple until you work with the end-users.

          When we run focus groups for people who maintain devices (and it’s worth being aware of the difference between the people who maintain the communication aid and those that use it) – SLTs (SLPs in the US, which makes it harder to pronounce), parents, care staff and so on, they very much prefer this approach.

  2. This seems like one of the applications that tablets were actually designed for.
    And in fact I found several apps that do exactly this.
    So what does this solution do fundamentally differently, besides being dependent on costly proprietary software?

      1. I don’t know if it makes it any easier. I mean, the article says everyone’s grandmother knows how to use PowerPoint, but to tell you the truth, that’s inaccurate. I guarantee you my mother couldn’t use it if I gave it to her. I think it’d be a lot easier to give them an interface to create pages and add buttons to pages, and give them a means to parse text as sound so they don’t have to upload sounds. That sounds a lot easier to me than trying to learn PowerPoint…

        1. And every common OS has a TTS service built in, which I mentioned in my first post above. There is no good reason to build this kind of software in that convoluted way. The only good thing about the cloud in this case is the ability to collect common buttons/pages made by users to create a universal starting set. But you don’t need to use a cloud service for processing; you just ask users to send their setups anonymously…

  3. Hypercard.

    But seriously, any system that is designed with an artificial middle-man to get you what you need should be viewed with great suspicion. Why do I need to send my PPT to a cloud service when the PC I am using to make the PPT is good enough to do the post-processing?

    And I hope it’s compatible with LibreOffice.

      1. Well… there’s a serious point here – there is something to be said for a communication aid that knows you said “Hi Mary” yesterday, and offers you the option to say “Yesterday I saw Mary” today. (We mostly work with preliterate users, so choosing appropriate icons for automatic utterances is a real issue.) For more of my thoughts on this and its ethical consequences, you might be interested in http://joereddington.com/csr/papers/slpat.pdf (I don’t always reply to comments, but when I do…)

    1. Oh, the Python script can be run locally – that’s no problem. The only reason there is a cloud version is because the people we are aiming at aren’t people who go to the command line.

      As a point of interest, a few days ago I tested the Factory with templates edited in Keynote, LibreOffice, and Google Drive (all exported to pptx). The Factory didn’t like any of them, mostly because they do something strange to link formats when importing/exporting. More work will happen there – but we’re focusing on PP because that’s what comes back when we do user-testing and focus groups and such.

    2. You can of course just run the Python script on your own system :) the only reason it has a cloud version is because our target people aren’t people who think of a command line as a Thing.

      As a point of interest, a few days ago I tested the Factory with templates edited in Keynote, LibreOffice, and Google Drive (all exported to pptx). The Factory didn’t like any of them, mostly because they do something strange to link formats when importing/exporting. I’d like to increase coverage, but I want to finesse the PP first – because that’s what comes back to us from user testing and focus groups and such…

      1. The fact that your software doesn’t run against ‘compatible’ pptx files should tell you something. Not least that it’s a dumb idea.

        Imagine what would happen if Microsoft were to change their software a little. Suddenly your software stops working. Being beholden to Microsoft is a stupid move, and precisely what open software saves you from. If you had focused on LibreOffice (for example) from the start, you might be in better shape now.

        Instead, make your own simple GUI, and simple (open) data format. It doesn’t have to be particularly good. It just has to work.

  4. Friends of mine with a special needs child purchased an Apple app for $100 that does picture-to-sentence building, can do TTS, and is easily customizable. Talking to them, they said there are already many free apps for text to speech; not sure how customizable they are…
    Regardless, there is a huge need for this. Maybe he can partner up with the tongue mouse people for an all-in-one solution.
