Multitasker Or Many Monotaskers?

In Al Williams’s marvelous rant he points out a number of the problems with speaking to computers. Obvious problems with voice control include things like multiple people talking over each other, discerning commands from background conversations, and so on. Somehow, unlike on the bridge in Star Trek, where the computer seems to understand everyone just fine, Al sometimes can’t even get the darn thing to play his going-to-sleep playlist, which should be well within the device’s capabilities.

In the comments, [rclark] suggests making a single button that plays his playlist, no voice interaction required, and we have to admit that it’s a great solution to this one particular problem. Heck, the “bedtime button” would make a fun project in and of itself, and its scope is so limited that it would probably be only a weekend’s work for anyone who has touched the internals of their home automation system, as Al certainly has. We love the simplicity of the idea.

But it ignores the biggest potential benefit of a voice control system: that it’s a one-size-fits-all solution for everything. Imagine how many other use cases Al would need to make a single-button device for, and how many coin cell batteries he’d be signing himself up to change out over the course of the year. The trade-off is that a general-purpose solution tends not to be as robust as a single-tasker like the button, but it can potentially simplify the overall system.

I suffer this in my own home. It’s much more a loosely-coupled web of individual hacks than an overall system, and that has pros and cons. Each individual part is easier to maintain and hack on, but the overall system is less coordinated than it could be. If we change the WiFi password on the home automation router, for instance, I’m going to have to individually log into about eight ESP8266s and change their credentials. Yuck!

It’s probably a matter of preference, but I’ll still take the loose, MQTT-based system that I’ve got now over an all-in-one. Like [rclark], I value individual device simplicity and reliability above the overall system’s simplicity, but because our stereo isn’t even hooked up to the network, I can’t play myself to sleep like Al can. Or at least like he can when the voice recognition is working.

21 thoughts on “Multitasker Or Many Monotaskers?”

  1. I have two networks in my house. This requires desktops to have two network interfaces (hardwired). One for Internet and one for internal. That way all devices in the house on the internal network never touch or are touched by the internet. I don’t plan on changing the internal ‘wifi’ password anytime soon since, as you pointed out, it would be a pain :) . Instead of MQTT, I use a redis server located on a RPI (Pidp-11, pi-hole, and other services as well) which is on a UPS as the data/comm medium. That way the devices only have to know the server IP to put/get data. Simplify… As an example, the status of a switch is sent to the redis server. The relay microcontroller reads the status from the redis server and turns the light on/off. The switch status is then also available to any other application (maybe a status LED security panel in another room or… whatever). I developed a protocol on top of the basic redis protocol to standardize how each device gets/puts data into redis. Therefore, any device can scan the redis server to see what devices are out there, what input/output is available, and the format for reading/writing the data. But I digress…

    Button or a button board interfacing to a micro controller or SBC could be designed (much like the keyboards we see periodically here) that send a message to the redis server for consumption by some device looking for it to turn on music or sound a horn, or … . Wouldn’t have to be battery operated either, just plugged into a wall socket or a power strip…. Agree though, every device needs power and battery change out is a pain. Be so much nicer to spread devices around without having to worry about where the dang power is going to come from :rolleyes: . I guess that is why we run wires to a central device for windows, doors, etc. Ha!
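The switch-and-relay pattern described in this comment could look roughly like the sketch below. It is an illustration only, not the commenter's actual protocol: `FakeStore` is a hypothetical in-memory stand-in for the Redis server, and the `device:<name>:<field>` key layout is an assumption made up for the example.

```python
class FakeStore:
    """Hypothetical in-memory stand-in for a Redis server (string keys and values)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = str(value)

    def get(self, key):
        return self._data.get(key)

    def keys(self, prefix=""):
        return sorted(k for k in self._data if k.startswith(prefix))


def publish_switch(store, name, is_on):
    """Switch side: write the current state under the assumed key layout."""
    store.set(f"device:{name}:type", "switch")
    store.set(f"device:{name}:state", "on" if is_on else "off")


def relay_should_be_on(store, switch_name):
    """Relay side: read the switch state; a missing key is treated as 'off'."""
    return store.get(f"device:{switch_name}:state") == "on"


def discover_devices(store):
    """Any client can scan the store to list the devices that have registered."""
    names = {key.split(":")[1] for key in store.keys("device:")}
    return sorted(names)


store = FakeStore()
publish_switch(store, "hall_switch", True)
publish_switch(store, "bedtime_button", False)
print(relay_should_be_on(store, "hall_switch"))
print(discover_devices(store))
```

Because every device only reads and writes keys, the button, the relay, and a status panel never need to know about each other, only about the server. Swapping the stand-in for a real Redis client should mostly be a matter of replacing the store object, though the details depend on the client library used.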

    1. You don’t need two nics on every end user machine, just a stateful firewall that allows traffic going from the ‘Internet access allowed’ subnet to the ‘no internet’ subnet, and denies traffic going from the ‘no internet’ subnet to anywhere else.

      It sounds like you may be concerned about some devices on your ‘internet allowed’ network communicating with your iot endpoints – in that case use firewall rules to only allow traffic into the iot subnet for a specific set of IPs (or better yet just put the end user devices on their own subnet).

      Even very low end SOHO waps / routers can use a trunk port and put different SSIDs on their own vlan/subnet

      pfSense (and OPNSense, which I don’t have personal experience with) are great open source projects that make it dead simple to deploy a segmented network like that. They also have packages for MDNS reflection, pi-hole style blocking, VPNs etc.
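The one-way rule described in this reply can be sketched as a toy packet filter. This is a simplified model under stated assumptions, not pfSense configuration: the two subnets are made-up example ranges, and the connection state is tracked as plain (source, destination) address pairs, whereas a real stateful firewall also tracks ports and protocols.

```python
import ipaddress

# Assumed example subnets; the comment above does not give real addresses.
TRUSTED_NET = ipaddress.ip_network("192.168.10.0/24")  # 'Internet access allowed'
IOT_NET = ipaddress.ip_network("192.168.20.0/24")      # 'no internet'


def allow_packet(established, src, dst):
    """Decide whether a packet passes; `established` is the connection-state set."""
    src_ip = ipaddress.ip_address(src)  # raises ValueError on a malformed address
    ipaddress.ip_address(dst)           # validate the destination as well
    # Stateful part: a reply to a connection that was already allowed passes.
    if (dst, src) in established:
        return True
    # New connections from the trusted subnet may go anywhere, including IoT.
    if src_ip in TRUSTED_NET:
        established.add((src, dst))
        return True
    # Everything else, including new connections started from the IoT subnet, is dropped.
    return False


state = set()
print(allow_packet(state, "192.168.20.5", "192.168.10.7"))  # IoT device starts a connection
print(allow_packet(state, "192.168.10.7", "192.168.20.5"))  # laptop talks to IoT device
print(allow_packet(state, "192.168.20.5", "192.168.10.7"))  # IoT device replies
```

The first call is refused because the IoT device is trying to start the conversation; the same packet is accepted in the third call because by then it is a reply to a connection the laptop opened.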

  2. Here’s my bedroom button.

    hackaday.io/project/5283-potpourri/log/238642-bedroom-button

    I wanted a lightswitch that could be turned on easily, in the dark at night when you’re groggy. Without fumbling around for a chain or column switch.

    The box was purchased at HobbyLobby, the switches are big arcade buttons.

    The steel blocks either side of the mechanism give the box a sense of weight and prevent it from sliding around on the tabletop, and the lithium battery will last decades.

    Sometimes simpler is better.

    Note that you can purchase bluetooth camera remote shutter switches on eBay for a buck, and bluetooth relays as well.

    I’ve got one RasPi running some lights, but that’s all I really need. With, like, two exceptions I’ve never seen the need for house automation. Why anyone would need to start their washing machine while on vacation in Singapore is beyond me.

    (Two exceptions: Ring doorbell, and something that tells you when the washing machine in the basement is done. Other than that, I don’t see the use case.)

    One problem with current home automation systems is that they require a phone app and a manufacturer’s server. This is because there’s no good way to integrate a purchased device with your current Wifi network. Supposing you purchased a home automation device… how do you put it on your local network? The process is too complicated for the end user, who typically doesn’t know how to configure a network.

    Shameless plug: For those of you trying to make a product, here’s a project that can integrate into your system that can simplify the process. It puts up an access point with the name of the product, allows the user to select their home network, and then switches to that network on reboot.

    https://hackaday.io/project/175543-easy-raspi-configuration

    (My ooler, which is a cooled mattress pad, requires an internet connection and an account on the manufacturer’s server to operate, and there’s no way around that. It also requires location information from my phone. A mattress pad. I should say “ex-ooler”, because I eventually threw it out.)

    1. “This is because there’s no good way to integrate a purchased device with your current Wifi network.”
      This is why I ended up preferring IoT devices that require a hub instead of requiring a cloud connection. Plug in the hub; connect all devices to the hub. WiFi changes? Doesn’t matter, the devices aren’t using the wifi, they are connected directly to the hub, which is physically plugged into the modem/router.

  3. I would accept my assistant as it is in terms of capability save for ONE FEATURE.

    It knows it’s 3am and my house is so quiet you can hear a pin drop, I whisper “Okay Google, turn off the lights”.

    All I want is it doesn’t scream “OKAY…. TURNING THE BEDROOM LIGHTS OFF”…. “BEEEEBOOOP”.

    The solution isn’t even hard, let me calibrate. I’ll stand in the corner of my room and find the minimum volume needed. I’ll then stand next to it and it will find the maximum volume needed. Xbox One’s Kinect had a calibrate option, it would make pings at various volumes and listen to echos.

    Hell let me set a volume for waking hours, and a volume for night. The assistant could even listen to ambient sound levels.

    So so so many solutions, so little interest to fix the damn problem.
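The day/night volume idea from this comment is simple enough to sketch. Everything numeric here is an assumption chosen for illustration (the night window, the two volume levels, and the "quiet room" threshold); no real assistant API is being described.

```python
def reply_volume(hour, ambient_db, day_volume=70, night_volume=20, quiet_db=35.0):
    """Pick a reply volume (0-100) from the hour of day and the ambient sound level."""
    # Assumed night window: 22:00 up to, but not including, 07:00.
    is_night = hour >= 22 or hour < 7
    base = night_volume if is_night else day_volume
    # In a very quiet room, never be louder than the night setting.
    if ambient_db < quiet_db:
        return min(base, night_volume)
    return base


print(reply_volume(hour=3, ambient_db=25.0))
print(reply_volume(hour=14, ambient_db=60.0))
```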

    1. I just want every bluetooth speaker and bluetooth enabled voice device to NOT blast out the bluetooth chimes at full volume when you turn them on or switch sources.

      Like, it remembers the volume setting from when I previously used it – why does it then ignore the setting when I press the connect button?

  4. Sorry, but a voice control system is most definitively not a one-size-fits-all solution. Think of a piano for example.
    Problem is, the more we talk about it as such a solution, the higher the risk of someone or some company believing it and inflicting pianos with voice control instead of keys on us.

    Solution for the bedtime music would be the sleep timer and a program key on the stereo. No need for network.

  5. This reminds me of a thought I had recently about getting open source voice assistants a step up.

    I was thinking of combining voice recognition with an LLM with a prompt having the LLM choose the correct action to take from a list it has permissions for.

    It would probably be effective for converting more vague or organic voice commands into structured ones that could be handled automatically.

    New features would be easy to add by adjusting the options the LLM is made aware of.

    The main issue I’ve seen with current offerings is the need for rigid structured commands and this could help with that.

    Also, since it’s not all-in-one, speech-to-text models could be swapped separately from the LLM and separately from any TTS models.

    Add in the option for the LLM to ask follow-up questions and it can help with getting quality results.
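A minimal sketch of the idea in this comment follows, assuming the speech recognizer has already produced a text transcript. The action names are invented for the example, and `fake_llm` is a hypothetical stand-in for a real model call; the point is only that the model picks from a list and the code checks the answer against that list before acting.

```python
# Hypothetical action list; a real system would map these names to handlers.
ALLOWED_ACTIONS = {
    "play_sleep_playlist": "Start the going-to-sleep playlist",
    "lights_off": "Turn off the bedroom lights",
    "ask_followup": "Ask the user a clarifying question",
}


def build_prompt(transcript, actions):
    """Describe the permitted actions and the user's words to the model."""
    lines = ["Choose exactly one action name from this list and reply with only that name:"]
    for name, description in actions.items():
        lines.append(f"- {name}: {description}")
    lines.append(f"User said: {transcript}")
    return "\n".join(lines)


def choose_action(transcript, llm, actions=ALLOWED_ACTIONS):
    """Ask the model for an action, but only accept names on the allowlist."""
    reply = llm(build_prompt(transcript, actions)).strip()
    # Anything outside the allowlist falls back to a follow-up question.
    return reply if reply in actions else "ask_followup"


def fake_llm(prompt):
    """Hypothetical stand-in for a real model; looks only at the user's line."""
    user_line = prompt.splitlines()[-1].lower()
    return "play_sleep_playlist" if "sleep" in user_line else "open_pod_bay_doors"


print(choose_action("I think it's time to sleep", fake_llm))
print(choose_action("open the pod bay doors", fake_llm))
```

Adding a feature then means adding one entry to the action list, and the speech-to-text, LLM, and TTS pieces stay independently swappable because they only exchange plain text.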

  6. There are many people who are unable to speak and rely on Augmentative and Alternative Communication (AAC) technology to create speech. This reduces the ease of access to a voice control system.

  7. “But it ignores the biggest potential benefit of a voice control system: that it’s a one-size-fits-all solution for everything.”

    I’m going to be generous and read that as “it’s possible to use voice control as a single interface that replaces many buttons to trigger jobs that have already been reduced to ‘do this’ state.” If that wasn’t your intent I want to see the part of the internet where people say, “gosh, I just love navigating phone menus”.

    Even in the limited form, it’s still bad design thinking.

    Remember back in 2017-2018 when Amazon released a commercial where someone said, “Alexa, buy catfood” and people’s Echo devices did? Or the news presenter who triggered a wave of dollhouse orders?

    https://news.sky.com/story/amazon-echo-orders-dollhouses-after-hearing-tv-presenter-talking-10722985

    Getting an interface to do what you want when you want it is only part of interface design. Avoiding false positives is also a part of it.

    Show us a voice controlled system that’s easier to use than pushbuttons, is just as good at avoiding false positives and false negatives, and doesn’t require silly amounts of tweaking before it’s useful.

    1. But that’s the trick, right?

      The voice recog system can do an infinity of things, and do it right most of the time. The button does the one thing, and it does it right absolutely every time.

      And if your whole system is either that one function, or the infinity of buttons, the choice is clear. Real life lies somewhere in between, and it’s worth thinking about the tradeoffs.

      1. The choice is clear for everything that does not have at least a one- or two-minute time frame in which to be done … well, at least it should be. I personally know a high-risk implementation where speech output may block other speech output until it is too late, grinding the system to an emergency stop. Earlier systems just used beeps, which were much more (time-)precise in output and recognition, and more than one beep could be played, recognized and reacted on at the same time. The speech does not contain more information than the beep did.
        This is why I decidedly stand up against the one-size-fits-all narrative.

    2. ‘A keyboard, how quaint.’
      Was the single stupidest thing ever in Drek world, that’s saying something.

      Can you talk as fast/clearly as you can type?
      If you can, learn to touch type.

      Consider saying ‘OK’ 6+ times vs, just banging the enter key to accept defaults.
      Imagine working in a cubefarm with everybody using voice control…GD nightmare.

      ‘Qwerty, how quaint’ might have worked.

  8. For a few years now I’ve had a raspberry Pi running a MQTT server on our home network, and a box of ESP boards and Tasmota ready devices… and I’ve done just about nothing with them, basically…. I just can’t come up with reasons and scenarios where our home or life would be measurably better by going to the effort to install and program a bunch of things. Yes, I know this seriously damages my claim to be a real hacker.

    I will not have any of the commercial voice recognition thingies in our home. I’ve observed them in action at friend’s homes, and they are somewhere between a novelty and a nuisance. The owners end up shouting and repeating themselves, like they’ve hired a deaf aunt as a servant. I’m also reluctant to gift external entities with an open mic and free data.

  9. The voice control system on the Macintosh in the 90s was pretty much perfect. You put a script called “play my going to sleep playlist” in a folder, then you said “Computer, play my going to sleep playlist”, and it executed the script. If you didn’t say “computer” first, or if there was no script matching the command you gave, nothing happened.

    That’s all voice control ever needed to be, and every system since then has been stupider by trying to be smarter.
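The behavior described in this comment (wake word first, exact script name after, otherwise do nothing) can be approximated in a few lines. This is a sketch of the matching logic only, not of the original Macintosh software; the wake word handling and the temporary script folder are assumptions for the demo.

```python
import tempfile
from pathlib import Path

WAKE_WORD = "computer"


def find_script(utterance, script_dir):
    """Return the script whose name matches the spoken command, or None to do nothing."""
    words = utterance.lower().replace(",", " ").strip(" .!?").split()
    if not words or words[0] != WAKE_WORD:
        return None  # no wake word: ignore the utterance entirely
    command = " ".join(words[1:])
    for path in Path(script_dir).iterdir():
        # Compare against the file name without its extension.
        if path.is_file() and path.stem.lower() == command:
            return path
    return None  # no matching script: nothing happens


with tempfile.TemporaryDirectory() as folder:
    script = Path(folder) / "play my going to sleep playlist.sh"
    script.write_text("#!/bin/sh\n")
    print(find_script("Computer, play my going to sleep playlist", folder) == script)
    print(find_script("play my going to sleep playlist", folder))
```

There is no fuzzy matching on purpose: an utterance either names a script exactly or is ignored, which is what keeps the false positives down.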

  10. The solution is…there is no solution.
    Any attempt to find a broad solution for every problem is a folly, and being unable to see or admit it is a flaw that needs to be overcome.

    If a solution seems like it is a great way to do a bunch of things well, it means you are just not seeing the critical flaws.

    It’s kinda like how touch screens are awful input devices for most tasks, but designers keep slapping them in cars anyway, despite one of the primary requirements of any control scheme in a vehicle being usable without looking at it or requiring the attention to ‘find’ the right input.

    Each scheme has its place.
    The more general use it is, the worse it will be. Or the more compromises it will have to make.
    On the other hand, the more specialized it is, the more often that task must be required to justify it.

    A bedside button for a task you do every night is reasonable.
    As is a light switch next to the room entrance that you will need to walk by anyway.

    Foot switches are useful when you need to do a thing often, while also using your hands.

    Voice control?
    Well, as a rule, people are awful at communicating.
    For voice control to be useful and reliable, you need a situation where you cannot or should not touch a control AND the system needs to have a syntax.

    All this “natural language” voice control is the reason these systems suck.

    Command syntax and key words have worked for voice control systems for a LONG time. And the VERY scifi sources that inspire our image of control systems often portray its use with a command syntax anyway.

    Tea.
    Earl Grey.
    Hot.
