Speech Synthesis On A 10 Cent Microcontroller

Speech synthesis has been around since roughly the middle of the 20th century. Once upon a time, it took remarkably advanced hardware just to choke out a few words. But as [atomic14] shows with this project, these days it only takes some open source software and a 10-cent microcontroller.

The speech synth is implemented on a CH32V003 microcontroller, known for its remarkably low unit cost when ordered in quantity. It’s a speedy little RISC-V chip running at 48 MHz, albeit with the limitation of just 16 KB of Flash and 2 KB of SRAM on board.

The microcontroller is hooked up to a speaker via a simple single-transistor circuit, which allows for audio output. [atomic14] first demonstrates this by having the chip play back six seconds of low-quality audio with some nifty space-saving techniques to squeeze it into the limited flash available. Then, [atomic14] shows how he implemented the Talkie library on the chip. Talkie is a software implementation of Texas Instruments’ LPC speech synthesis architecture, which you probably know from the famous Speak & Spell toys. It’s got a ton of built-in vocabulary out of the box, and you can even encode your own words with some freely available tools.

https://www.youtube.com/watch?v=RZvX95aXSdM
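To get a feel for why six seconds of audio is a tight fit, here is a rough back-of-the-envelope sketch in Python. Only the 16 KB flash size comes from the chip's specs. The 8 kHz sample rate and the 4 KB set aside for program code are assumptions chosen for illustration, not figures from [atomic14]'s project, and `bits_per_sample` is a made-up helper name.

```python
FLASH_BYTES = 16 * 1024  # CH32V003 flash size quoted in the article


def bits_per_sample(duration_s, sample_rate_hz, code_bytes=0, flash_bytes=FLASH_BYTES):
    """Average number of bits available per audio sample.

    code_bytes is the (assumed) amount of flash reserved for the program itself.
    """
    audio_bits = (flash_bytes - code_bytes) * 8
    total_samples = duration_s * sample_rate_hz
    return audio_bits / total_samples


if __name__ == "__main__":
    # Hypothetical sample rates; the project's real rate is not stated here.
    for rate in (4000, 8000, 16000):
        budget = bits_per_sample(6, rate, code_bytes=4 * 1024)
        print(f"{rate} Hz for 6 s: {budget:.2f} bits per sample")
```

Under those assumptions, plain 8-bit PCM at 8 kHz would need roughly four times the space that is actually available, which is why some form of compression is unavoidable.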

We’ve seen [atomic14] tinker with these chips before, too.

34 thoughts on “Speech Synthesis On A 10 Cent Microcontroller”

    1. C64 did not cost 10 cents :-)

      And BTW S.A.M. reciter on C64 was pretty advanced and did not need that much memory, there was still a lot of free space for BASIC programs with the new SAY command.

      I remember writing a number-guessing game for it in BASIC as a kid, and my parents could understand what it was saying and could play it :-) And BTW it was not in English, the built-in phoneme syntax was good for most languages.

      1. Yes, sure. But come on, 48 MHz! 😃
        And merely ~2048 ASCII characters of random access memory!
        A blank text file alone occupies 512 Bytes on a FAT medium.

        That’s like combining an AMD Ryzen with 512 KB of Conventional Memory as RAM..
        All the processing power in the ALU and SIMDs but then
        there’s no RAM for sophisticated applications to put all the power to good use! 😢

        1. A blank text file alone occupies 512 Bytes on a FAT medium.

          It does not. That’s just the reserved allocation unit due to the file system. The file system is just a map of where the data is, not the data itself. The file system could pretend it has allocated 512 bytes for the file even if it could actually only store e.g. the first 128 bytes.

          1. Hi, thanks. I meant to say it occupies the smallest allocation unit, which is 512 Bytes.
            That a “blank” file does not contain 512 Bytes of characters was understood, I thought.

    2. Moore’s law reality. They made transistors inside the chip smaller to make it cheaper. But that also made it faster. And (spoiler) with more leakage, the deep sleep consumption of that chip isn’t that good.

      1. Yes, but 64 KB of RAM were affordable 40 years ago!
        Why can’t this thing have 32 KB of internal SRAM, at least?
        After 40 years of progress and in these times of 64 GB RAM DDRx modules and SD cards in the TB range.
        The way it is now, it barely beats an Atari 2600. 🥲

        1. Because most embedded applications running on a chip with only a few I/O lines simply do not need that much RAM. Adding unnecessary RAM would increase transistor count, chip size and power consumption.

          1. Hi, what I mean is the relation, the balance.
            The computational power is not nearly in balance with the RAM expansion.
            Here you have an advanced RISC-V architecture that barely has the memory capacity to play tic-tac-toe, let alone chess.
            This is a waste of an otherwise precious creation of human intellect.
            I feel so sad for this little piece of silicon. It’s been restricted for no good reason.
            It’s like having a 10 GHz Pentium IV processor that has a couple of shift registers for memory and a paper punch card reader for mass storage. So sad.

          2. The computational power is needed for fast signal processing, which doesn’t necessarily need much RAM. Something like a FIR filter, which only needs a few hundred bytes for a buffer, but it has to run fast in order to process things like audio in real time.

          3. For example, if you want to push samples through at a rate of 96 kHz and your CPU is 48 MHz, you can only afford to spend 500 cycles on computing each sample.

            That sets the limit of how much RAM you can afford to access per each sample, because accessing memory takes clock cycles. You would be addressing less than 500 bytes of RAM because you have to do something with the data as well.

            For static data, that can go into flash memory.

      2. I mean, I see it this way:
        By adding a little bit more RAM and making it 5 or 10 cents more expensive it becomes useful.
        Without it, silicon material is wasted for no good reason. And 10 cents are simply lost.
        I don’t get the profits thinking. If you do produce something, it should be functional.
        Just producing it to beat the competition is illogical from a functional point of view.
        It’s better to save up money and buy something more expensive that functions properly (at very least) and doesn’t cause headaches.
        Because headaches and wasted human lifetime are more “expensive” than a bit of comparatively worthless money. IMHO. 🤷‍♂️

        1. Just buy a STM32C011 or CH32V006 or PY32F003 instead?

          This is pointless criticism, because there are plenty of applications for devices as small as this.

          There are even still applications for $0.03 8-bit MCUs. Ever taken one of those battery-operated lights apart?

          1. Hi. It’s an opinion, not a criticism. And money isn’t my problem here.
            It’s not about me and not about me wanting something.
            It’s about restricting a precious piece of technology from a philosophical point of view.
            It could do so much more if it were reasonably designed.
            My heart bleeds a little bit if I see such poor “creations”.
            That being said, the project itself is well done. I never meant to criticise that.

          2. It could do so much more

            Like what? MCUs like this with limited IO generally do not even need to handle that much data. Their main point is just to get whatever simple thing they do done quickly, so the device can go back to sleep to conserve batteries.

          3. For example, according to the data sheet the device consumes between 1-9 mA running depending on how many peripherals you have in use and what clock options and voltages you use. It goes down to 0.4 mA in sleep mode. With suitable power gating, that can go down to nanoamps.

            Now, if you wish to create a battery powered device, your realistic goal is in the low microamps, so you should use power gating timers to get there. Suppose your target is 10 microamps and we assume 7 mA at 48 MHz running: that means the device can only operate for 1.4 milliseconds per second. Your budget for computation is therefore about 68,500 clock cycles per second.

            If you wish the device to wake up and check some IO and do some action every 20 milliseconds for something like checking if a button is being pressed, your budget for computation per operation drops to just 1371 clock cycles. That includes booting up the MCU, checking the IO, doing some simple calculations, setting the outputs and then dropping power off. You really can’t do much in that time that would demand any amount of RAM because you don’t have enough time to process it. In the case outlined, even 2k would be totally overkill because you simply don’t have enough clock cycles to fill it with data and process it.

            Speed is the key. This is why these simple MCUs are “ridiculously” fast compared to the amount of memory they have. They’re intended for applications where the MCU is mostly doing nothing, so it can remain turned off.

        2. okay… so you have a problem with the fact that a 48MHz microcontroller has only 2k of RAM… you completely miss the point that it only costs 10 cents?!?! You completely miss the point that most applications do not require that much RAM anyway!??! Yet you seem to be completely OK with the fact that it only has 8 pins of which two are reserved for power, so six IO-pins… why do you need more than 2K of RAM on a device with only 6 IO-pins?!? To be honest, I do not think the 2K is a restriction, and for what it is and does it’s more than enough; in most cases it will be much more than needed. And for those rare occasions you do need more, shell out another 10 cents and get the bigger version.

          Regarding the project here: cool little project, very inspirational, and a nice explanatory video too.

          1. I think the implication is that because it’s 48 MHz it should be able to run DOOM or something. It’s totally missing the point of what these things are made for.

          2. okay… so you have a problem with the fact that a 48MHz microcontroller has only 2k of RAM… you completely miss the point that it only costs 10 cents?!?!

            Hi, no. I’m not missing it. I’m thinking that +/- 10 cents are less relevant, though.
            I’m not focused on products being cheap, but on them being usable.
            I’d like to respectfully point out that just because I have other priorities or points of view doesn’t automatically mean I’m dumb/mentally limited etc.

            See, let’s see it this way for once: You bind the production capability of a factory to build artificially inferior chips, which could have been better spent
            if the capacity was used to produce a quite good chip for just a little bit more money (say 5-8 cents more, with the specs changed so it can have 4, 6, 8 or 16 KB of SRAM).
            Because RAM for living application code and short time storage is the most limiting factor.

            “Fast but dumb” doesn’t make sense in control applications, I think.
            Even a coffee making machine has to remember different settings for different types of coffee.

            I “grew up” with the PIC16C84 and RAM often was the limiting factor, it caused headaches because programming had to be in
            ASM and required clever workarounds.
            In the mid 90s, a RAM limit of 68 Bytes was as silly as 2 KB is now, 30 years later.
            But back then it at least was understandable, given the technological limitations. The measuring unit still was micrometres rather than nanometres (micron, µm; that’s 1000 nm).

            If we consider the advancements in lithography, it would be a piece of cake to produce microcontrollers with a reasonable amount of RAM. For very little money.
            Because higher integration levels (smaller sizes) reduce the amount of raw material (molten sand) needed rather than requiring more.
            So if the expensive lithography machines are already here, already bought, going to smaller structures will increase the production.
            The yields are already good, the wafers are well utilized.
            Compared to say, 1995 when the PIC16F84 was new.

            why do you need more than 2K of RAM on a device with only 6 IO-pins?!?

            Good point. They could (should) be multiplexed if needed.

            I think the implication is that because it’s 48 MHz it should be able to run DOOM or something. It’s totally missing the point of what these things are made for.

            Basically, yes. Though I’d prefer Commander Keen. ;)
            What hurts a bit is seeing an advanced architecture like RISC-V being abused for carrying out dumb tasks only.
            Sure, production is cheap. But from the point of view of human civilisation it’s still a shame.

            It’s like smashing a pretty mosaic window of a church with a baseball and a bat.
            Even if the glass is just painted by a machine and produced en masse,
            and not built from individual pieces like in medieval times, it still hurts seeing this destruction happen.

            It’s not about the worth of the material, but about immaterial “loss”.
            The painted glass contains precious structures that had been created by human imagination.
            Just like the die structure of a Z80, 6502, PIC16C84 or ATtiny13.

            PS: I’m a bit baffled that no other person sees it at least a little bit like this, to be honest.
            The world must have changed more than I had assumed.
            I remember times when people were astonished/amazed by technology and valued it.

            Reducing micro electronics to mere monetary units is something I can’t mentally comprehend.
            I mean, sure I do see how things are connected, but I have trouble understanding the priorities.

            If a project involves a small production run of a few dozen units, like it’s usual for hobby use,
            then spending 50 cents more or less per microchip doesn’t financially hurt anyone.

            And if it’s a big production, the cost can be passed on to the customer.
            He/she doesn’t even notice if the product does cost $9.95 or $9.75 in the end.
            Because the seller/producer will raise the final price tag a lot anyway, to have sick profits.
            So production cost is covered, anyway.
            No matter if the chip used costs 10 cents, 25 cents or 100 cents.

        3. It is useful as it is, for simple controlling tasks with little I/O.

          Don’t confuse general-purpose microprocessors that address gigabytes with simple microcontrollers. Yes, the humble controller today has more CPU power than your dad’s C64, but it is not designed to address large RAMs. It is a perfect fit to control your washing machine, toaster or whatnot with its internal 2kB RAM. But you would not make a BASIC home computer with it.

          1. Hi, thanks. That’s right, of course.
            Still, it reads a bit as if I would require a history lesson just because I see things from a different angle here. vy73s.

          2. PS: A toaster, yes. A washing machine is not a simple thing, though.
            Us men not familiar with household appliances maybe think so, but the algorithms for different washing programs aren’t unsophisticated, either.
            I dare to claim that a washing machine from ~1985 had an ambitious microcontroller board for its time.
            A sophisticated microwave oven from 1997 had an LCD display with animations and more than 2KB of RAM, likely.
            Example: https://m.youtube.com/watch?v=UiS27feX8o0

          3. It is useful as it is, for simple controlling tasks with little I/O.

            I wouldn’t mind if we were talking about a 2 MHz chip with a very simple instruction set and a die with a few hundred to a thousand transistors.
            But even a 6502 CPU or MCS-48 microcontroller from the 1970s deserved better, I think.
            The 8i842 of late 70s already had 256 Bytes of RAM, which both was a limitation of its time and also a lot for its time (for on-chip-RAM).
            Not ideal, though. If the manufacturer had “today’s” technology (=technology of past 25 years) at the time the RAM could have been 24, 32 or 64KB instead.

            It is a perfect fit to control your washing machine, toaster or whatnot with its internal 2kB RAM.

            If you program by hand in ASM, perhaps yes, I think.
            If you’re developing the application code using a full-fledged IDE and a high-level language like Basic, Pascal or Python, then it looks different.
            Those languages/compilers hold default values in memory for processing when the application code runs on the microcontroller.

            While I do appreciate ASM, it’s also a danger once complexity increases.
            Lots of if-then clauses and jumps, for example.
            In a full-fledged IDE, code can be written in a more clean fashion that can be managed more easily.

            A different example, for those more familiar with homebrew on PC:
            Turbo Pascal or QB45 on MS-DOS were such a case of a clean IDE with structured programming (no line numbers needed).
            I mention DOS environment, because DOS was used in embedded applications, often.
            And in DIY projects with electronics on a breadboard.

            Old XTs or 80186 single board computers could be used as controllers.
            And even here, the 64 KB limit per segment was often a hurdle.
            – 64 KB which seem so vast compared to 2 KB, by the way.
            COM files with that limit were quick and simple, but relocatable EXE files were very welcome.
            (Datalight ROM-DOS, for example, was used in the embedded sector.)

          4. It is a perfect fit to control your washing machine, toaster or whatnot with its internal 2kB RAM.

            I just think that by 2025, such “low-end” chips should be given a reasonable amount of RAM.
            Not in the Megabytes range, but at least 24 or 32 KB of SRAM in 2025. For the sake of the architecture’s dignity.
            So the RISC-V core can run applications that match its capabilities.
            The source material needed wouldn’t change because of this little change, either.

            I’m afraid that the 2 KB limit is the result of “market segmentation” of some sort, rather than technological reasons.
            The idea that castr*ted chips must exist for the sake of different price points. Silly, IMHO.
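Two of the figures quoted in the thread above can be reproduced with a few lines of Python. The first is the 500 cycles per sample at 96 kHz. The second is the budget of roughly 68,500 cycles per second (about 1,371 per 20 ms wake-up) for a 10 µA average current at 7 mA running current. This is only a sketch of the commenters' arithmetic: the function names are made up for illustration, and it assumes the current drawn while the chip is powered off is negligible.

```python
CPU_HZ = 48_000_000  # CH32V003 clock quoted in the article


def cycles_per_sample(sample_rate_hz, cpu_hz=CPU_HZ):
    """Clock cycles available to compute one output sample."""
    return cpu_hz // sample_rate_hz


def awake_cycles_per_second(target_avg_a, run_current_a, cpu_hz=CPU_HZ):
    """Cycles per second the CPU may run while keeping the average current
    at target_avg_a. Assumes zero current draw while powered off."""
    duty_cycle = target_avg_a / run_current_a
    return duty_cycle * cpu_hz


def cycles_per_wakeup(target_avg_a, run_current_a, wake_period_s, cpu_hz=CPU_HZ):
    """Cycle budget for each wake-up when waking once every wake_period_s."""
    return awake_cycles_per_second(target_avg_a, run_current_a, cpu_hz) * wake_period_s


if __name__ == "__main__":
    print("cycles per sample at 96 kHz:", cycles_per_sample(96_000))
    print("cycles per second at 10 uA average:",
          round(awake_cycles_per_second(10e-6, 7e-3)))
    print("cycles per 20 ms wake-up:",
          int(cycles_per_wakeup(10e-6, 7e-3, 0.020)))
```

Changing the running current or the wake period shows how quickly the budget shrinks or grows, which is the commenter's point about speed mattering more than RAM for this class of device.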

  1. I did something very similar a few years ago using an ATtiny and an SPI flash for the samples. I used the data from the festival project (https://www.cstr.ed.ac.uk/projects/festival/) and log encoded (https://en.wikipedia.org/wiki/%CE%9C-law_algorithm) the samples. The sounds in the festival “voice” I used were encoded as digram transitions (e.g. Hello would look something like [he][el][lo][ou]) so it was just a matter of concatenating the proper sounds together.

    My language (Czech) required a high sample rate (= more storage) though, because of the high frequency component of our softened sounds (žščř..) that most people do not know how to pronounce :D.

  2. Ah, memories of that General Instrument SP0256 chip from Radio Shack, wired up to a VIC-20. With all its 1 MHz and 3.5 kB of RAM, though that chip with its 2 kB of ROM did all the real work. That was a lot of fun.

    Hey, HaD, how about an article on Dennis Klatt, “Father of speech synthesis”?

    1. I used SAM (Software Automatic Mouth) on my Atari 8-bit machines. Sounded robotic, but it was understandable and required no extra hardware. I recall making a “Hitchhiker’s Guide” door that welcomed you on entry and thanked you for using it on your exit.
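The log encoding mentioned in the ATtiny comment above (the µ-law link) can be sketched as follows. This uses the continuous µ-law companding formula with µ = 255 and a simple 8-bit quantiser. It is not the commenter's actual code, and it is not the segmented G.711 bit layout; the function names are hypothetical.

```python
import math

MU = 255  # standard mu value for 8-bit mu-law companding


def mulaw_compress(x):
    """Map a sample in [-1, 1] to [-1, 1] on a logarithmic scale."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y):
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_byte(x):
    """Compress a sample and quantise it to an integer code in 0..255."""
    clipped = max(-1.0, min(1.0, x))
    return round((mulaw_compress(clipped) + 1.0) * 127.5)


def decode_byte(code):
    """Turn an integer code in 0..255 back into a sample in [-1, 1]."""
    return mulaw_expand(code / 127.5 - 1.0)


if __name__ == "__main__":
    for sample in (0.01, 0.1, 0.5, 1.0):
        code = encode_byte(sample)
        print(f"{sample:5.2f} -> code {code:3d} -> {decode_byte(code):.4f}")
```

The logarithmic curve spends more of the 256 codes on quiet samples, so small amplitudes survive 8-bit storage with less error than they would with linear quantisation.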
