Speech synthesis has been around since roughly the middle of the 20th century. Once upon a time, it took remarkably advanced hardware just to choke out a few words. But as [atomic14] shows with this project, these days it only takes some open source software and a 10-cent microcontroller.
The speech synth is implemented on a CH32V003 microcontroller, known for its remarkably low unit cost when ordered in quantity. It’s a speedy little RISC-V chip running at 48 MHz, albeit with the limitation of just 16 KB of Flash and 2 KB of SRAM on board.
The microcontroller is hooked up to a speaker via a simple single-transistor circuit, which allows for audio output. [atomic14] first demonstrates this by having the chip play back six seconds of low-quality audio, using some nifty space-saving techniques to squeeze it into the limited flash available. Then, [atomic14] shows how he implemented the Talkie library on the chip, a software implementation of Texas Instruments’ LPC speech synthesis architecture, which you probably know from the famous Speak & Spell toys. It’s got a ton of built-in vocabulary out of the box, and you can even encode your own words with some freely available tools.
https://www.youtube.com/watch?v=RZvX95aXSdM
We’ve seen [atomic14] tinker with these chips before, too.
48 MHz but less memory than a C64..
What weird kind of reality is this we’re living in? 😭
C64 did not cost 10 cents :-)
And BTW, the S.A.M. reciter on the C64 was pretty advanced and did not need that much memory; there was still a lot of free space for BASIC programs with the new SAY command.
I remember writing a number guessing game for it in BASIC as a kid, and my parents could understand what it was saying and could play it :-) And BTW it was not in English; the built-in phoneme syntax was good for most languages.
Yes, sure. But come on, 48 MHz! 😃
And merely ~2048 ASCII characters of random access memory!
A blank text file alone occupies 512 Bytes on a FAT medium.
That’s like combining an AMD Ryzen with 512 KB of Conventional Memory as RAM..
All the processing power in the ALU and SIMDs but then
there’s no RAM for sophisticated applications to put all the power to good use! 😢
It does not. That’s just the reserved allocation unit due to the file system. The file system is just a map of where the data is, not the data itself. The file system could pretend it has allocated 512 bytes for the file even if it could actually only store e.g. the first 128 bytes.
Moore’s law reality. They made transistors inside the chip smaller to make it cheaper. But that also made it faster. And (spoiler) with more leakage, the deep sleep consumption of that chip isn’t that good.
Yes, but 64 KB of RAM were affordable 40 years ago!
Why can’t this thing have 32 KB of internal SRAM, at least?
After 40 years of progress and in these times of 64 GB RAM DDRx modules and SD cards in the TB range.
The way it is now, it barely beats an Atari 2600. 🥲
Because most embedded applications running on a chip with only a few I/O lines simply do not need that much RAM. Adding unnecessary RAM would increase transistor count, chip size and power consumption.
Hi, what I mean is the relation, the balance.
The computational power is not nearly in balance with the RAM expansion.
Here you have an advanced RISC-V architecture that barely has the memory capacity to play tic-tac-toe, let alone chess.
This is a waste of an otherwise precious creation of human intellect.
I feel so sad for this little piece of silicon. It’s been restricted for no good reason.
It’s like having a 10 GHz Pentium IV processor that has a couple of shift registers for memory and a paper punch card reader for mass storage. So sad.
The computational power is needed for fast signal processing, which doesn’t necessarily need much RAM. Something like a FIR filter, which only needs a few hundred bytes for a buffer, but it has to run fast in order to process things like audio in real time.
For example, if you want to push samples through at a rate of 96 kHz and your CPU is 48 MHz, you can only afford to spend 500 cycles on computing each sample.
That sets the limit on how much RAM you can afford to access per sample, because accessing memory takes clock cycles. You would be addressing less than 500 bytes of RAM because you have to do something with the data as well.
Static data can go into flash memory.
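As a rough illustration of the point above, here is a minimal Python sketch of the cycle budget and of a FIR filter whose only state is a short history buffer. The 48 MHz and 96 kHz figures come from the comment; the four-tap moving average and the names `cycles_per_sample` and `FirFilter` are purely illustrative assumptions, and real firmware would be fixed-point C rather than Python.

```python
from collections import deque

CPU_HZ = 48_000_000      # clock speed quoted for the CH32V003
SAMPLE_RATE_HZ = 96_000  # sample rate used in the comment's example


def cycles_per_sample(cpu_hz: int, sample_rate_hz: int) -> int:
    """Clock cycles available to produce one output sample."""
    return cpu_hz // sample_rate_hz


class FirFilter:
    """Direct-form FIR filter: the state is one stored sample per tap."""

    def __init__(self, taps):
        self.taps = list(taps)
        # Fixed-size history, newest sample first; the oldest drops off.
        self.history = deque([0.0] * len(self.taps), maxlen=len(self.taps))

    def process(self, sample: float) -> float:
        self.history.appendleft(sample)
        return sum(c * x for c, x in zip(self.taps, self.history))


budget = cycles_per_sample(CPU_HZ, SAMPLE_RATE_HZ)

# Hypothetical 4-tap moving average fed with a unit step.
moving_average = FirFilter([0.25] * 4)
outputs = [moving_average.process(x) for x in [1.0, 1.0, 1.0, 1.0, 1.0]]

print(budget)
print(outputs)
```

The filter needs only as many stored samples as it has taps, so even a filter with a few hundred taps stays well within 2 KB, while the per-sample cycle budget is what actually caps how many taps are practical.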
Relatively speaking. Those RAM chips cost around $50 in today’s money.
I mean, I see it this way:
By adding a little bit more RAM and making it 5 or 10 cents more expensive, it becomes useful.
Without it, silicon material is wasted for no good reason. And 10 cents are simply lost.
I don’t get the profits thinking. If you do produce something, it should be functional.
Just producing it to beat the competition is illogical from a functional point of view.
It’s better to save up money and buy something more expensive that functions properly (at the very least) and doesn’t cause headaches.
Because headaches and wasted human lifetime are more “expensive” than a bit of comparatively worthless money. IMHO. 🤷♂️
Just buy a STM32C011 or CH32V006 or PY32F003 instead?
This is pointless criticism, because there are plenty of applications for devices as small as this.
There are even still applications for $0.03 8-bit MCUs. Ever taken one of those battery-operated lights apart?
Hi. It’s an opinion, not a criticism. And money isn’t my problem here.
It’s not about me and not about me wanting something.
It’s about restricting a precious piece of technology, from a philosophical point of view.
It could do so much more if it were reasonably designed.
My heart bleeds a little bit when I see such poor “creations”.
That being said, the project itself is well done. I never meant to criticise that.
Like what? MCUs like this with limited IO generally do not even need to handle that much data. Their main point is just to get whatever simple thing they do done quickly, so the device can go back to sleep to conserve batteries.
For example, according to the data sheet the device consumes between 1-9 mA running depending on how many peripherals you have in use and what clock options and voltages you use. It goes down to 0.4 mA in sleep mode. With suitable power gating, that can go down to nanoamps.
Now, if you wish to create a battery powered device, your realistic goal is in the low microamps, so you should use power gating timers to get there. Suppose your target is 10 microamps and we assume 7 mA at 48 MHz running: that means the device can only operate for 1.4 milliseconds per second. Your budget for computation is therefore about 68,500 clock cycles per second.
If you wish the device to wake up and check some IO and do some action every 20 milliseconds for something like checking if a button is being pressed, your budget for computation per operation drops to just 1371 clock cycles. That includes booting up the MCU, checking the IO, doing some simple calculations, setting the outputs and then dropping power off. You really can’t do much in that time that would demand any amount of RAM because you don’t have enough time to process it. In the case outlined, even 2k would be totally overkill because you simply don’t have enough clock cycles to fill it with data and process it.
Speed is the key. This is why these simple MCUs are “ridiculously” fast compared to the amount of memory they have. They’re intended for applications where the MCU is mostly doing nothing, so it can remain turned off.
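The arithmetic in this comment can be checked with a short Python sketch. The 10 µA target, 7 mA run current, 48 MHz clock and 20 ms wake interval are the commenter's figures; the function name is hypothetical, and the calculation assumes, as the comment does, that the power-gated sleep current is small enough to ignore.

```python
def active_budget(target_avg_ua: float, run_current_ua: float,
                  cpu_hz: int, wake_interval_ms: float):
    """Return (active ms per second, cycles per second, cycles per wake-up).

    Assumes sleep current is negligible, so the average current is simply
    the run current multiplied by the fraction of time spent awake.
    """
    duty = target_avg_ua / run_current_ua
    active_ms_per_second = duty * 1000.0
    cycles_per_second = cpu_hz * duty
    wakeups_per_second = 1000.0 / wake_interval_ms
    cycles_per_wakeup = cycles_per_second / wakeups_per_second
    return active_ms_per_second, cycles_per_second, cycles_per_wakeup


active_ms, per_second, per_wakeup = active_budget(
    target_avg_ua=10,       # 10 microamps average target
    run_current_ua=7_000,   # 7 mA while running at 48 MHz
    cpu_hz=48_000_000,
    wake_interval_ms=20,    # wake up 50 times per second
)

print(f"{active_ms:.2f} ms awake per second")
print(f"{int(per_second)} cycles per second")
print(f"{int(per_wakeup)} cycles per wake-up")
```

Running the numbers this way gives roughly 1.4 ms of awake time, about 68,500 cycles per second and about 1,371 cycles per wake-up, matching the figures quoted above.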
Useful for what? What exactly demands more RAM?
Okay… so you have a problem with the fact that a 48 MHz microcontroller has only 2K of RAM… you completely miss the point that it only costs 10 cents?!?! You completely miss the point that most applications do not require that much RAM anyway!??! Yet you seem to be completely OK with the fact that it only has 8 pins, of which two are reserved for power, so six IO pins… why do you need more than 2K of RAM on a device with only 6 IO pins?!? To be honest, I do not think the 2K is a restriction; for what it is and does it’s more than enough, and in most cases it will be much more than needed. And for those rare occasions you do need more, shell out another 10 cents and get the bigger version.
Regarding the project here: a cool little project, very inspirational, with a nice explanatory video too.
I think the implication is that because it’s 48 MHz it should be able to run DOOM or something. It’s totally missing the point of what these things are made for.
It is useful as it is, for simple controlling tasks with little I/O.
Don’t confuse general-purpose microprocessors that address gigabytes of memory with simple microcontrollers. Yes, the humble controller today has more CPU power than your dad’s C64, but it is not designed to address large RAMs. It is a perfect fit to control your washing machine, toaster or whatnot with its internal 2 kB of RAM. But you would not make a BASIC home computer with it.
I did something very similar a few years ago using an ATtiny and an SPI flash for the samples. I used the data from the festival project (https://www.cstr.ed.ac.uk/projects/festival/) and log encoded (https://en.wikipedia.org/wiki/%CE%9C-law_algorithm) the samples. The sounds in the festival “voice” I used were encoded as digram transitions (e.g. Hello would look something like [he][el][lo][ou]), so it was just a matter of concatenating the proper sounds together.
My language (Czech) required a high sample rate (= more storage) though, because of the high-frequency component of our softened sounds (žščř..) that most people do not know how to pronounce :D.
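For readers unfamiliar with log encoding, here is a minimal Python sketch of μ-law companding using the continuous formula from the linked Wikipedia article. It is not the commenter's code and not the segmented G.711 bit format; the function names and the simple 8-bit mapping are assumptions made for illustration.

```python
import math

MU = 255  # conventional value for 8-bit mu-law


def mulaw_compress(x: float) -> float:
    """Map a sample in [-1, 1] onto a logarithmic scale in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_byte(x: float) -> int:
    """Quantise a compressed sample to one unsigned byte (0..255)."""
    return round((mulaw_compress(x) + 1.0) / 2.0 * 255)


def decode_byte(b: int) -> float:
    """Recover an approximate sample in [-1, 1] from one byte."""
    return mulaw_expand(b / 255 * 2.0 - 1.0)


for sample in (0.01, 0.5, -0.5):
    restored = decode_byte(encode_byte(sample))
    print(f"{sample:+.3f} -> byte {encode_byte(sample):3d} -> {restored:+.4f}")
```

The point of the logarithmic scale is that quiet samples get finer quantisation steps than loud ones, so 8 bits per sample sound better than 8 linear bits would, which helps when the samples have to fit in a small SPI flash.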
There is an Arduino Library for it: https://github.com/going-digital/Talkie
Yes. That’s what was used and is what the video and article both mentioned.
Ah, memories of that General Instrument SP0256 chip from Radio Shack, wired up to a VIC-20. With all its 1 MHz and 3.5 kB of RAM, though that chip with its 2 kB of ROM did all the real work. That was a lot of fun.
Hey, HaD, how about an article on Dennis Klatt, “Father of speech synthesis”?
I used SAM (Software Automatic Mouth) on my Atari 8-bit machines. Sounded robotic, but it was understandable and required no extra hardware. I recall making a “hitchhikers guide” door that welcomed you on entry and thanked you for using it on your exit.
My first ever computer project! So many hours of fun, I even bought an SP0256 and matching 3.12 MHz crystal a couple of years ago to recreate it.
Thank you for making this!
I love the idea of getting gritty audio samples on hardware this cheap!