If there’s one thing that will surely blind us, it’s reading resistor color bands. It doesn’t help that red looks exactly like orange, brown and black are indistinguishable, and different component manufacturers – for some reason – don’t use identical paints for coding their resistors. [Jeff] over at Gadget Gangster has been having the same problem, so he built a talking resistor calculator to speak resistor values to him.
The electronics part of the build is extremely simple with just an MCP3208 ADC providing 12 bits of resolution. The software side is where this project really shines. [Jeff] used a Gadget Gangster QuickStart board housing a Parallax Propeller. With 8 cores running in parallel the Propeller is more than enough to run [Phil Pilgrim]’s software speech synthesizer. When a resistor is connected to the two alligator leads, the Propeller goes through a lookup table and finds a resistor value matching the number coming from the ADC. From there, it’s just sending a string of phonetic text to the speech synthesizer object.
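The lookup step described above can be sketched roughly as follows. This is not [Jeff]’s Propeller code: it is a minimal Python illustration that assumes the unknown resistor sits in the lower leg of a voltage divider with a hypothetical 10 kΩ reference resistor, and that the table holds E12 values from 10 Ω to 820 kΩ. The names `R_REF`, `expected_code`, `lookup` and `spoken` are made up for the example.

```python
ADC_MAX = 4095   # full-scale code of a 12-bit converter such as the MCP3208
R_REF = 10_000   # hypothetical reference resistor in the divider (assumption)

# E12 base values, kept as integers so the decade multiples stay exact.
E12 = [10, 12, 15, 18, 22, 27, 33, 39, 47, 56, 68, 82]


def expected_code(r_unknown):
    """ADC code we would expect with r_unknown in the lower leg of the divider."""
    return round(ADC_MAX * r_unknown / (r_unknown + R_REF))


# Lookup table of (expected ADC code, resistance in ohms), 10 ohms to 820 kilohms.
TABLE = [(expected_code(base * 10 ** decade), base * 10 ** decade)
         for decade in range(5) for base in E12]


def lookup(code):
    """Return the table resistance whose expected code is closest to the reading."""
    return min(TABLE, key=lambda entry: abs(entry[0] - code))[1]


def spoken(ohms):
    """Build a plain text string that could be handed to a speech routine."""
    if ohms >= 1_000_000:
        return f"{ohms / 1_000_000:g} meg ohms"
    if ohms >= 1000:
        return f"{ohms / 1000:g} kilo ohms"
    return f"{ohms:g} ohms"


print(spoken(lookup(1309)))  # 4.7 kilo ohms
```

With a single reference resistor the codes bunch up at both ends of the range, so a real build would probably want to switch reference values or accept coarser matching for very small and very large resistors.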
Even though text-to-speech chips have been around for decades now, [Jeff] chose to build his speech synthesis tool with software. It may just be a testament to the power in the Propeller microcontroller, but anything that keeps us from squinting at resistor color bands is alright by us.
To be frank, I’m not sure if this is an improvement over hard-to-read colour bands… It might be because English isn’t my first language, but I’m having a hard time hearing what the values actually are.
And here’s yet another project that reminds me of one of my old ones. I went with a display though (one of them Nokia knockoffs IIRC) that showed the value and the closest matching E12 and E24 values but, as usual, I abandoned it when it was 90% complete. WHY can’t I even finish the last 10% of my projects!?
It could be modified to output morse code instead of English. :D
“WHY can’t I even finish the last 10% of my projects!?”
Because you need to move up to E48 or E96.
I’m a native English speaker, and I have to say that the implementation in this example is pretty darn difficult to understand. I thought it was gibberish until I turned the volume a little higher and realized what they were saying. If you heard the synthesized speech here randomly on the street, the first thing you’d guess would not be that someone was speaking random resistor values.
Phoneme output is extremely easy. It is just a table lookup followed by dumping the recorded phoneme to the audio output. I did that in software back in the 70’s and it worked quite well. The limiting factor is waveform storage, but even then you can make some (if not all) of the sounds with some simple little active waveform synthesis subroutines.
The hard part is processing raw text and converting that to phonemes, which was done in advance outside the scope of this device.
This could be done on a little PIC processor, so “8 cores running in parallel” is INDEED “more than enough”… :D
Phonemes on primitive speech synthesizers were typically LPC encoded (Linear Predictive Coding), but active phoneme synthesis uses even LESS storage space for the phonemes, although you need to get a little creative there, and you may still want to just look up some of the “difficult” prerecorded phonemes. :D
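A rough sketch of the table-lookup playback described in this comment, in Python for readability rather than anything a PIC would run. The phoneme names, durations, the 8 kHz sample rate and the tone bursts standing in for recorded waveforms are all placeholders invented for the example; a real table would hold recorded or properly synthesized phonemes.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)


def tone(freq_hz, ms):
    """Placeholder 'phoneme': a sine burst of the given length."""
    count = SAMPLE_RATE * ms // 1000
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(count)]


def silence(ms):
    """A run of zero samples, used for stops and pauses."""
    return [0.0] * (SAMPLE_RATE * ms // 1000)


# Hypothetical phoneme table: name -> list of samples.
PHONEMES = {
    "AA": tone(700, 120),
    "EE": tone(300, 120),
    "T": silence(20) + tone(3000, 15),  # short gap, then a burst
    "_": silence(60),                   # pause between words
}


def render(names):
    """Look up each phoneme and append its samples to the output buffer."""
    out = []
    for name in names:
        out.extend(PHONEMES[name])
    return out


samples = render(["T", "EE", "_", "AA"])
```

The playback loop is only a lookup and a copy, which is the commenter’s point: the phoneme string itself has to be prepared ahead of time.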
Plosives. The distance between P and B is a minefield.
In fact, the processing power to render T distinctly from D in the middle of a word is apparently greater than that actually available to a growing segment of the population.
It’s very interesting that cell phone voice connections on crowded cell towers will degrade to the point that words cannot be deciphered except through context. I swear the data rate drops to 3k/sec on a regular basis.
A P and a B are virtually identical except that one is voiced (the same with T and D, and other combinations). This is not a problem for synthesis, and although voiced and unvoiced plosives may be more difficult to differentiate on voice INPUT, a tracking bandpass filter helps. But this is an OUTPUT device, and does not concern itself with text-to-speech OR with voice recognition.
I can only think of this as an exercise (and quite a nice one), but I always use a multimeter: it’s faster, easier to understand and cheaper :\
We did a very nice female voice for a security system that anyone could copy.
We evaluated everything available, including festival and the rest. They work, but that’s where it ends.
None of the commercial TTS stuff was all that good, so we decided to go all 1978 and simply digitize speech, old school.
We recorded voices (lots of voices) and stored the words at a very low data rate, then stuck them into tables with an index table in front to point to word start and stop points.
The secret sauce: Have the Ukrainian cleaning girl read the text (numbers, some words, and a couple of ghetto phonemes to modify words) for $500, have an intern mess with it in some audio program (audacity?) and then filter it down.
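A minimal sketch of that layout, assuming 8-bit samples and a Python dict as the index table; the function names and the tiny byte strings used as “recordings” are invented for illustration and are not the commenter’s actual format.

```python
def build_bank(clips):
    """Pack word recordings into one blob plus an index of (start, stop) offsets.

    clips maps each word to its audio as bytes (assumed 8-bit samples).
    """
    index = {}
    data = bytearray()
    for word, samples in clips.items():
        start = len(data)
        data.extend(samples)
        index[word] = (start, len(data))
    return index, bytes(data)


def say(sentence, index, data, gap=b"\x80" * 4):
    """Concatenate the stored clip for each word, with a short gap after each."""
    out = bytearray()
    for word in sentence.lower().split():
        start, stop = index[word]
        out += data[start:stop] + gap
    return bytes(out)


# Toy "recordings" standing in for real digitized words.
index, data = build_bank({"door": b"\x01\x02", "open": b"\x03"})
audio = say("Door open", index, data)
```

Keeping the offsets in a separate table means the audio blob can be reordered or trimmed later without touching the code that plays it.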
It was great fun to listen to her speak, and then modify what she was supposed to say so that it sounded “right”. Telling her to pronounce words as if they had an extra h, that kind of thing.
Her English was quite alien sounding, and this was what made it all work. Trying the same thing with native English speakers worked but was “off” – it was awful to listen to, as the prosody and cadence kept the words from sounding very good in isolation. You need someone who speaks without inflection, and she was perfect.
Side note: We did this on a Saturday night, and she showed up way overdressed and had been drinking. I suppose the fact that we had expensive video cameras on tripods with lighting gear (for an unrelated project) didn’t help, and the mattress against the loading bay probably looked a bit sketchy, and I’m guessing that the fact that we were playing funk music pretty much went along with her worst fears.
She visibly relaxed when she opened the script (marked SCRIPT) and saw only things like “The blah door is not secured.” and “The white zone is for loading and unloading only.”
I later proposed adding an ED-209 type voice as an option, but we were already late and had to ship.
A big advantage of the project was the discovery that by adding a sliding window averager, we could essentially pull out new words that were not in the database, and that the low speech rate made this very doable.
A lookup table to let us pull disparate sequences out of the database and merge them worked pretty well. It didn’t sound any worse than the typical formant synthesis.
We were going to post-process the data files and render it down into a much smaller file by expanding the sequences and eliminating overlap.
This is in fact what most speech libraries end up becoming – it’s a lot of work and that’s why most of the voice libraries cost so much money or have really restricted licenses.
On the topic of TTS, Yamaha has some similar tech for PCs called vocaloid, but it’s so fraught with cultural disconnects, licensing issues, insane pricing and lack of love for the product line that nobody uses it except to provide voices for little anime dancing girls.
Ghetto tip #3: If you can’t find a good source of speech and word files, and don’t want to read a million words, books on tape (dvd and cd these days) can give you lots of speech to tag.
I have a Tim Curry file that was chopped up at pauses with a bit of Perl and provides no end of amusement. The guy provides excellent diction.
Others have done this – the project that inspired me was using the same kind of collection made from various Christopher Walken sources to annoy telephone solicitors. He just didn’t take it far enough.
Interestingly, it was about 1978 when I did that too. I recorded my own voice reading a list of words that contained all the phonemes. Then I displayed their waveform and played an audio repeat loop, while I adjusted the “start” and “end” sliders until I isolated each phoneme. I sliced out the phonemes and stored them in a table. I could play arbitrary words with phoneme lookup. It sounded very artificial, but the words were quite understandable IF you adjusted the phonemes (sometimes inserting extra phonemes to “blend” between harsh discontinuities). Then I replaced some of them like S and SH and a couple of vowels with algorithmic synthesis, but that never made it into the final project.
The point is, this can be done with VERY LITTLE processing power, and rather simplistic table-driven code, giving a crude but quite usable output. These days, you have enough speed to PWM a speaker directly with a GPIO bit (not an option back then, but we used zero-crossing single-bit output that sounded okay when the input audio was differentiated into a series of spikes). The output speaker acted as a low-pass integrator (inverse function of input differentiator). PWM would be better these days with the faster processors.
Ah, yes.
Back when prying the plated lid off a 256-bit DRAM chip and pasting a lens to it was the future of machine vision, provided you could focus it and set the right aperture.
It’s interesting how PWM kept coming back – from the 1950’s on, somebody was always buzzing something to make noise. Or worse, hooking up those terrible little alarm bells. Never again!
PWM works like gangbusters, but most of us wasted time doing D/A conversion because we just didn’t know any better, and the processors were all so… very… slow… even when you tried to cheat with shift registers. You needed the resolution in order to get enough bandwidth – a low data rate is ok if you can get nice slopes out.
I had a Votrax speech synth circa 1975, the kind you plugged into your ugly blue topped cage of blinking lights that doubles as a pop tart warmer. In fact, I have one here still, but it’s the serial version.
One day I had a guy who worked for Votrax in the early 90’s apply for a sales job. He had no idea that FSW had even made such a thing, or that the company had started as a hobbyist speech synth company.
I also remember when Wolfenstein came out on the Apple II and all of a sudden the whole world realized that assorted beeps and boops were pretty dull; you needed screams to have a great game – and FPS was never the same.
A few years before this, there was a coin-op videogame that won notoriety by emitting shrieks as you drove over little people. It was the first talking computer I’d ever seen, and most owners turned off the screams to keep parents happy.
I played with a bell-knockoff formant synthesizer for a few days and basically decided that by the time you had enough hardware to model the vocal tract, and the sequencing, you might as well go back to a d/a converter.
I just wish I’d known that I didn’t really need to buy those expensive Analog Devices chips to do it, and that I didn’t really even need 8 bits.
“we used zero-crossing single-bit output that sounded okay when the input audio was differentiated into a series of spikes.”
Lots of words here, but he means two resistors and a capacitor.
Yes, a very simple differentiator (opposite of an integrator) can be made with as little as a capacitor and resistor (which is how I did it). The output is the rate of change (velocity) of the input waveform. Then you run it through a zero-crossing detector (a pair of clipping diodes) to convert that analog signal to digital. On an Apple-II you could just feed it into the analog tape-in without the clipping diodes. The point is that the clipped output sounded a lot better if you converted the voice input to a “rate-of-change” input before “digitizing” to 1-bit digital. I tried PWM, but I could not get the output frequency to a useful rate with a software-only method.
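The same chain can be imitated in software to see why it works. This is only a numerical sketch of the idea, not the original hardware: a first difference stands in for the RC differentiator, a sign test for the clipping diodes, and a leaky accumulator (with an arbitrary `leak` factor chosen for the example) for the speaker acting as an integrator.

```python
import math


def differentiate(samples):
    """First difference: a discrete stand-in for the RC differentiator."""
    out = []
    prev = samples[0]
    for s in samples:
        out.append(s - prev)
        prev = s
    return out


def one_bit(samples):
    """Zero-crossing detector: keep only the sign of each sample."""
    return [1 if s >= 0 else 0 for s in samples]


def integrate(bits, leak=0.95):
    """Leaky accumulator standing in for the speaker's low-pass behaviour."""
    out = []
    acc = 0.0
    for b in bits:
        acc = leak * acc + (1.0 if b else -1.0)
        out.append(acc)
    return out


# A sine wave with a 40-sample period as a stand-in for voice input.
wave = [math.sin(2 * math.pi * i / 40) for i in range(400)]
bits = one_bit(differentiate(wave))
rebuilt = integrate(bits)
```

Each bit records only whether the input was rising or falling, so integrating the bit stream ramps up and down in step with the original waveform, which is roughly what the speaker cone does with the clipped signal.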
The DRAM “cameras” were faster if you used a Bias LED (suggested by Carl Helmers in one of his “non-Byte” books). That is why Robot eyes glowed red in the old Sci-Fi movies. :D Interestingly, the same (light discharges the memory bit faster) effect is used to make LEDs double as photodetectors in embedded systems these days. A reverse-biased LED acts like a capacitor, which discharges faster when exposed to light.
Anyway, voice output is useful on all manner of test instruments, and in some cases is a lot more useful than the “diode-checker” beep built into many multimeters. After I (and family and friends) spent WEEKS sorting THOUSANDS of tiny assorted resistors with bright lights and magnification one summer a few years back, I would have LOVED to have one of these resistor readers.
Rob,
It’s gone full circle – some plasma screen TVs are using web-cam style image sensors to handle ambient light detection. Should be interesting to see how that plays out on the web connected models.
BTW, the robot eyes glowed red because red LEDs were cheap; before that the eyes were incandescent bulbs with red lenses.
For the gentleman with the talking spectrometer: Excellent! You might wish to try checking some of the more recent Chinese alloys – how on earth anyone gets large amounts of tin and cadmium into purportedly T6061 aluminum is beyond me. :P
Very interesting: My typo count is growing logarithmically lately.
Interesting. Gives the impression that you dodged the equivalent of the “uncanny valley” for speech, by starting with something that sounded wrong to begin with.
While adding speech to the equivalent of a multimeter may seem of dubious utility other than as a programming exercise, in the right circumstances it can be quite useful; you just never know.
At my workplace, we used to use a particular portable x-ray spectrometer for field metallurgy, and to allow quick identification of most alloys.
One of my superiors likes playing practical jokes, or making wild claims and seeing how long he can string folks along. One of these was that the manufacturer was coming out with a speech upgrade. He took it so far as to ask me if I could get one to talk, even if it was just something silly. I said if it’s worth giving me a $1,000 budget, some time, and permission to take one apart; I’ll try to make it happen. He agreed.
Turns out it was easy enough. Inside, it was a PC/104 based MS-DOS system, with a hard drive. I added a PC/104 voice/sound card and the driver. While I didn’t have access to the original software source code, it was easy enough to write a TSR that scanned the text screen memory for certain criteria, and triggered the speech card.
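For readers who have not met the screen-scraping trick, here is a rough Python simulation of the idea rather than the original TSR. It assumes the standard 80x25 PC text mode, where video memory holds one character byte followed by one attribute byte per cell; the `ALLOY:` marker and the function names are hypothetical, since the comment does not say what criteria the real program matched.

```python
COLS, ROWS = 80, 25  # standard PC text mode dimensions


def screen_rows(video_mem):
    """Decode character/attribute byte pairs into a list of text rows."""
    chars = video_mem[0:COLS * ROWS * 2:2]  # every other byte is a character
    text = chars.decode("cp437")
    return [text[r * COLS:(r + 1) * COLS] for r in range(ROWS)]


def find_alloy(video_mem, marker="ALLOY:"):
    """Return the text following the (hypothetical) marker, or None."""
    for row in screen_rows(video_mem):
        pos = row.find(marker)
        if pos >= 0:
            return row[pos + len(marker):].strip() or None
    return None


def make_screen(lines):
    """Build a fake video buffer for testing: spaces with attribute 0x07."""
    buf = bytearray(b" \x07" * (COLS * ROWS))
    for r, line in enumerate(lines):
        for c, ch in enumerate(line.encode("cp437")):
            buf[(r * COLS + c) * 2] = ch
    return bytes(buf)


result = find_alloy(make_screen(["", "ALLOY: 316 SS"]))
```

A resident program would run a check like this periodically and hand any new match to the sound card, which is why no access to the instrument’s source code was needed.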
Voila, talking spectrometer. After each test, it would speak aloud the closest alloy ID. As well as some random sound clips more in line with the original idea of a joke, like Howard Stern’s “Duck Job” recording. Everyone had a good time.
But it turns out it was actually useful. This thing was heavy, and the most practical way to move it around in the field while climbing on scaffolding and pipes was slung over the back. While taking a measurement, you often needed one hand to hold the probe, and one hand to support yourself; leaving no way to hold the main case where you could read the screen. And the screen was hard to see in sunlight anyway. Speech completely solved this problem.
So I was asked to add speech to our other spectrometers, minus Howard Stern. Pretty soon after that, someone at a job site who used the same instrument noticed, and called up the manufacturer to ask them for the speech upgrade they’d seen us using. The manufacturer in confusion called us to see what the heck was going on. Followed by them flying someone in to see a demo. Next thing we know, our aftermarket upgrade is endorsed and featured in their newsletter; and folks start sending in their units for the upgrade.
Like I said… You never know.
Meant for reply to no one in particular. HAD, *please* fix this already. :(
Anyone know why after all these decades, resistors are still labeled with coded bands instead of just printing the value in numbers?
Make a small flat area on the resistor and just print the number already – it is after all the 21st century.
…because orienting a cylinder was hard – and you couldn’t guarantee which end would be up once the leads were formed. Not very useful to repair guys if they have to pull each resistor off the board to figure out what the value should be.
The fabrication of specific value resistors is almost as interesting as making tubes, as is the historical progression from rheostats to fixed resistors. Wasn’t that long ago that 1/2 watt resistors were all the rage, and before that the godawful clay potted monsters. The newest ones are smaller than the space between the color bands on the 1/16th watt models.
All modern resistors (apart from the old school carbon composite ones) are labeled, as up is almost always up. And you can buy 0 ohm resistors that can double as jumpers.
Nice chunk of knowledge in the comments :) Thanks guys!