Duplex technology for Google Assistant

Google’s Duplex AI Has Conversation Indistinguishable From Human’s

First Google gradually improved its WaveNet text-to-speech neural network to the point where it sounds almost perfectly human. Then they introduced Smart Reply which suggests possible replies to your emails. So it’s no surprise that they’ve announced an enhancement for Google Assistant called Duplex which can have phone conversations for you.

What is surprising is how well it works, as you can hear below. The first is Duplex calling to book an appointment at a hair salon, and the second is it making reservation’s with a restaurant.

Note that this reverses the roles when talking to a computer on the phone. The computer is the customer who calls the business, and the human is on the business side. The goal of the computer is to book a hair appointment or reserve a table at a restaurant. The computer has to know how to carry out a conversation with the human without the human knowing that they’re talking to a computer. It’s for communicating with all those businesses which don’t have online booking systems but instead use human operators on the phone.

Not knowing that they’re talking to a computer, the human will therefore speak as it would with another human, with all the pauses, “hmm”s and “ah”s, speed, leaving words out, and even changing the context in mid-sentence. There’s also the problem of multiple meanings for a phrase. The “four” in “Ok for four” can mean 4 pm or four people.

The component which decides what to say is a recurrent neural network (RNN) trained on many anonymized phone calls. The input is: the audio, the output from Google’s automatic speech recognition (ASR) software, and context such as the conversation’s history and the parameters of the conversation (e.g. book places at a restaurant, for how many, when), and more.

Producing the speech is done using Google’s text-to-speech technologies, Wavenet and Tacotron. “Hmm”s and “ah”s are inserted for a more natural sound. Timing is also taken into account. “Hello?” gets an immediate response. But they introduce latency when responding to more complex questions since replying too soon would sound unnatural.

There are limitations though. If it decides it can’t complete a task then it hands the conversation over to a human operator. Also, Duplex can’t handle a general conversation. Instead, multiple instances are trained on different domains. So this isn’t the singularity which we’ve talked about before. But if you’re tired of talking to computers at businesses, maybe this will provide a little payback by having the computer talk to the business instead.

On a more serious note, would you want to know if the person you were speaking to was in fact a computer? Perhaps Google should preface each conversation with “Hi! This is Google Assistant calling.” And even knowing that, would you want to have a human conversation with a computer, knowing that it’s “um”s were artificial? This may save time for the person whom the call is on behalf of, but the person being called may wish the computer would be a little more computer-like and speak more efficiently. Let us know your thoughts in the comments below. Or just check out the following Google I/O ’18 keynote presentation video where all this was announced.

Continue reading “Google’s Duplex AI Has Conversation Indistinguishable From Human’s”

Neural Network Names Nightshades

Neural networks are a core area of the artificial intelligence field. They can be trained on abstract data sets and be put to all manner of useful duties, like driving cars while ignoring road hazards or identifying cats in images. Recently, a biologist approached AI researcher [Janelle Shane] with a problem – could she help him name some tomatoes?

It’s a problem with a simple cause – like most people, [Darren] enjoys experimenting with tomato genetics, and thus requires a steady supply of names to designate the various varities produced in this work. It can be taxing on the feeble human brain, so a silicon-based solution is ideal.

[Janelle] decided to use the char-rnn library built by [Andrej Karpathy] to do the heavy lifting. After training it on a list of over 11,000 existing tomato varieties, the neural network was then asked to strike out on its own.

The results are truly fantastic – whether you’re partial to a Speckled Garfech or you prefer the smooth flavor of the Golden Pow, there’s a tomato to suit your tastes. When the network was retrained with additional content in the form of names of metal bands, the results get even better – it’s only a matter of time before Angels of Saucing reach a supermarket shelf near you.

On the surface, it’s a fun project with whimsical output, but fundamentally it highlights how much can be accomplished these days by standing on the shoulders of giants, so to speak. Now, if you need some assistance growing your tomatoes, the machines can help there, too.

 

Neural Networks… On A Stick!

They probably weren’t inspired by [Jeff Dunham’s] jalapeno on a stick, but Intel have created the Movidius neural compute stick which is in effect a neural network in a USB stick form factor. They don’t rely on the cloud, they require no fan, and you can get one for well under $100. We were interested in [Jeff Johnson’s] use of these sticks with a Pynq-Z1. He also notes that it is a great way to put neural net power on a Raspberry Pi or BeagleBone. He shows us YOLO — an image recognizer — and applies it to an HDMI signal with the processing done on the Movidius. You can see the result in the first video, below.

At first, we thought you might be better off using the Z1’s built-in FPGA to do neural networks. [Jeff] points out that while it is possible, the Z1 has a lower-end device on it, so there isn’t that much FPGA real estate to play with. The stick, then, is a great idea. You can learn more about the device in the second video, below.

Continue reading “Neural Networks… On A Stick!”

TensorFlow In Your Browser

If you want to explore machine learning, you can now write applications that train and deploy TensorFlow in your browser using JavaScript. We know what you are thinking. That has to be slow. Surprisingly, it isn’t, since the libraries use Graphics Processing Unit (GPU) acceleration. Of course, that assumes your browser can use your GPU. There are several demos available, include one where you train a Pac Man game to respond to gestures in your webcam to control the game. If you try it and then disable accelerated graphics in your browser options, you’ll see just what a speed up you can gain from the GPU.

Continue reading “TensorFlow In Your Browser”

Tiny Neural Network Library In 200 Lines Of Code

Neural networks have gone mainstream with a lot of heavy-duty — and heavy-weight — tools and libraries. What if you want to fit a network into a little computer? There’s tinn — the tiny neural network. If you can compile 200 lines of standard C code with a C or C++ compiler, you are in business. There are no dependencies on other code.

On the other hand, there’s not much documentation, either. However, between the header file and two examples, you should be able to figure it out. After all, it isn’t much code. The example in the repository directs you to download a handwriting number recognition dataset from the Internet. Once it trains that data, it shows you the expected output from the first item in the data set and then processes the first item and shows you the result.

Continue reading “Tiny Neural Network Library In 200 Lines Of Code”

Google Builds A Synthesizer With Neural Nets And Raspberry Pis.

AI is the new hotness! It’s 1965 or 1985 all over again! We’re in the AI Rennisance Mk. 2, and Google, in an attempt to showcase how AI can allow creators to be more… creative has released a synthesizer built around neural networks.

The NSynth Super is an experimental physical interface from Magenta, a research group within the Big G that explores how machine learning tools can create art and music in new ways. The NSynth Super does this by mashing together a Kaoss Pad, samples that sound like General MIDI patches, and a neural network.

Here’s how the NSynth works: The NSynth hardware accepts MIDI signals from a keyboard, DAW, or whatever. These MIDI commands are fed into an openFrameworks app that uses pre-compiled (with Machine Learning™!) samples from various instruments. This openFrameworks app combines and mixes these samples in relation to whatever the user inputs via the NSynth controller. If you’ve ever wanted to hear what the combination of a snare drum and a bassoon sounds like, this does it. Basically, you’re looking at a Kaoss pad controlling rompler that takes four samples and combines them, with the power of Neural Networks. The project comes with a set of pre-compiled and neural networked samples, but you can use this interface to mix your own samples, provided you have a beefy computer with an expensive GPU.

Not to undermine the work that went into this project, but thousands of synth heads will be disappointed by this project. The creation of new audio samples requires training with a GPU; the hardest and most computationally expensive part of neural networks is the training, not the performance. Without a nice graphics card, you’re limited to whatever samples Google has provided here.

Since this is Open Source, all the files are available, and it’s a project that uses a Raspberry Pi with a laser-cut enclosure, there is a huge demand for this machine learning Kaoss pad. The good news is that there’s a group buy on Hackaday.io, and there’s already a seller on Tindie should you want a bare PCB. You can, of course, roll your own, and the Digikey cart for all the SMD parts comes to about $40 USD. This doesn’t include the OLED ($2 from China), the Raspberry Pi, or the laser cut enclosure, but it’s a start. Of course, for those of you who haven’t passed the 0805 SMD solder test, it looks like a few people will be selling assembled versions (less Pi) for $50-$60.

Is it cool? Yes, but a basement-bound producer that wants to add this to a track will quickly learn that training machine learning algorithms cost far more than playing with machine algorithms. The hardware is neat, but brace yourself for disappointment. Just like AI suffered in the late 60s and the late 80s. We’re in the AI Renaissance Mk. 2, after all.

Continue reading “Google Builds A Synthesizer With Neural Nets And Raspberry Pis.”

AI Listens To Radio

We’ve seen plenty of examples of neural networks listening to speech, reading characters, or identifying images. KickView had a different idea. They wanted to learn to recognize radio signals. Not just any radio signals, but Orthogonal Frequency Division Multiplexing (OFDM) waveforms.

OFDM is a modulation method used by WiFi, cable systems, and many other systems. In particular, they look at an 802.11g signal with a bandwidth of 20 MHz. The question is given a receiver for 802.11g, how can you reliably detect that an 802.11ac signal — up to 160 MHz — is using your channel? To demonstrate the technique they decided to detect 20 MHz signals using a 5 MHz bandwidth.

Continue reading “AI Listens To Radio”