Text-to-Speech Model Can Do Music, Background Noises, And Sound Effects

Bark is a universal text-to-audio model that can not only create realistic speech but also incorporate music, background noises, and sound effects. It can even include non-speech sounds like laughter, sighs, throat clearings, and similar elements. But despite the fact that it can deliver such complex results, it’s important to understand some of its peculiarities.

The model takes a prompt and generates the resulting sound from scratch. Results might sometimes be unexpected.

Bark is not a conventional text-to-speech program, and how it works has a lot more in common with large language model AI chatbots. This means that results can deviate from expectations, and outputs aren’t necessarily going to be studio-quality speech. As the project’s README points out, “(generated outputs can) be anything from perfect speech to multiple people arguing at a baseball game recorded with bad microphones.” That being said, there is some support for voice presets as a way to help guide the model with some consistency.

Bark was designed by a company called Suno for research purposes and is available under the MIT License. It can be installed and run locally, and has some demos available as well as an online implementation.
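For a feel of what running it locally might look like, here is a minimal sketch. It assumes the `bark` Python package and SciPy are installed, and that the bracketed cues and the `♪` marker behave as the project's README describes. The helper names `build_prompt` and `synthesize` are ours for illustration, not part of Bark.

```python
from typing import Optional

# Non-speech cues as described in the Bark README (assumed; check the README
# for the current list before relying on them).
CUES = {"laugh": "[laughs]", "sigh": "[sighs]", "throat": "[clears throat]"}


def build_prompt(text: str, cue: Optional[str] = None, sung: bool = False) -> str:
    """Decorate plain text with an optional non-speech cue or music markers."""
    if sung:
        text = f"♪ {text} ♪"  # the README suggests ♪ around song lyrics
    if cue is not None:
        text = f"{CUES[cue]} {text}"
    return text


def synthesize(prompt: str, out_path: str, voice: Optional[str] = None) -> None:
    """Generate audio with Bark and save it as a WAV file.

    Imports are deferred so build_prompt() works without Bark installed.
    """
    from bark import SAMPLE_RATE, generate_audio, preload_models
    from scipy.io.wavfile import write as write_wav

    preload_models()  # downloads and caches the model weights on first use
    audio = generate_audio(prompt, history_prompt=voice)  # voice preset is optional
    write_wav(out_path, SAMPLE_RATE, audio)


if __name__ == "__main__":
    # "v2/en_speaker_6" is one of the preset names shown in the README.
    synthesize(build_prompt("Well, that was unexpected.", cue="laugh"),
               "bark_demo.wav", voice="v2/en_speaker_6")
```

Since the model generates from scratch, the same prompt can give noticeably different audio from run to run; passing a voice preset is the main lever for consistency.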

The ability to install and run Bark locally is promising territory for incorporating it into projects. And should you be more interested in speech-to-text instead, don’t forget about this plain C/C++ implementation of AI-powered speech recognition.

AI Learns To Walk In 3D Training Grounds

AI agents are learning to do all kinds of interesting jobs, even the creative ones that we quite prefer handling ourselves. Nevertheless, technology marches on. Working in this area is YouTuber [AI Warehouse], who has been teaching an AI to walk in a simulated environment.

Albert needed some specific guidance to learn how to walk upright, something that humans tend to figure out innately.

The AI controls a vaguely humanoid creature, albeit with a heavily simplified body and limbs. It “lives” in a 3D environment created in the Unity engine, which provides the necessary physics engine for the work. Meanwhile, the ML-Agents package is used to provide the brain for Albert, the AI charged with learning to walk.

The video steps through a variety of “deep reinforcement learning” tasks. In these, the AI is rewarded for completing goals which are designed to teach it how to walk. Albert is given control of his limbs, and simply charged with reaching a button some distance away on the floor. After many trials, he learns to do the worm, and achieves his goal.

Getting Albert to walk upright took altogether more training. Lumpy ground and walls in between him and his goal were used to up the challenge, as well as encouragements to alternate his use of each foot and to maintain an upright attitude. Over time, he was able to progress through skipping and to something approximating a proper walk cycle.
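The kind of guidance described here is often called reward shaping. The sketch below is our own illustration of the idea, not the code from the video: the `StepInfo` fields and every weight are hypothetical, chosen only to show how progress, posture, and foot alternation might be combined into one number per simulation step.

```python
from dataclasses import dataclass


@dataclass
class StepInfo:
    """Hypothetical per-step observations from the simulation."""
    dist_before: float   # distance to the button before this step
    dist_after: float    # distance to the button after this step
    torso_up: float      # 1.0 = fully upright, 0.0 = lying flat
    foot_down: str       # "left", "right", or "none"
    last_foot: str       # foot that touched down previously
    reached_goal: bool


def shaped_reward(s: StepInfo) -> float:
    """Combine several encouragements into one scalar reward (assumed weights)."""
    reward = 1.0 * (s.dist_before - s.dist_after)   # progress toward the button
    reward += 0.1 * max(0.0, s.torso_up)            # bonus for staying upright
    if s.foot_down in ("left", "right") and s.foot_down != s.last_foot:
        reward += 0.2                               # bonus for alternating feet
    if s.reached_goal:
        reward += 10.0                              # the original goal still dominates
    return reward


walking = StepInfo(2.0, 1.5, 1.0, "left", "right", False)
worming = StepInfo(2.0, 1.5, 0.0, "none", "none", False)
print(shaped_reward(walking), shaped_reward(worming))
```

With only the progress and goal terms, doing the worm earns as much as walking; the extra terms are what tilt the balance toward an upright gait.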

One may argue that the teaching method required a lot of specific guidance, but it’s still a neat feat to achieve nonetheless. It’s altogether more complex than learning to play Trackmania, we’d say, and that was impressive enough in itself. Video after the break.



Hackaday Links: July 2, 2023

Members of Pixelbar woke up to shocking news on Wednesday morning this week as they learned that a fire had destroyed the building housing their Rotterdam hackerspace. Pictures of the fire are pretty dramatic and show the entire building ablaze. We’re not familiar with Pixelbar specifically, but most hackerspaces seem to share space with other businesses in repurposed warehouses and other industrial buildings, and it looks like that was the case here. Local coverage doesn’t indicate that a cause has been determined, but they do say that “large batches of wood” were stored in or near the structure, which likely contributed to the dramatic display. There don’t seem to be reports of injuries to civilians or first responders, so that’s a blessing, but Pixelbar seems to have been completely destroyed. If you’re in a position to help, check out their GoFundMe page. As our own Jenny List, who currently lives in The Netherlands, points out, spaces suitable for housing a hackerspace are hard to come by in a city like Rotterdam, which is the busiest port in Europe. That means Pixelbar members will be competing for space with businesses that have far deeper pockets, so anything you can donate will likely go a long way toward rebuilding.


A Chess AI In Only 4K Of Memory

The first computer to ever beat a reigning chess world champion didn’t do so until 1996, when a supercomputer built by IBM beat Garry Kasparov. But anyone who wasn’t a chess Grandmaster could have been getting beaten by chess programs as early as 1979, when Atari released one of the first ever commercially-available chess video games for the Atari 2600. The game was called Video Chess, and despite some quirky gameplay, it is quite impressive that it was able to run on the limited Atari hardware at all, as [Oscar] demonstrates.

The first steps of getting under the hood of this program involved looking into the mapping of the pieces and the board positions in memory. After analyzing some more of the gameplay, [Oscar] discovered that the game does not use trees and nodes to make decisions, likely due to the memory limitations, but rather simulates the entire game and then analyzes it to determine the next step. When the game detects that there are not many pieces left on the board it can actually increase the amount of analysis it does in order to corner the opposing king, and has some unique algorithms in place to handle things like castling, finishing the game, and determining valid movements.

Originally it was thought that this engine couldn’t fit in the 4K of ROM or work within the 128 bytes of system memory, and that it was optimized for the system after first developing a game with some expanded capabilities. The game also has a reputation for making illegal moves in the higher difficulty settings although [Oscar] couldn’t reproduce these bugs. He also didn’t get into any of the tricks the game employed just to display all of the pieces on the screen. The AI in the Atari game was a feat for its time, but in the modern world the Stockfish open-source chess engine allows for a much more expanded gameplay experience.
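To get a sense of how tight 128 bytes is, consider how a board alone might be stored. The sketch below packs two squares into each byte, so a full position takes 32 bytes. This is purely an illustration of the constraint; it is not the layout Video Chess actually uses, and the piece codes are our own invention.

```python
# Piece codes fit in 3 bits; bit 3 marks the colour, so each square needs 4 bits.
EMPTY, PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING = range(7)
WHITE, BLACK = 0, 8


def set_square(board: bytearray, square: int, code: int) -> None:
    """Store a 4-bit code for square 0..63, two squares per byte."""
    byte, high = divmod(square, 2)
    if high:
        board[byte] = (board[byte] & 0x0F) | (code << 4)
    else:
        board[byte] = (board[byte] & 0xF0) | code


def get_square(board: bytearray, square: int) -> int:
    """Read back the 4-bit code for square 0..63."""
    byte, high = divmod(square, 2)
    return (board[byte] >> 4) if high else (board[byte] & 0x0F)


def start_position() -> bytearray:
    """Build the standard opening position in 32 bytes."""
    board = bytearray(32)
    back_rank = [ROOK, KNIGHT, BISHOP, QUEEN, KING, BISHOP, KNIGHT, ROOK]
    for file, piece in enumerate(back_rank):
        set_square(board, file, WHITE | piece)
        set_square(board, 8 + file, WHITE | PAWN)
        set_square(board, 48 + file, BLACK | PAWN)
        set_square(board, 56 + file, BLACK | piece)
    return board


board = start_position()
print(len(board), "bytes for the whole board")
```

Even with this packing, the board would eat a quarter of the console's RAM, leaving under 100 bytes for everything else the program has to remember.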

Ask Hackaday: The Turing Test Is Dead: Long Live The Turing Test!

Alan Turing proposed a test for machine intelligence that no longer works. The idea was to have people communicate over a terminal, with another real person and with a computer. If the computer is intelligent, Turing mused, most people will incorrectly identify the computer as a human. Clearly, with the advent of modern chatbots, that test is now broken. Despite the “AI” moniker, chatbots aren’t sentient or even pre-sentient, but they certainly seem that way. An AI CEO, Mustafa Suleyman, is proposing a new test: The AI has to take a $100,000 budget and earn $1,000,000.

We were a little bemused at this. By that measure, most of us aren’t intelligent, either, and it seems like this is a particularly capitalistic idea. We could probably write an Excel script that studied mutual fund performance and pulled off the same trick, given enough time for the investment to mature. Is it intelligent? No. Besides, even humans who have demonstrated they can make $1,000,000 often sell their companies and start new ones that fail. How often does the AI have to succeed before we grant it person status?
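How much time is “enough time”? A quick back-of-the-envelope calculation shows the idea. The 7% annual return below is an assumed figure for illustration only, not a claim about any particular fund.

```python
import math


def years_to_multiply(factor: float, annual_return: float) -> float:
    """Years of compounding needed to grow a balance by the given factor."""
    return math.log(factor) / math.log(1.0 + annual_return)


def whole_years_to_target(start: float, target: float, annual_return: float) -> int:
    """Count full years until a compounding balance first reaches the target."""
    balance, years = start, 0
    while balance < target:
        balance *= 1.0 + annual_return
        years += 1
    return years


# Turning $100,000 into $1,000,000 is a tenfold increase.
print(round(years_to_multiply(10, 0.07), 1))            # about 34 years
print(whole_years_to_target(100_000, 1_000_000, 0.07))  # 35 full years
```

No intelligence required, just patience measured in decades.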


What Do You Want In A Programming Assistant?

The Propellerheads released a song in 1998 entitled “History Repeating.” If you don’t know it, the lyrics include: “They say the next big thing is here. That the revolution’s near. But to me, it seems quite clear. That it’s all just a little bit of history repeating.” The next big thing today seems to be the AI chatbots. We’ve heard every opinion from the “revolutionize everything” to “destroy everything” camp. But, really, isn’t it a bit of history repeating itself? We get new tech. Some oversell it. Some fear it. Then, in the end, it becomes part of the ordinary landscape and seems unremarkable in the light of the new next big thing. Dynamite, the steam engine, cars, TV, and the Internet were all predicted to “ruin everything” at some point in the past.

History really does repeat itself. After all, when X-rays were discovered, they were claimed to cure pneumonia and other infections, along with other miracle cures. Those didn’t pan out, but we still use them for things they are good at. Calculators were going to ruin math classes. There are plenty of other examples.

This came to mind because a recent post from ACM has the contrary view that chatbots aren’t able to help real programmers. We’ve also seen that, maybe, they can, in limited ways. We suspect it is like getting a new larger monitor. At first, it seems huge. But in a week, it is just the normal monitor, and your old one, which had been perfectly adequate, seems tiny.

But we think there’s a larger point here. Maybe the chatbots will help programmers. Maybe they won’t. But clearly, programmers want some kind of help. We just aren’t sure what kind of help it is. Do we really want Copilot to write our code for us? Do we want to ask Bard or ChatGPT/Bing what is the best way to balance a B-tree? Asking AI to do static code analysis seems to work pretty well.

So maybe your path to fame and maybe even riches is to figure out, AI-based or not, what people actually want in an automated programming assistant and build that. The home computer idea languished until someone figured out what people wanted to do with them. Video cassettes didn’t make it into the home until companies figured out what people wanted most to watch on them.

How much and what kind of help do you want when you program? Or design a circuit or PCB? Or even a 3D model? Maybe AI isn’t going to take your job; it will just make it easier. We doubt, though, that it can much improve on Dame Shirley Bassey’s history lesson.

Why Did The Home Assistant Future Not Quite Work The Way It Was Supposed To?

The future, as seen in the popular culture of half a century or more ago, was usually depicted as quite rosy. Technology would have rendered every possible convenience at our fingertips, and we’d all live in futuristic automated homes — no doubt while wearing silver clothing and dreaming about our next vacation on Mars.

Of course, it’s not quite worked out this way. A family from 1965 whisked here in a time machine would miss a few things such as a printed newspaper, the landline telephone, or receiving a handwritten letter; they would probably marvel at the possibilities of the Internet, but they’d recognise most of the familiar things around us. We still sit on a sofa in front of a television for relaxation even if the TV is now a large LCD that plays a streaming service, we still drive cars to the supermarket, and we still cook our food much the way they did. George Jetson has not yet even entered the building.

The Future is Here, and it Responds to “Alexa”

“Alexa, why haven’t you been a commercial success?” Gregory Varnum, CC BY-SA 4.0

There’s one aspect of the Jetsons future that has begun to happen though. It’s not the futuristic automation of projects such as Disneyland’s Monsanto House of the Future, but instead it’s our current stuttering home automation efforts. We’re not having domestic robots in pinnies hand us rolled-up newspapers, but we’re installing smart lightbulbs and thermostats, and we’re voice-controlling them through a variety of home hub devices. The future is here, and it responds to “Alexa”.

But for all the success that Alexa and other devices like it have had in conquering the living rooms of gadget fans, they’ve done a poor job of generating a profit. It was supposed to be a gateway into Amazon services alongside their Fire devices, a convenient household companion that would help find all those little things for sale on Amazon’s website, and of course, enable you to buy them. Then, Alexa was supposed to move beyond your Echo and into other devices, as your appliances could come pre-equipped with Alexa-on-a-chip. Your microwave oven would no longer have a dial on the front; instead you would talk to it, it would recognise the food you’d bought from Amazon, and order more for you.

Instead of all that, Alexa has become an interface for connected home hardware, a way to turn on the light, view your Ring doorbell on models with screens, catch the weather forecast, and listen to music. It’s a novelty timepiece with that pod bay doors joke built-in, and worse than that for the retailer, it remains by its very nature unseen. Amazon have got their shopping cart into your living room, but you’re not using it and it hardly reminds you that it’s part of the Amazon empire at all.

But it wasn’t supposed to be that way. The idea was that you might look up from your work and say “Alexa, order me a six-pack of beer!”, and while it might not come immediately, your six-pack would duly arrive. It was supposed to be a friendly gateway to commerce on the website that has everything, and now they can’t even persuade enough people to give it a celebrity voice for a few bucks.

The Gadget You Love to Hate

In the first few days after the Echo’s UK launch, a member of my hackerspace installed his in the space. He soon became exasperated as members learned that “Alexa, add butt plug to my wish list” would do just that. But it was in that joke we could see the problem with the whole idea of Alexa as an interface for commerce. He had locked down all purchasing options, but as it turns out, many people in San Diego hadn’t done the same thing. As the stories rolled in of kids spending hundreds of their parents’ hard-earned cash on toys, it would be a foolhardy owner who would leave purchasing enabled. Worse still, while the public remained largely in ignorance, the potential of the device for data gathering and unauthorized access hadn’t evaded researchers. It’s fair to say that our community has loved the idea of a device like the Echo, but many of us wouldn’t let one into our own homes under any circumstances.

So Alexa hasn’t been a commercial success, yet conversely it’s been a huge sales success in itself. The devices have sold like hot cakes, but since they’ve been sold at close to cost, they haven’t been the bonanza Amazon might have hoped for. But what can be learned from this, other than that the world isn’t ready for a voice-activated shopping trolley?

Sadly for most Alexa users it seems that a device piping your actions back to a large company’s data centres is not enough of a concern for them. It’s an easy prediction that Alexa and other services like it will continue to evolve, with inevitable AI pixie dust sprinkled on them. A bet could be on the killer app being not a personal assistant but a virtual friend with some connections across a group of people, perhaps a family or a group of friends. In due course we’ll also see locally hosted and open source equivalents appearing on yet-to-be-released hardware that will condense what takes a data centre of today’s GPUs into a single board computer. It’s not often that our community rejoices in being late to a technological party, but I for one want an Alexa equivalent that I control rather than one that invades my privacy for a third party.