Text-to-Speech Model Can Do Music, Background Noises, And Sound Effects

Bark is a universal text-to-audio model that can not only create realistic speech, it can incorporate music, background noises, and sound effects. It can even include non-speech sounds like laughter, sighs, throat clearings, and similar elements. But despite the fact that it can deliver such complex results, it’s important to understand some of the peculiarities.

The model takes a prompt and generates the resulting sound from scratch. Results might sometimes be unexpected.

Bark is not a conventional text-to-speech program, and how it works has a lot more in common with large language model AI chatbots. This means that results can deviate from expectations, and outputs aren’t necessarily going to be studio-quality speech. As the project’s README points out, “(generated outputs can) be anything from perfect speech to multiple people arguing at a baseball game recorded with bad microphones.” That being said, there is some support for voice presets as a way to help guide the model with some consistency.

Bark was designed by a company called Suno for research purposes and is available under the MIT License. It can be installed and run locally, and has some demos available as well as an online implementation.

The ability to install and run Bark locally is promising territory for incorporating it into projects. And should you be more interested in speech-to-text instead, don’t forget about this plain C/C++ implementaion of AI-powered speech recognition.

When Is Open Source AI Not Open Source AI?

The world of AI is abuzz, or at least parts of it are, at the news of Meta’s release of Llama 2. This is an AI text model which is thought to surpass ChatGPT in capabilities, and which the social media turned VR turned own all your things company wants you to know is open to all. That’s right, the code is open source and you can download the model, and Meta want you to feel warm and fuzzy about it. Unfortunately all is not as it seems, because of course the model isn’t open-source and is subject to a licensing restriction which makes it definitely not free of charge for larger users. This is of course disappointing to anyone hoping for an AI chatbot without restrictions, but we’re guessing Meta would prefer not to inadvertently enable a competitor.

Happily for the open source user large or small who isn’t afraid of a little work there’s an alternative in the form of OpenLLaMA, but we understand that won’t be for all users. Whichever LLM you use though, please don’t make the mistake of imagining that it possesses actual intelligence.

Thanks to the CoupledAI team for the tip!

Reviving Interlisp With The Medley Interlisp Project

Within the Artificial Intelligence and natural language research communities, Lisp has played a major role since 1960. Over the years since its introduction, various development environments have been created that sought to make using Lisp as easy and powerful as possible. One of these environments is Interlisp, which saw its first release in 1968, and its last official release in 1992. That release was Medley 2.0, which targeted various UNIX machines, DOS 4.0, and the Xerox 1186. Courtesy of the Interlisp open source project (GitHub), Medley Interlisp is available for all to use, even on modern systems.

The documentation contains information on how to install Medley on Linux but also Mac and Windows (via WSL1 or WSL2). For those just curious to give things a swing, there’s also an online version you can log into remotely. What Medley Interlisp gets you is an (X-based) GUI environment in which you can program in Lisp, essentially an IDE with a debugger and the (in)famous auto-correction tool for simple errors such as typos, known as the DWIM (Do What I Mean), which much like the derived version in Emacs seeks to automatically fix simple issues like misspellings without forcing the developer to fix it and restart the compilation.

Thanks to [Pixel_Outlaw] for the tip and for also telling us about a project they wrote using Medley Interlisp for the Spring Lisp Game Jame 2023 titled Interlisp Fifteen Puzzle.

3D Audio Imaging With A Phased Array Microphone

Remember the scene from Blade Runner, where Deckard puts a photograph into a Photo Inspector? The virtual camera can pan and move around the captured scene, pulling out impossible details. It seems that [Ben Wang] discovered how to make that particular trick a reality, but with audio instead of video. The secret sauce isn’t a sophisticated microphone, but a whole bunch of really simple ones. In this case, it’s 192 of them, arranged on long PCBs working as the spokes of a wall-art wheel. Quite the conversation piece.

Continue reading “3D Audio Imaging With A Phased Array Microphone”

Ask Hackaday: The Turing Test Is Dead: Long Live The Turing Test!

Alan Turing proposed a test for machine intelligence that no longer works. The idea was to have people communicate over a terminal, with another real person and with a computer. If the computer is intelligent, Turing mused, most people will incorrectly identify the computer as a human. Clearly, with the advent of modern chatbots, that test is now broken. Despite the “AI” moniker, chatbots aren’t sentient or even pre-sentient, but they certainly seem that way. An AI CEO, Mustafa Suleyman, is proposing a new test: The AI has to take a $100,000 budget and earn $1,000,000.

We were a little bemused at this. By that measure, most of us aren’t intelligent, either, and it seems like this is a particularly capitalistic idea. We could probably write an Excel script that studied mutual fund performance and pull off the same trick, given enough time for the investment to mature. Is it intelligent? No. Besides, even humans who have demonstrated they can make $1,000,000 often sell their companies and start new ones that fail. How often does the AI have to succeed before we grant it person status?

Continue reading “Ask Hackaday: The Turing Test Is Dead: Long Live The Turing Test!”

Blind Camera: Visualizing A Scene From Its Sounds Alone

A visualization by the Blind Camera based on recorded sounds and the training data set for the neural network. (Credit: Diego Trujillo Pisanty)
A visualization by the Blind Camera based on recorded sounds and the training data set for the neural network. (Credit: Diego Trujillo Pisanty)

When we see a photograph or photo of a scene, we can likely imagine what sounds would go with it, but what if this gets inverted, and we have to imagine the scene that goes with the sounds? How close would we get to reconstructing the scene in our mind, without the biases of our upbringing and background rendering this into a near-impossible task? This is essentially the focus of a project by [Diego Trujillo Pisanty] which he calls Blind Camera.

Based on video data recorded in Mexico City, a neural network created using Tensorflow 3 was trained using an RTX 3080 GPU on a dataset containing frames from these videos that were associated with a sound. As a result, when the thus trained neural network is presented with a sound profile (the ‘photo’), it’ll attempt to reconstruct the scene based on this input and its model, all of which has been adapted to run on a single Raspberry Pi 3B board.

However, since all the model knows are the sights and sounds of Mexico City, the resulting image will always be presented as a composite of scenes from this city. As [Diego] himself puts it: for the device, everything is a city. In a way it is an excellent way to demonstrate how not only neural networks are limited by their training data, but so too are us humans.

Continue reading “Blind Camera: Visualizing A Scene From Its Sounds Alone”

Contrary View: Chatbots Don’t Help Programmers

[Bertrand Meyer] is a decided contrarian in his views on AI and programming. In a recent Communications of the ACM blog post, he reveals that — unlike many others — he thinks AI in its current state isn’t very useful for practical programming. He was responding, in part, to another article from the ACM entitled “The End of Programming,” which, like many other articles, is claiming that, soon, no one will write software the way we do and have done for the last few decades. You can see [Matt Welsh] describe his thoughts on this in the video below. But [Bertrand] disagrees.

As we have also noted, [Bretrand] says:

“AI in its modern form, however, does not generate correct programs: it generates programs inferred from many earlier programs it has seen. These programs look correct but have no guarantee of correctness.”

Continue reading “Contrary View: Chatbots Don’t Help Programmers”