The Right Benchmark For GPT

Dan Maloney wanted to design a part for 3D printing. OpenSCAD is a coding language for generating 3D objects. ChatGPT can write code. What could possibly go wrong? You should go read his article because it’s enlightening and hilarious, but the punchline is that ChatGPT’s code ran afoul of syntax errors, yet it gave him enough of a foothold to teach himself enough OpenSCAD to get the project done anyway. Like many people who have asked the AI to create some code, Dan finds that it’s not as good as asking someone who knows what they’re doing, but that it’s also better than nothing.

And this is where I start grumbling. When you type your desires into the word-follower machine, your alternative isn’t nothing. Your alternative is to fire up a search engine instead and type “openscad tutorial”. That, for nearly any human endeavor, will get you a few good guides, written by humans who are probably expert in the subject in question, and which are aimed at teaching you the thing that you want to learn. It doesn’t get better than that. You’ll be up and running with your design in no time.

Indeed, if you think about the relevant source material that the LLM was trained on, it’s exactly these tutorials. It can’t possibly do better than the best of them, although the resulting average tutorial might be better than the worst you’ll find. (Some have speculated on what happens when the entire Internet is filled with these generated texts – what will future AIs learn from?)

In Dan’s case, though, he didn’t necessarily want to learn OpenSCAD – he just wanted the latch designed. But in the end, he had to learn enough OpenSCAD to get the AI code compiling without error. He spent an hour learning OpenSCAD and now he’s good to go on his next project too.

So the next time you hear someone say that they got an answer back from a large language model that wasn’t perfect, but it was “better than nothing”, think critically about whether “nothing” is really the right benchmark.

Do you really want to learn nothing? Do you really have no resources to get started with? I would claim that we have the most amazing set of tutorial resources the world has ever known at our fingertips. Compared to the ability to teach millions of humans to achieve their own goals, that makes the LLM party tricks look kinda weak, in my opinion.

GETMusic Uses Machine Learning To Generate Music, Understands Tracks

Music generation guided by machine learning can make great projects, but there’s not usually much apparent control over the results. The system makes what it makes, and it’s an achievement if the results are not obvious cacophony. But that’s all different with GETMusic which allows for a much more involved approach because it understands and is able to create music by tracks. Among other things, this means one can generate a basic rhythm and melody first, then add additional elements to those existing ones, leaving the previous elements unchanged.

GETMusic can make music from scratch, or guided from examples, and under the hood uses a diffusion-based approach similar to the method behind AI image generators like Stable Diffusion. We’ve previously covered how Stable Diffusion works, but instead of images the same basic principles are used to guide the model from random noise to useful tracks of music.
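As a rough illustration of the idea, here is a toy sketch in Python of iterative denoising with some tracks held fixed. It is not GETMusic's code or method: the `toy_denoise` function, the track names, and especially the `target` values (which stand in for what a trained denoiser would predict) are all assumptions made up for this example.

```python
import random


def toy_denoise(tracks, fixed, target, steps=50, rate=0.2, seed=0):
    """Refine noisy tracks step by step while leaving fixed tracks alone.

    tracks: dict of track name -> list of floats (existing material)
    fixed:  set of track names that must stay unchanged
    target: dict of track name -> list of floats; a hypothetical stand-in
            for the prediction a trained denoising model would supply
    """
    rng = random.Random(seed)
    # Tracks to be generated start out as pure Gaussian noise.
    state = {
        name: list(values) if name in fixed
        else [rng.gauss(0.0, 1.0) for _ in values]
        for name, values in tracks.items()
    }
    for _ in range(steps):
        for name, values in state.items():
            if name in fixed:
                continue  # conditioning tracks are never modified
            predicted = target[name]  # stand-in for denoiser(state)
            # Move each sample a small step from noise toward the prediction.
            state[name] = [v + rate * (p - v) for v, p in zip(values, predicted)]
    return state


melody = [0.0, 0.5, 1.0, 0.5]
tracks = {"melody": melody, "bass": [0.0] * 4}
target = {"bass": [-1.0, -1.0, 0.0, 0.0]}
result = toy_denoise(tracks, fixed={"melody"}, target=target)
```

The existing melody comes back untouched, while the bass track has been walked from random noise to the stand-in prediction. A real system replaces that fixed `target` with a learned model that looks at the current state, including the fixed tracks.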

Just a few years ago we saw a neural network trained to generate Bach, and while it was capable of moments of brilliance, it didn’t produce uniformly listenable output. GETMusic is on an entirely different level. The model and code are available online and there is a research paper to accompany it.

You can watch a video putting it through its paces just below the page break, and there are more videos on the project summary page.

ChatGPT, The Worst Summer Intern Ever

Back when I used to work in the pharma industry, I had the opportunity to hire summer interns. This was a long time ago, long enough that the fresh-faced college students who applied for the gig are probably now creeping up to retirement age. The idea, as I understood it, was to get someone to help me with my project, which at the time was standing up a distributed data capture system with a large number of nodes all running custom software that I wrote, reporting back to a central server running more of my code. It was more work than I could manage on my own, so management thought they’d take mercy on me and get me some help.

The experience didn’t turn out quite like I expected. The interns were both great kids, very smart, and I learned a lot from them. But two months is a very tight timeframe, and getting them up to speed took up most of that time. Add in the fact that they were expected to do a presentation on their specific project at the end of the summer, and the whole thing ended up being a lot more work for me than if I had just done the whole project myself.

I thought about my brief experience with interns recently while working on a project I needed a little help with. It’s nothing that would justify hiring anyone, but still, having someone to outsource specific jobs to would be a blessing, especially now that it’s summer and there’s so much else to do. But this is the future, and the expertise and the combined wisdom of the Internet are but a few keystrokes away, right? Well, maybe, but as you’ll see, even the power of large language models has its limits, and trying to loop ChatGPT in as a low-effort summer intern leaves a lot to be desired.

Text-to-Speech Model Can Do Music, Background Noises, And Sound Effects

Bark is a universal text-to-audio model that can not only create realistic speech, it can incorporate music, background noises, and sound effects. It can even include non-speech sounds like laughter, sighs, throat clearings, and similar elements. But despite the fact that it can deliver such complex results, it’s important to understand some of the peculiarities.

The model takes a prompt and generates the resulting sound from scratch. Results might sometimes be unexpected.

Bark is not a conventional text-to-speech program, and how it works has a lot more in common with large language model AI chatbots. This means that results can deviate from expectations, and outputs aren’t necessarily going to be studio-quality speech. As the project’s README points out, “(generated outputs can) be anything from perfect speech to multiple people arguing at a baseball game recorded with bad microphones.” That being said, there is some support for voice presets as a way to help guide the model with some consistency.

Bark was designed by a company called Suno for research purposes and is available under the MIT License. It can be installed and run locally, and has some demos available as well as an online implementation.

The ability to install and run Bark locally is promising territory for incorporating it into projects. And should you be more interested in speech-to-text instead, don’t forget about this plain C/C++ implementation of AI-powered speech recognition.

AI Learns To Walk In 3D Training Grounds

AI agents are learning to do all kinds of interesting jobs, even the creative ones that we quite prefer handling ourselves. Nevertheless, technology marches on. Working in this area is YouTuber [AI Warehouse], who has been teaching an AI to walk in a simulated environment.

Albert needed some specific guidance to learn how to walk upright, something that humans tend to figure out innately.

The AI controls a vaguely humanoid creature, albeit with a heavily simplified body and limbs. It “lives” in a 3D environment created in the Unity engine, which provides the necessary physics engine for the work. Meanwhile, the ML-Agents package is used to provide the brain for Albert, the AI charged with learning to walk.

The video steps through a variety of “deep reinforcement learning” tasks. In these, the AI is rewarded for completing goals which are designed to teach it how to walk. Albert is given control of his limbs, and simply charged with reaching a button some distance away on the floor. After many trials, he learns to do the worm, and achieves his goal.

Getting Albert to walk upright took altogether more training. Lumpy ground and walls in between him and his goal were used to up the challenge, as well as encouragements to alternate his use of each foot and to maintain an upright attitude. Over time, he was able to progress through skipping and to something approximating a proper walk cycle.
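To make the idea of those encouragements concrete, here is a minimal Python sketch of a shaped reward that combines progress toward the button with bonuses for staying upright and alternating feet. It is not taken from the video or from ML-Agents: the `StepInfo` fields, the `shaped_reward` function, and all of the weights are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class StepInfo:
    """Hypothetical per-step observations used to score the agent."""
    prev_distance: float  # distance to the button before this step
    distance: float       # distance to the button after this step
    torso_up: float       # 1.0 = perfectly upright, 0.0 = lying flat
    foot_down: str        # "left", "right", or "none"
    last_foot_down: str   # foot that made the previous ground contact
    reached_goal: bool


def shaped_reward(info, w_progress=1.0, w_upright=0.1,
                  w_alternate=0.2, goal_bonus=10.0):
    """Score one step; the weights are illustrative assumptions."""
    # Base signal: reward any reduction in distance to the goal.
    reward = w_progress * (info.prev_distance - info.distance)
    # Encourage an upright posture.
    reward += w_upright * info.torso_up
    # Encourage stepping with the foot that was not used last time.
    if info.foot_down in ("left", "right") and info.foot_down != info.last_foot_down:
        reward += w_alternate
    # Large one-off bonus for actually reaching the button.
    if info.reached_goal:
        reward += goal_bonus
    return reward


walking = StepInfo(5.0, 4.5, 1.0, "left", "right", False)
worming = StepInfo(5.0, 4.5, 0.0, "none", "none", False)
```

With only the progress term, the two steps above would score the same, which is how an agent can end up doing the worm. The extra terms make the upright, alternating step worth more for the same distance covered.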

One may argue that the teaching method required a lot of specific guidance, but it’s still a neat feat to achieve nonetheless. It’s altogether more complex than learning to play Trackmania, we’d say, and that was impressive enough in itself. Video after the break.

Hackaday Links: July 2, 2023

Members of Pixelbar woke up to shocking news on Wednesday morning this week as they learned that a fire had destroyed the building housing their Rotterdam hackerspace. Pictures of the fire are pretty dramatic and show the entire building ablaze. We’re not familiar with Pixelbar specifically, but most hackerspaces seem to share space with other businesses in repurposed warehouses and other industrial buildings, and it looks like that was the case here. Local coverage doesn’t indicate that a cause has been determined, but they do say that “large batches of wood” were stored in or near the structure, which likely contributed to the dramatic display. There don’t seem to be reports of injuries to civilians or first responders, so that’s a blessing, but Pixelbar seems to have been completely destroyed. If you’re in a position to help, check out their GoFundMe page. As our own Jenny List, who currently lives in The Netherlands, points out, spaces suitable for housing a hackerspace are hard to come by in a city like Rotterdam, which is the busiest port in Europe. That means Pixelbar members will be competing for space with businesses that have far deeper pockets, so anything you can donate will likely go a long way toward rebuilding.

A Chess AI In Only 4K Of Memory

The first computer to ever beat a reigning chess world champion didn’t do so until 1996, when a supercomputer built by IBM beat Garry Kasparov. But anyone who wasn’t a chess Grandmaster could have been getting beaten by chess programs as early as 1979, when Atari released one of the first ever commercially-available chess video games for the Atari 2600. The game was called Video Chess, and despite some quirky gameplay, it is quite impressive that it was able to run on the limited Atari hardware at all, as [Oscar] demonstrates.

The first steps of getting under the hood of this program involved looking into the mapping of the pieces and the board positions in memory. After analyzing some more of the gameplay, [Oscar] discovered that the game does not use trees and nodes to make decisions, likely due to the memory limitations, but rather simulates the entire game and then analyzes it to determine the next step. When the game detects that there are not many pieces left on the board it can actually increase the amount of analysis it does in order to corner the opposing king, and has some unique algorithms in place to handle things like castling, finishing the game, and determining valid movements.

Originally it was thought that this engine couldn’t fit in the 4K of ROM or work within the 128 bytes of system memory, and that it was optimized for the system only after a game with some expanded capabilities had first been developed. The game also has a reputation for making illegal moves in the higher difficulty settings, although [Oscar] couldn’t reproduce these bugs. He also didn’t get into any of the tricks the game employed just to display all of the pieces on the screen. The AI in the Atari game was a feat for its time, but in the modern world the Stockfish open-source chess engine allows for a much more expanded gameplay experience.
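To get a feel for how tight 128 bytes really is, here is a small Python sketch of one way to store a full chess board in 32 bytes by packing two squares into each byte. This is not the layout Video Chess actually uses, which isn't detailed here; the piece codes, the `PackedBoard` class, and the square numbering are assumptions made purely for illustration.

```python
# Hypothetical piece codes: the low 3 bits give the piece type,
# and the top bit of the nibble marks a black piece.
EMPTY, PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING = range(7)
BLACK = 0x8


class PackedBoard:
    """64 squares at 4 bits each, stored in 32 bytes."""

    def __init__(self):
        self.cells = bytearray(32)

    def get(self, square):
        byte = self.cells[square >> 1]
        # Odd squares live in the high nibble, even squares in the low one.
        return (byte >> 4) if square & 1 else (byte & 0x0F)

    def set(self, square, piece):
        index = square >> 1
        if square & 1:
            self.cells[index] = (self.cells[index] & 0x0F) | (piece << 4)
        else:
            self.cells[index] = (self.cells[index] & 0xF0) | piece


def starting_position():
    """Squares are numbered 0..63, rank by rank, starting at white's a1."""
    board = PackedBoard()
    back_rank = [ROOK, KNIGHT, BISHOP, QUEEN, KING, BISHOP, KNIGHT, ROOK]
    for file, piece in enumerate(back_rank):
        board.set(file, piece)               # white back rank
        board.set(8 + file, PAWN)            # white pawns
        board.set(48 + file, PAWN | BLACK)   # black pawns
        board.set(56 + file, piece | BLACK)  # black back rank
    return board


board = starting_position()
```

Even with this packing, the board alone would take a quarter of the console's RAM, leaving under 100 bytes for everything else the program needs to keep track of.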