
Hackaday Links: August 13, 2023

Remember that time when the entire physics community dropped what it was doing to replicate the extraordinary claim that a room-temperature superconductor had been discovered? We sure do, and if it seems like it was just yesterday, it’s probably because it pretty much was. The news of LK-99, a copper-modified lead apatite compound, hit at the end of July; now, barely three weeks later, comes news that not only is LK-99 not a superconductor, but that its resistivity at room temperature is about a billion times higher than that of copper. For anyone who rode the “cold fusion” hype train back in the late 1980s, LK-99 had a bit of a code smell about it from the start. We figured we’d sit back and let science do what science does, and sure enough, the extraordinary claim doesn’t seem able to muster the extraordinary evidence needed to support it, with the significant caveat that a lot of the debunking papers, and indeed the original LK-99 paper, still appear to be preprints that have not yet been peer-reviewed.

So what does all this mean? Sadly, probably not much. Despite the overwrought popular media coverage, a true room-temperature, ambient-pressure superconductor probably wasn’t going to save the world, at least not right away. The indispensable Asianometry channel on YouTube did a great video on this. As always, his focus is on the semiconductor industry, so his analysis has to be viewed through that lens. He argues that room-temperature superconductors wouldn’t make much difference in semiconductors, because the place they’d most likely be employed, the interconnects on chips, would still have inductance and capacitance even if their resistance were zero. That doesn’t mean room-temperature superconductors wouldn’t be a great thing to have, of course; they’d likely be revolutionary for power transmission if nothing else. But not so much for semiconductors, and certainly not today.


The AI Engine That Fits In 100K

Running your own AI models is possible, but it requires a giant computer, right? Maybe not. Researchers at NVIDIA are showing off Perfusion, a text-to-image model they say is 100KB in size and takes four minutes to train. The model specializes in customizing images with a concept of your choosing. For example, the paper shows a picture of a teddy bear and a prompt to dress it as a wizard. In all fairness, the small size and quick training are a little misleading, we think, because the results still rely on the usual giant underlying model. What’s small and fast is the customization of that existing model.

Customizing models is a common task, since you often want to work with something the model doesn’t contain. For example, you might want to alter a picture of your face or your pet, which probably isn’t in the original model. You can create a special keyword and partially train the model on what you want using something called textual inversion. The problem the researchers identified is that creating textual inversions often causes the new training to leak into unintended areas.

They describe “key locking,” a technique to avoid overfitting when fine-tuning an existing model. For example, suppose you want to add a specific dog picture to the model. With typical techniques, a special keyword like dog* will indicate the custom dog image, but that keyword has no connection with generic dogs, mammals, or animals. This makes it difficult for the AI to work with the image. For example, the prompts “a man sitting” and “a dog sitting” require very different image generations. But if we train a specific dog as “dog*”, there’s no deeper understanding that “dog*” is a type of “dog” that the model already knows about. So what do you do with “dog* sitting”? Key locking makes that association.
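To make the idea concrete, here’s a minimal sketch of what key locking might look like inside a single cross-attention layer. This is not NVIDIA’s actual Perfusion code; the class, the dimensions, and the way the concept token is handled are illustrative assumptions. The point is simply that the new token’s attention key is pinned to the supercategory word’s key, so “dog*” attends like a “dog”, while only its value carries the learned specifics.

```python
# Illustrative sketch of key locking (not the Perfusion implementation).
import torch
import torch.nn as nn

class KeyLockedCrossAttention(nn.Module):
    """Toy cross-attention front end: frozen key/value projections from a
    pretrained model, plus a single trainable value offset for the new concept."""

    def __init__(self, dim_text=768, dim_attn=320):
        super().__init__()
        self.to_k = nn.Linear(dim_text, dim_attn, bias=False)  # pretrained, frozen
        self.to_v = nn.Linear(dim_text, dim_attn, bias=False)  # pretrained, frozen
        for p in self.parameters():
            p.requires_grad = False
        self.concept_value = nn.Parameter(torch.zeros(dim_attn))  # the only trained part

    def forward(self, text_emb, super_emb, concept_mask):
        # text_emb:     (tokens, dim_text) prompt embeddings containing "dog*"
        # super_emb:    (dim_text,) embedding of the supercategory word "dog"
        # concept_mask: (tokens,) bool, True at the "dog*" position
        mask = concept_mask.unsqueeze(-1)
        k = torch.where(mask, self.to_k(super_emb), self.to_k(text_emb))  # locked key
        v = self.to_v(text_emb) + mask * self.concept_value               # learned value
        return k, v

# Toy usage: five prompt tokens, the third one standing in for "dog*".
attn = KeyLockedCrossAttention()
k, v = attn(torch.randn(5, 768), torch.randn(768),
            torch.tensor([False, False, True, False, False]))
```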


The Right Benchmark For GPT

Dan Maloney wanted to design a part for 3D printing. OpenSCAD is a coding language for generating 3D objects. ChatGPT can write code. What could possibly go wrong? You should go read his article because it’s enlightening and hilarious, but the punchline is that the generated code ran afoul of syntax errors, while still giving him enough of a foothold to teach himself enough OpenSCAD to get the project done anyway. As with many people who have asked the AI to create some code, Dan finds that it’s not as good as asking someone who knows what they’re doing, but that it’s also better than nothing.

And this is where I start grumbling. When you type your desires into the word-follower machine, your alternative isn’t nothing. Your alternative is to fire up a search engine instead and type “openscad tutorial”. That, for nearly any human endeavor, will get you a few good guides, written by humans who are probably expert in the subject in question, and which are aimed at teaching you the thing that you want to learn. It doesn’t get better than that. You’ll be up and running with your design in no time.

Indeed, if you think about the relevant source material the LLM was trained on, it’s exactly these tutorials. It can’t possibly do better than the best of them, although the averaged-out tutorial it produces might be better than the worst you’ll find. (Some have speculated about what happens when the entire Internet is filled with these generated texts: what will future AIs learn from?)

In Dan’s case, though, he didn’t necessarily want to learn OpenSCAD – he just wanted the latch designed. But in the end, he had to learn enough OpenSCAD to get the AI code compiling without error. He spent an hour learning OpenSCAD and now he’s good to go on his next project too.

So the next time you hear someone say that they got an answer back from a large language model that wasn’t perfect, but it was “better than nothing”, think critically if “nothing” is really the right benchmark.

Do you really want to learn nothing? Do you really have no resources to get started with? I would claim that we have the most amazing set of tutorial resources the world has ever known at our fingertips. Compared to the ability to teach millions of humans to achieve their own goals, that makes the LLM party tricks look kinda weak, in my opinion.

GETMusic Uses Machine Learning To Generate Music, Understands Tracks

Music generation guided by machine learning can make for great projects, but there’s not usually much apparent control over the results. The system makes what it makes, and it’s an achievement if the results are not obvious cacophony. But that’s all different with GETMusic, which allows for a much more involved approach because it understands, and is able to create, music track by track. Among other things, this means one can generate a basic rhythm and melody first, then add further elements on top while leaving the existing ones unchanged.

GETMusic can make music from scratch or guided by examples, and under the hood it uses a diffusion-based approach similar to the method behind AI image generators like Stable Diffusion. We’ve previously covered how Stable Diffusion works; here the same basic principles guide the model from random noise to useful tracks of music instead of images.
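For a rough mental model of how track-aware generation can leave existing parts untouched, here’s a toy sketch of masked diffusion-style sampling. This is not GETMusic’s actual code: the denoise() stand-in, the piano-roll layout, and the schedule are all illustrative assumptions. The key move is re-imposing the user-supplied tracks after every denoising step, so only the empty tracks get filled in.

```python
# Toy masked-diffusion sampler: known tracks are clamped at every step.
import numpy as np

rng = np.random.default_rng(0)

def denoise(x, t):
    """Stand-in for a trained denoising network: given the noisy piano roll x
    at step t, return a slightly cleaner estimate. Here it just shrinks the noise."""
    return x * (1.0 - 1.0 / t)

def generate(known_tracks, known_mask, steps=50):
    # known_tracks: (tracks, time) array with the user-provided parts filled in
    # known_mask:   boolean array, True where those parts should be preserved
    x = rng.standard_normal(known_tracks.shape)       # start from pure noise
    for t in range(steps, 0, -1):
        x = denoise(x, t)                             # one reverse-diffusion step
        x[known_mask] = known_tracks[known_mask]      # lock the existing tracks
    return x

# Four tracks, 64 time steps; track 0 (say, the drums) is already written.
roll = np.zeros((4, 64))
roll[0] = rng.integers(0, 2, 64)
mask = np.zeros_like(roll, dtype=bool)
mask[0] = True
result = generate(roll, mask)   # track 0 survives untouched, the rest are generated
```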

Just a few years ago we saw a neural network trained to generate Bach, and while it was capable of moments of brilliance, it didn’t produce uniformly listenable output. GETMusic is on an entirely different level. The model and code are available online, and there is a research paper to accompany it.

You can watch a video putting it through its paces just below the page break, and there are more videos on the project summary page.


ChatGPT, The Worst Summer Intern Ever

Back when I used to work in the pharma industry, I had the opportunity to hire summer interns. This was a long time ago, long enough that the fresh-faced college students who applied for the gig are probably now creeping up to retirement age. The idea, as I understood it, was to get someone to help me with my project, which at the time was standing up a distributed data capture system with a large number of nodes all running custom software that I wrote, reporting back to a central server running more of my code. It was more work than I could manage on my own, so management thought they’d take mercy on me and get me some help.

The experience didn’t turn out quite like I expected. The interns were both great kids, very smart, and I learned a lot from them. But two months is a very tight timeframe, and getting them up to speed took up most of that time. Add in the fact that they were expected to do a presentation on their specific project at the end of the summer, and the whole thing ended up being a lot more work for me than if I had just done the whole project myself.

I thought about my brief experience with interns recently when a project came up that I needed a little help with. It’s nothing that would justify hiring anyone, but still, having someone to farm specific jobs out to would be a blessing, especially now that it’s summer and there’s so much else to do. But this is the future, and the expertise and combined wisdom of the Internet are but a few keystrokes away, right? Well, maybe, but as you’ll see, even the power of large language models has its limits, and trying to loop ChatGPT in as a low-effort summer intern leaves a lot to be desired.


Text-to-Speech Model Can Do Music, Background Noises, And Sound Effects

Bark is a universal text-to-audio model that can not only create realistic speech, but also incorporate music, background noises, and sound effects. It can even include non-speech sounds like laughter, sighs, throat clearings, and similar elements. But despite the complex results it can deliver, it’s important to understand some of its peculiarities.

The model takes a prompt and generates the resulting sound from scratch. Results might sometimes be unexpected.

Bark is not a conventional text-to-speech program, and how it works has a lot more in common with large language model AI chatbots. This means that results can deviate from expectations, and outputs aren’t necessarily going to be studio-quality speech. As the project’s README points out, “(generated outputs can) be anything from perfect speech to multiple people arguing at a baseball game recorded with bad microphones.” That being said, there is some support for voice presets as a way to help guide the model with some consistency.

Bark was designed by a company called Suno for research purposes and is available under the MIT License. It can be installed and run locally, and has some demos available as well as an online implementation.
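Getting audio out of a local install takes only a few lines of Python. Here’s a minimal sketch following the project’s README; the voice preset name is one of the presets the project ships, and scipy is only used to write the result to a WAV file.

```python
# Minimal local Bark usage, following the project's README.
# Assumes the bark package (pip install git+https://github.com/suno-ai/bark.git)
# and scipy are installed; model weights are downloaded on the first run.
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

preload_models()  # fetch and cache the model weights

# Bracketed cues like [laughs] and ♪ notes ♪ hint at non-speech sounds and music.
prompt = "Hello Hackaday! [laughs] ♪ And now for a little tune ♪"
audio = generate_audio(prompt, history_prompt="v2/en_speaker_6")  # optional voice preset

write_wav("bark_out.wav", SAMPLE_RATE, audio)
```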

The ability to install and run Bark locally is promising territory for incorporating it into projects. And should you be more interested in speech-to-text instead, don’t forget about this plain C/C++ implementation of AI-powered speech recognition.

AI Learns To Walk In 3D Training Grounds

AI agents are learning to do all kinds of interesting jobs, even the creative ones that we’d quite prefer to handle ourselves. Nevertheless, technology marches on. Working in this area is YouTuber [AI Warehouse], who has been teaching an AI to walk in a simulated environment.

Albert needed some specific guidance to learn how to walk upright, something that humans tend to figure out innately.

The AI controls a vaguely humanoid creature, albeit with a heavily simplified body and limbs. It “lives” in a 3D environment created in the Unity engine, which supplies the physics simulation needed for the work. Meanwhile, the ML-Agents package provides the brain for Albert, the AI charged with learning to walk.

The video steps through a variety of “deep reinforcement learning” tasks. In these, the AI is rewarded for completing goals which are designed to teach it how to walk. Albert is given control of his limbs, and simply charged with reaching a button some distance away on the floor. After many trials, he learns to do the worm, and achieves his goal.

Getting Albert to walk upright took altogether more training. Lumpy ground and walls placed between him and his goal upped the challenge, along with rewards that encouraged him to alternate his feet and keep an upright posture. Over time, he progressed from skipping to something approximating a proper walk cycle.
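For a flavor of what “rewarded for completing goals” looks like in practice, here’s a toy reward-shaping function. This is not [AI Warehouse]’s actual Unity/ML-Agents code; the state fields and the weights are illustrative assumptions. Each term nudges the agent toward the button while encouraging alternating footsteps and an upright torso.

```python
# Toy reward shaping for a walking agent (illustrative, not the video's code).
from dataclasses import dataclass

@dataclass
class AgentState:
    distance_to_button: float   # metres remaining to the goal
    torso_uprightness: float    # 1.0 = perfectly upright, 0.0 = lying flat
    left_foot_planted: bool
    right_foot_planted: bool
    last_foot_used: str         # "left" or "right"

def step_reward(prev: AgentState, cur: AgentState) -> float:
    reward = 0.0
    # Reward progress toward the goal button.
    reward += 2.0 * (prev.distance_to_button - cur.distance_to_button)
    # A small bonus every step for staying upright.
    reward += 0.1 * cur.torso_uprightness
    # Encourage alternating feet: a planted foot that differs from the last one used.
    planted = "left" if cur.left_foot_planted else "right" if cur.right_foot_planted else None
    if planted and planted != cur.last_foot_used:
        reward += 0.05
    # Big payoff for actually reaching the button.
    if cur.distance_to_button < 0.1:
        reward += 10.0
    return reward
```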

One may argue that the teaching method required a lot of specific guidance, but it’s a neat feat to achieve nonetheless. It’s altogether more complex than learning to play Trackmania, we’d say, and that was impressive enough in itself. Video after the break.
