NetBSD Bans AI-Generated Code From Commits

A recent change to the NetBSD commit guidelines states that code generated by Large Language Models (LLMs) or similar technologies, such as ChatGPT, Microsoft’s Copilot, or Meta’s Code Llama, is presumed to be tainted code. This amends the existing section on tainted code, which originally covered any code not written directly by the person committing it, and which exists because of licensing concerns: without it, code might be copied into the NetBSD codebase under an incompatible (or proprietary) license.

In the case of LLM-based code generators like those just mentioned, the problem stems from the fact that they are trained on millions of lines of code from all over the internet, released under a wide variety of licenses. Invariably, some of that code will be covered by a license that’s not acceptable for the NetBSD codebase. The guideline does note that such auto-generated commits may still be admissible, but they require written permission from core developers, and presumably an in-depth audit of the code’s heritage. That should leave non-trivial commits churned out by ChatGPT and kin out in the cold.

The debate about the validity of works produced by current-gen “artificial intelligence” software is only just beginning, but there’s little question that NetBSD has made the right call here. From a legal and software engineering perspective the policy makes perfect sense, as the provenance of LLM-generated code simply can’t be established to the project’s standards. That said, code produced by humans brings with it a whole different set of potential problems.

How AI Large Language Models Work, Explained Without Math

Large Language Models (LLMs) are everywhere, but how exactly do they work under the hood? [Miguel Grinberg] provides a great explanation of the inner workings of LLMs in simple (but not simplistic) terms, one that eschews the low-level mathematics in favor of laying bare what it is they actually do.

At their heart, LLMs are prediction machines: they operate on tokens (small groups of letters and punctuation), and all they do is predict which token most plausibly comes next. That alone turns out to be enough for great feats of human-seeming communication. Most technically-minded people understand that LLMs have no idea what they are saying, and this peek at their inner workings will make that abundantly clear.
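To see the “prediction machine” idea in miniature, here’s a toy sketch in Python (our illustration, not [Grinberg]’s): a bigram model that picks the next token purely from counts of what followed it in a tiny training text. A real LLM swaps the lookup table for a neural network with billions of parameters, but the job is the same: guess the next token, append it, repeat.

```python
# Toy next-token predictor: a bigram model built from raw counts.
# Real LLMs make the same kind of "what comes next?" guess, just
# with a neural network instead of a lookup table.
from collections import Counter, defaultdict

training_text = "the cat sat on the mat and the cat slept on the mat"
tokens = training_text.split()

# Count how often each token follows each other token.
follows = defaultdict(Counter)
for current, nxt in zip(tokens, tokens[1:]):
    follows[current][nxt] += 1

def predict_next(token):
    """Return the continuation seen most often in training."""
    candidates = follows.get(token)
    return candidates.most_common(1)[0][0] if candidates else None

# Generate text by repeatedly feeding the prediction back in.
token = "the"
output = [token]
for _ in range(6):
    token = predict_next(token)
    if token is None:
        break
    output.append(token)

print(" ".join(output))  # prints: the cat sat on the cat sat
```

It parrots its training data and understands nothing, which is precisely the point.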

Be sure to also review an illustrated guide to how image-generating AIs work. And if a peek under the hood of LLMs left you hungry for more low-level details, check out our coverage of training a GPT-2 LLM using pure C code.

[Image: Kaffa Roastery founder Svante Hampf shows a bag of their AI-conic coffee blend.]

AI-Created Coffee Blend Isn’t Terrible

Weren’t we just talking about coffee-based sacrilege the other day? Here’s something to make the single-origin bean snobs chew their espresso cups: an artisan roastery in Helsinki is offering a coffee blend called AI-conic that was created by artificial intelligence. The idea, of course, is that technology can lighten the workload needed to produce coffee.

This is an interesting development because Finland consumes more coffee per capita than any other country in the world, according to the International Coffee Organization. Coffee roasting is a highly-valued traditional artisan profession there, so it stands to reason that roasters might turn to technology for help.

Just like with scotch whisky, there’s nothing inherently wrong with coffee blends. Blends are good for consistency, for when you want every cup to taste pretty much exactly the same. Single-origin beans, on the other hand, are traceable to one location, and as a result they usually have a distinct flavor shaped by the climate they’re grown in.

If you’re new to coffee, blends are a nice, safe way to start out. Interestingly, although it was tasked with creating a blend that would suit the palates of coffee enthusiasts, the AI chose to build it from four different types of beans instead of the usual two or three. The coffee experts agreed all the same that the AI blend was “perfect” and needed no human intervention. We probably won’t be getting to Finland anytime soon, so if you try it, let us know how it tastes!

Do you like cold brew? How would you like to be able to brew some in just three minutes?

Raspberry Pi Narrates (And Tattles On) Your Cat, Nature Documentary Style

Detecting a cat with a Raspberry Pi and a camera is one thing, but [Yoko Li]’s AI Raspberry Pi Cat Detection takes things to an entirely different level by narrating your feline’s activities, nature documentary style.

The project is ostensibly aimed at tattling on the housecats by detecting forbidden behavior such as trespassing on the kitchen counter. But we daresay that’s overshadowed by the verbose image analysis, which describes the scene in its best David Attenborough impression.

“This feline exemplifies both the beauty and the peaceful nature of its kind. No email will be sent as the cat is not on the kitchen counter.”

Hard to believe that just a few years ago this cat detector was the bee’s knees in cat detection technology. Things have certainly come a long way. Interested? The GitHub repository has everything needed to roll your own, and we highly recommend watching it in action in the video embedded below.
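To get a feel for how such a loop can hang together, here’s a minimal sketch. To be clear, this is our illustration and not [Yoko Li]’s actual code: it assumes the official openai Python package, a vision-capable model, and a frame already grabbed from the Pi’s camera.

```python
# A sketch of the detect-and-narrate idea (NOT the project's code):
# send a camera frame to a vision-capable model and ask for a nature
# documentary narration plus a counter-trespass verdict. Assumes
# OPENAI_API_KEY is set; the model name and frame path are placeholders.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def narrate(frame_path: str) -> str:
    with open(frame_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any vision-capable model would do
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Narrate this scene like a nature documentary. "
                         "End with one line saying whether a cat is on "
                         "the kitchen counter."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

print(narrate("frame.jpg"))  # hypothetical frame from the Pi camera
```

From there it’s a matter of grabbing frames on a schedule and firing off an email when the verdict comes back guilty.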


AI Helps Make Web Scraping Faster And Easier

Web scraping is usually only a first step towards extracting meaningful data. Once you’ve got everything pulled down, you’ve still got to process it into something useful. Here to assist with that is Scrapegraph-ai, a Python tool that promises to automate the process using a selection of large language models (LLMs).

Scrapegraph-ai is able to accept a URL as well as a prompt, which is a plain-English instruction on what to do with the data. Examples include summarizing, describing images, and more. In other words, gathering the data and analyzing or formatting it can now be done in a single step.

The project is actually pretty flexible in terms of the AI back-end. It’s able to work with locally-installed AI tools (via ollama) or with API keys for services like OpenAI and more. If you have an OpenAI API key, there’s an online demo that will show you the capabilities pretty effectively. Otherwise, local installation is only a few operations away.
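For the curious, usage looks roughly like the sketch below. It follows the project’s documented pattern with a local model served by ollama, though the exact class and configuration names may differ between releases.

```python
# A sketch based on Scrapegraph-ai's documented usage; exact parameters
# may vary by release. Here a local model served by ollama does the
# extraction, so no API key is needed.
from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
    "llm": {
        "model": "ollama/llama3",          # local model served by ollama
        "base_url": "http://localhost:11434",
    },
}

scraper = SmartScraperGraph(
    prompt="List the titles and authors of every article on this page.",
    source="https://example.com/blog",     # placeholder URL
    config=graph_config,
)

print(scraper.run())  # structured data extracted per the prompt
```

The prompt is the interesting part: change one sentence and the same pipeline summarizes instead of cataloguing.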

This isn’t the first time we have seen the flexibility of AI tools like large language models leveraged to ease the notoriously-fiddly task of web scraping, and it’s great to see the results have only gotten better.

Mitre Wants The Feds To Play In Its Sandbox

If you haven’t worked with the US government, you might not know Mitre, a non-profit government research organization. Formed in 1958 to guide the U.S. Air Force’s SAGE computer project, they often provide research expertise, oversee government contracts, or evaluate proposals. Now they are building a $20 million “AI Sandbox” for the federal government to build AI prototypes.

In partnership with NVIDIA, the sandbox will be built around an NVIDIA DGX SuperPOD system capable of an exaFLOP of 8-bit AI computation. Mitre reports this will increase its compute power for AI by two orders of magnitude.


Make 3D Scenes With A Holodeck-Like Voice Interface

The voice interface for the holodeck in Star Trek had users create objects by saying things like “create a table” and “now make it a metal table” and so forth, all with immediate feedback. This kind of interface may have been pure fantasy when the show aired, but with the advent of AI and LLMs (large language models), such natural language interfaces are coming together almost by themselves.

A fun demonstration of that is [Dominic Pajak]’s demo project called VoxelAstra. This is a WebXR demo that works both in the Meta Quest 3 VR headset (just go to the demo page in the headset’s web browser) as well as on desktop.

The catch is that since the program uses OpenAI APIs on the back end, one must provide a working OpenAI API key. Otherwise, the demo won’t be able to do anything. Providing one’s API key to someone’s web page isn’t terribly good security practice, but there’s also the option of running the demo locally.

Either way, once the demo is up and running the user simply tells the system what to create. Just keep it simple. It’s a fun and educational demo more than anything and will try to do its work with primitive shapes like spheres, cubes, and cylinders. “Build a snowman” is suggested as a good starting point.
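Under the hood, the key trick is getting the LLM to answer with structured data that the renderer can draw. Here’s a hedged sketch of that idea, in Python rather than the demo’s actual WebXR code, with the model name and JSON schema as our own placeholders.

```python
# A sketch of the natural-language-to-scene idea (not VoxelAstra's
# actual code): ask the model for a JSON list of primitive shapes.
# Assumes the official openai package and OPENAI_API_KEY in the
# environment; the schema and model name are illustrative.
import json
from openai import OpenAI

client = OpenAI()

SYSTEM = (
    "Convert the user's request into a JSON object with a 'shapes' list. "
    "Each shape has: type (sphere|cube|cylinder), position [x, y, z], "
    "size, and color (hex string). Respond with JSON only."
)

def describe_scene(request: str) -> list:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": request},
        ],
    )
    return json.loads(response.choices[0].message.content)["shapes"]

# Three stacked spheres, more or less, if all goes well.
for shape in describe_scene("Build a snowman"):
    print(shape)
```

Hand each entry in that list to your renderer of choice and the “holodeck” loop is complete.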

Intrigued by what you see and getting ideas of your own? WebXR can be a great way to give those ideas some life and looking at how someone else did something similar is a fine way to begin. Check out another of [Dominic]’s WebXR projects: a simulated BBC Micro, in VR.