Building A Dependency-Free GPT On A Custom OS

The construction of a large language model (LLM) depends on many things: banks of GPUs, vast reams of training data, massive amounts of power, and matrix manipulation libraries like Numpy. For models with lower requirements though, it’s possible to do away with all of that, including the software dependencies. As someone who’d already built a full operating system as a C learning project, [Ethan Zhang] was no stranger to intimidating projects, and as an exercise in minimalism, he decided to build a generative pre-trained transformer (GPT) model in the kernel space of his operating system.

As with a number of other small demonstration LLMs, this was inspired by [Andrej Karpathy]’s MicroGPT, specifically by its lack of external dependencies. The first step was to strip away every unnecessary element from MooseOS, the operating system [Ethan] had previously written, including the GUI, most drivers, and the filesystem. All that’s left is the kernel, and KernelGPT runs on this. To get around the lack of a filesystem, the training data was converted into a header to keep it in memory — at only 32,000 words, this was no problem. Like the original MicroGPT, this is trained on a list of names, and predicts new names. Due to some hardware issues, [Ethan] hasn’t yet been able to test this on a physical computer, but it does work in QEMU.

It’s quite impressive to see such a complex piece of software written solely in C, running directly on hardware; for a project which takes the same starting point and goes in the opposite direction, check out this browser-based implementation of MicroGPT. For more on the math behind GPTs, check out this visualization.

Continue reading “Building A Dependency-Free GPT On A Custom OS”

Train A GPT-2 LLM, Using Only Pure C Code

[Andrej Karpathy] recently released llm.c, a project that focuses on LLM training in pure C, once again showing that working with these tools isn’t necessarily reliant on sprawling development environments. GPT-2 may be older but is perfectly relevant, being the granddaddy of modern LLMs (large language models) with a clear heritage to more modern offerings.

LLMs are fantastically good at communicating despite not actually knowing what they are saying, and training them usually relies on PyTorch deep learning library, itself written in Python. llm.c takes a simpler approach by implementing the neural network training algorithm for GPT-2 directly. The result is highly focused and surprisingly short: about a thousand lines of C in a single file. It is a highly elegant process that does the same thing the bigger, clunkier methods accomplish. It can run entirely on a CPU, or it can take advantage of GPU acceleration, where available.

This isn’t the first time [Andrej Karpathy] has bent his considerable skills and understanding towards boiling down these sorts of concepts into bare-bones implementations. We previously covered a project of his that is the “hello world” of GPT, a tiny model that predicts the next bit in a given sequence and offers low-level insight into just how GPT (generative pre-trained transformer) models work.

AI Makes Linux Do What You Mean, Not What You Say

We are always envious of the Star Trek Enterprise computers. You can just sort of ask them a hazy question and they will — usually — figure out what you want. Even the automatic doors seemed to know the difference between someone walking into a turbolift versus someone being thrown into the door during a fight. [River] decided to try his new API keys for the private beta of an AI service to generate Linux commands based on a description. How does it work? Watch the video below and find out.

Some examples work fairly well. In response to “email the Rickroll video to Jeff Bezos,” the system produced a curl command and an e-mail to what we assume is the right place. “Find all files in the current directory bigger than 1 GB” works, too.

Continue reading “AI Makes Linux Do What You Mean, Not What You Say”

I’m Sorry Dave, You Shouldn’t Write Verilog

We were always envious of Star Trek, for its computers. No programming needed. Just tell the computer what you want and it does it. Of course, HAL-9000 had the same interface and that didn’t work out so well. Some researchers at NYU have taken a natural language machine learning system — GPT-2 — and taught it to generate Verilog code for use in FPGA systems. Ironically, they called it DAVE (Deriving Automatically Verilog from English). Sounds great, but we have to wonder if it is more than a parlor trick. You can try it yourself if you like.

For example, DAVE can take input like “Given inputs a and b, take the nor of these and return the result in c.” Fine. A more complex example from the paper isn’t quite so easy to puzzle out:

Write a 6-bit register ‘ar’ with input
defined as ‘gv’ modulo ‘lj’, enable ‘q’, synchronous
reset ‘r’ defined as ‘yxo’ greater than or equal to ‘m’,
and clock ‘p’. A vault door has three active-low secret
switch pressed sensors ‘et’, ‘lz’, ‘l’. Write combinatorial
logic for a active-high lock ‘s’ which opens when all of
the switches are pressed. Write a 6-bit register ‘w’ with
input ‘se’ and ‘md’, enable ‘mmx’, synchronous reset
‘nc’ defined as ‘tfs’ greater than ‘w’, and clock ‘xx’.

Continue reading “I’m Sorry Dave, You Shouldn’t Write Verilog”