Building A Dependency-Free GPT On A Custom OS

The construction of a large language model (LLM) depends on many things: banks of GPUs, vast reams of training data, massive amounts of power, and matrix manipulation libraries like NumPy. For models with lower requirements, though, it’s possible to do away with all of that, including the software dependencies. As someone who’d already built a full operating system as a C learning project, [Ethan Zhang] was no stranger to intimidating projects, and as an exercise in minimalism, he decided to build a generative pre-trained transformer (GPT) model in the kernel space of his operating system.

As with a number of other small demonstration LLMs, this one was inspired by [Andrej Karpathy]’s MicroGPT, specifically by its lack of external dependencies. The first step was to strip away every unnecessary element from MooseOS, the operating system [Ethan] had previously written, including the GUI, most drivers, and the filesystem. All that’s left is the kernel, and KernelGPT runs on top of it. To get around the lack of a filesystem, the training data was converted into a header so it lives directly in memory; at only 32,000 words, this was no problem. Like the original MicroGPT, it is trained on a list of names and predicts new ones. Due to some hardware issues, [Ethan] hasn’t yet been able to test it on a physical computer, but it does work in QEMU.
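The write-up doesn’t show the exact format, but turning a word list into a header that gets compiled straight into the kernel image is a common trick; a minimal sketch (with hypothetical names like names_data.h and names_txt, produced by a build-time script) might look something like this:

```c
/* names_data.h -- hypothetical header generated from the raw training file.
 * The whole dataset is compiled into the kernel image, so no filesystem
 * is needed to load it at runtime. */
#ifndef NAMES_DATA_H
#define NAMES_DATA_H

static const char *const names_txt[] = {
    "emma",
    "olivia",
    "ava",
    /* ...the rest of the ~32,000 names, emitted by a build-time script... */
};

static const unsigned int names_count =
    sizeof(names_txt) / sizeof(names_txt[0]);

#endif /* NAMES_DATA_H */
```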

It’s quite impressive to see such a complex piece of software written solely in C, running directly on hardware; for a project which takes the same starting point and goes in the opposite direction, check out this browser-based implementation of MicroGPT. For more on the math behind GPTs, check out this visualization.


AI Poised To Turn The Internet Into Gibberish

Last Thursday, two lowly master’s grad students, Aaron Gokaslan and Vanya Cohen, managed to replicate the secretive OpenAI model and cheekily named their version OpenGPT-2. The code can be downloaded from this Google Colab page, and apparently no prior experience in language modeling is required to use it. More useful might be the skills required to persuade Google to part with $50,000 worth of free cloud compute time for the training!

Research firm OpenAI released a new, ever more powerful version of their GPT language model with 1.5 billion parameters, trained on a dataset of 8 million web pages. Although its most entertaining use at the moment is producing gibberish, it will inevitably be able to produce coherent text sometime very soon. For us mere mortals, there’s a cut-down version of the model hosted in the cloud, with a webpage where we can type in a short phrase to prompt the system and get back a few paragraphs of fake news. We tried it with the following: “You can use an Arduino to” and the results were:

  • You can use an Arduino to remove your appendix. Use an open or closed loop, but do not use them at the same time, i.e. don’t wire your Arduino to the wall like I did. The Arduino’s are very noisy and can damage the circuit board. Be sure to use a good quality and high-quality resistor to connect them to the wall. You can also use your own antenna if you’d like.

The neural network works by taking the text it has already produced and using it to predict the next word, one word at a time. Because of its obscurity, our Arduino prompt is a pretty severe and rather unfair test of a system still in its infancy, and looking at the proper metrics it actually performs quite well on standard industry test sets such as the Children’s Book Test.
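GPT-2 conditions on the whole context with a transformer rather than just the previous word, but the generation loop itself is that same predict-append-repeat cycle. Here’s a deliberately tiny sketch, using a made-up bigram table rather than a real model, just to show the loop:

```c
/* Toy illustration of autoregressive generation: not GPT-2, just a tiny
 * hand-rolled bigram table showing the "predict the next word from the
 * context, append it, repeat" loop. All words and probabilities here are
 * made up for the example. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define VOCAB 4

static const char *vocab[VOCAB] = { "you", "can", "use", "arduino" };

/* bigram[i][j]: probability that word j follows word i (rows sum to 1). */
static const double bigram[VOCAB][VOCAB] = {
    /* you     */ { 0.05, 0.80, 0.10, 0.05 },
    /* can     */ { 0.05, 0.05, 0.85, 0.05 },
    /* use     */ { 0.10, 0.05, 0.05, 0.80 },
    /* arduino */ { 0.40, 0.30, 0.20, 0.10 },
};

/* Sample the index of the next word given the current word's index. */
static int sample_next(int current)
{
    double r = (double)rand() / RAND_MAX;
    double cumulative = 0.0;
    for (int j = 0; j < VOCAB; j++) {
        cumulative += bigram[current][j];
        if (r <= cumulative)
            return j;
    }
    return VOCAB - 1;
}

int main(void)
{
    srand((unsigned)time(NULL));

    int current = 0;            /* start the "prompt" at "you" */
    printf("%s", vocab[current]);

    /* Autoregressive loop: each sampled word becomes the next context. */
    for (int i = 0; i < 10; i++) {
        current = sample_next(current);
        printf(" %s", vocab[current]);
    }
    printf("\n");
    return 0;
}
```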

Be sure to paste your own fake news into the comments below and we’ll take a vote on the one that’s most entertaining, but please keep it within the boundaries of good taste!

Whilst this is an emerging technology, somebody did get hold of it a while back and applied it to an old teleprinter!