Building A Dependency-Free GPT On A Custom OS

The construction of a large language model (LLM) depends on many things: banks of GPUs, vast reams of training data, massive amounts of power, and matrix manipulation libraries like NumPy. For models with lower requirements, though, it’s possible to do away with all of that, including the software dependencies. As someone who’d already built a full operating system as a C learning project, [Ethan Zhang] was no stranger to intimidating projects, and as an exercise in minimalism, he decided to build a generative pre-trained transformer (GPT) model in the kernel space of his operating system.
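Dropping a library like NumPy mostly means hand-rolling the matrix arithmetic yourself. As a rough illustration (not taken from KernelGPT's source, and with illustrative names and sizes), a dependency-free row-major matrix multiply in plain C is only a few lines:

```c
#include <stddef.h>

/* C = A (n x k) * B (k x m), all matrices stored row-major.
   This is the core primitive a no-library GPT would hand-roll
   instead of calling into BLAS or NumPy. */
static void matmul(const float *A, const float *B, float *C,
                   size_t n, size_t k, size_t m) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < m; j++) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; p++)
                acc += A[i * k + p] * B[p * m + j];
            C[i * m + j] = acc;
        }
    }
}
```

Every attention head and feed-forward layer in a transformer ultimately reduces to calls like this, which is why a small model can get away with such a simple kernel.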

As with a number of other small demonstration LLMs, this was inspired by [Andrej Karpathy]’s MicroGPT, specifically by its lack of external dependencies. The first step was to strip away every unnecessary element from MooseOS, the operating system [Ethan] had previously written, including the GUI, most drivers, and the filesystem. All that’s left is the kernel, and KernelGPT runs on this. To get around the lack of a filesystem, the training data was converted into a header to keep it in memory — at only 32,000 words, this was no problem. Like the original MicroGPT, this is trained on a list of names, and predicts new names. Due to some hardware issues, [Ethan] hasn’t yet been able to test this on a physical computer, but it does work in QEMU.
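Baking a dataset into a header is a common embedded-systems trick: a build script converts the data file into a C array, so the "file" is just part of the kernel image. A minimal sketch of what that might look like, with hypothetical names standing in for whatever KernelGPT's generated header actually contains:

```c
#include <stddef.h>

/* What a generated header (e.g. a hypothetical names.h) might hold:
   the training set compiled straight into the binary, so no
   filesystem access is ever needed. */
static const char *g_names[] = { "emma", "olivia", "ava", "liam", "noah" };
static const size_t g_num_names = sizeof(g_names) / sizeof(g_names[0]);

/* Count distinct characters across the dataset to size a toy
   character-level vocabulary, as name-generation models do. */
static size_t vocab_size(void) {
    int seen[256] = {0};
    size_t count = 0;
    for (size_t i = 0; i < g_num_names; i++)
        for (const char *p = g_names[i]; *p; p++)
            if (!seen[(unsigned char)*p]) {
                seen[(unsigned char)*p] = 1;
                count++;
            }
    return count;
}
```

At 32,000 words the whole dataset is well under a megabyte, so holding it in memory this way costs essentially nothing.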

It’s quite impressive to see such a complex piece of software written solely in C, running directly on hardware; for a project which takes the same starting point and goes in the opposite direction, check out this browser-based implementation of MicroGPT. For more on the math behind GPTs, check out this visualization.

8 thoughts on “Building A Dependency-Free GPT On A Custom OS”

  1. I love low-dependency code. I find dependency stacks that reach all the way down like turtles in environments such as NodeJS deeply depressing and demoralizing. They are also difficult to maintain or get working at all. I understand it’s a poor use of time to reinvent your own wheel constantly, but the amount of control and solidity possible by minimizing dependencies is something that needs to be better emphasized in the world of software development. I am very excited to see dependency minimization come to LLMs.

  2. The principal use I have for AI is domain specific…all about kefir, for example. I treat AI as more in the way of an irritatingly chatty encyclopedia than a “buddy.” How is a local AI going to help me, regardless how dependency free or custom OS it might be?

    1. Are we debating local vs cloud, or AI vs no AI?

      AI is extremely useful for anything that you used to ask on Reddit or Stack Overflow, i.e., you’re asking a question where you don’t necessarily even know the right vocab words to use, so it might require some back-and-forth. (“I have a leak in my roof deck, where the handrail is screwed to the floor. Is there supposed to be some waterproofing thing there? What is it called?”)

      Local AI is less obviously practical right now, but will become relevant as the major cloud providers pivot to profitability and start to charge & advertise more.

    2. Local AI is about owning it, the same way it is nice owning physical media instead of a streaming subscription.

      Local AI doesn’t have to exist in a bubble, though. Local AI agents are still able to use tools like search engines and pull current, live information.

  3. IMHO, we are approaching an inflection point at which for-profit AI becomes its own bubble/babble, and local AI “agents” with internet access will be creating their own smaller versions of it, unrelated and largely/mostly independent.

    Kind of sort of like in the days of yore there were “major” newspapers, ie, printed and distributed in the large cities, and “local” ones, ie, regional/municipal/county ones. While “major” ones were where all the news of the world would be concentrated/known, “local” ones sometimes covered things ignored/overlooked by the “major” ones, like scandals or whatnots. “Local” ones usually operated on far smaller budgets, and lacked access to the same news sources the “major” ones did, which is what’s different today: anyone now can have unlimited/unrestrained access to any information source. (Well, sans internet-blocking places that have already become their own self-contained bubbles – but even they need unlimited/unrestrained access to SOME information, science, for example.)

    Point being, local AI unrelated to the for-profit ones, but having access to about the same information the for-profit ones do, may end up delivering better results; I’d say let The Competition weed out the dead from the fast (tm “Back to School”) and see where that goes.

    I am a cautious optimist, but so far I do not see AI in general delivering the results I actually could use – 1 – affordable housing – 2 – proper public transit – 3 – cheaper/better public education (again, not in the US) – 4 – lower healthcare costs with better modern things (growing new teeth? etc), etc etc.

    AI-generated Hollywood movies were not hard to tackle to start with, btw, Bollywood did that already, so that doesn’t impress me, but what would impress/entertain me will be pauhlitisians replaced with the AI agents average Sam can summon at his will, and I’ll stop at that, because I can tell that such control WILL be locked away from me the second it becomes available.
