Nanochat Lets You Build Your Own Hackable LLM

Few people know LLMs (Large Language Models) as thoroughly as [Andrej Karpathy], and luckily for us all, he shares that knowledge in useful open-source projects. His latest is nanochat, which he bills as a way to create “the best ChatGPT $100 can buy”.

What is it, exactly? nanochat is a minimal and hackable software project — encapsulated in a single speedrun.sh script — for creating a simple ChatGPT clone from scratch, including a web interface. The codebase is about 8,000 lines of clean, readable code with minimal dependencies, making every single part of the process accessible for tampering.

An accessible, end-to-end codebase for creating a simple ChatGPT clone makes every part of the process hackable.

The $100 is the cost of doing the computational grunt work of creating the model, which takes about 4 hours on a single node of eight NVIDIA H100 GPUs. The result is a 1.9 billion parameter micro-model, trained on some 38 billion tokens from an open dataset. This model is, as [Andrej] describes in his announcement on X, a “little ChatGPT clone you can sort of talk to, and which can write stories/poems, answer simple questions.” A walk-through of what that whole process looks like makes it as easy as possible to get started.
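The quoted figures are easy to sanity-check with back-of-the-envelope arithmetic. A minimal sketch — note that the per-GPU-hour rental rate below is an assumption for illustration, not a number from the nanochat docs:

```python
# Rough cost arithmetic for the "$100 speedrun".
gpus = 8          # H100 GPUs in one node
hours = 4         # approximate wall-clock training time
rate_usd = 3.00   # ASSUMED on-demand price per GPU-hour; actual rates vary

total_cost = gpus * hours * rate_usd
print(f"~${total_cost:.0f} of compute")  # in the ballpark of the quoted $100

# The data-to-model ratio is also instructive: 38B tokens over 1.9B
# parameters is about 20 tokens per parameter, roughly the
# compute-optimal ratio suggested by the Chinchilla scaling work.
tokens_per_param = 38e9 / 1.9e9
print(f"{tokens_per_param:.0f} tokens per parameter")
```

At an assumed $3/GPU-hour, the math lands at $96, which is presumably where the “$100” framing comes from.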

Unsurprisingly, a mere $100 doesn’t create a meaningful competitor to modern commercial offerings. However, significant improvements can be had by scaling up the process. A $1,000 version (detailed here) is far more coherent and capable, able to solve simple math and coding problems and take multiple-choice tests.

[Andrej Karpathy]’s work lends itself well to modification and experimentation, and we’re sure this tool will be no exception. His past work includes a method of training a GPT-2 LLM using only pure C code, and years ago we saw his work on a character-based Recurrent Neural Network (mis)used to generate baroque music by cleverly representing MIDI events as text.

11 thoughts on “Nanochat Lets You Build Your Own Hackable LLM”

  1. I’ve not looked into LLMs too deeply, besides running a few small models offline and using ChatGPT to solve tricky word puzzles from Blue Prince. These advances are quite interesting and may be what is needed to create very specific and useful models. In my case, I’d like to put in my own knowledge base and vetted sources on locks, lock picking, and physical security. Possibly in combination with embedded security and electronics, as those are my knowledge fields. It won’t be a model I can share, but it may be able to help me find tricky connections and deep insights.

    As an experiment, I would like to see the LLM trained on just the whole of Hackaday and the projects it uses as content. You may not be able to publish the dataset or model, but at least HAD could share how well it works. Other institutions could do the same.

  2. I saw this elsewhere yesterday and decided to dig into the source code posted to GitHub. The project is written almost entirely in Python. If this is standard practice for LLM training, it could help to explain, at least in part, the large amount of computing power and ultimately electrical power required for the training process. Python is an interpreted language that can be orders of magnitude less efficient than a compiled language like C. Am I missing something?

    1. Yes, you are. Python is a high level language just used to direct things happening at a lower level. All the computation is happening in compiled libraries written for efficient execution on the target hardware. Those libraries are written in C, or CUDA, or something else as the case & target hardware varies.

  3. Yes. The actual computation is usually done by well-optimized machine code, mostly running on GPUs or even ASICs. Python is just the glue that manages and directs the whole show.
