Microsoft’s latest Phi4 LLM has 14 billion parameters that require about 11 GB of storage. Can you run it on a Raspberry Pi? Get serious. However, the Phi4-mini-reasoning model is a cut-down version with “only” 3.8 billion parameters that requires 3.2 GB. That’s more realistic and, in a recent video, [Gary Explains] tells you how to add this LLM to your Raspberry Pi arsenal.
The version [Gary] uses has four-bit quantization and, as you might expect, the performance isn’t going to be stellar. If you aren’t versed in all the LLM lingo, quantization determines how many bits are used to store each weight, and, in general, the more parameters a model has, the more things it can figure out.
As a benchmark, [Gary] likes to use what he calls “the Alice question.” In other words, he asks for an answer to this question: “Alice has five brothers and she also has three sisters. How many sisters does Alice’s brother have?” While it probably took you a second to think about it, you almost certainly came up with the correct answer. With this model, a Raspberry Pi can answer it, too.
The first run seems fairly speedy, but it is running on a PC with a GPU. He notes that the answer to the same question takes about 10 minutes to pop up on a Raspberry Pi 5 with four cores and 8 GB of RAM.
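If you want to try the same benchmark yourself, here is a minimal sketch using the Ollama Python client. This isn’t from [Gary]’s video, and the model tag is an assumption, so check what your install actually pulled with ollama list:

```python
# Minimal sketch: ask a locally served model the "Alice question" via Ollama.
# Assumes the Ollama daemon is running and that the phi4-mini-reasoning tag
# exists on your install; substitute whatever tag you actually pulled.
import time

import ollama  # pip install ollama

QUESTION = ("Alice has five brothers and she also has three sisters. "
            "How many sisters does Alice's brother have?")

start = time.time()
response = ollama.chat(
    model="phi4-mini-reasoning",  # assumed tag; verify with: ollama list
    messages=[{"role": "user", "content": QUESTION}],
)
print(response["message"]["content"])
print(f"Answered in {time.time() - start:.1f} s")
```

The timer at the end is the interesting part: on a desktop with a GPU it reads like seconds, while on the Pi it should land in the neighborhood of that ten-minute figure.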
We aren’t sure what you’d do with a very slow LLM, but it does work. Let us know what you’d use it for, if anything, in the comments.
There are some other small models if you don’t like Phi4.
There are more choices available for limited-memory systems. Qwen3 8B is remarkably competitive with that 14B, and even with the 70Bs, for its size.
Interestingly, Llama 3.3 70B is the only one of those I’ve tried that answered a few variations of the Alice question correctly the first time. GPT could do it, but only after being asked “what about Alice?” Some failed entirely; even after multiple hints, they just kept insisting on their own correctness.
I see no point in even trying to make LLMs reason correctly. What for? Much better to force them to translate any reasoning problem they face into Prolog or SMT, offload the reasoning to a tool, then translate back the result. Even the smallest LLMs can do it well.
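As a toy illustration of that offloading idea (assuming the LLM has already done the translation step), the Alice question reduces to a few constraints you can hand to the Z3 SMT solver via the z3-solver Python package:

```python
# Toy sketch: the "Alice question" expressed as SMT constraints for Z3.
# The LLM's only job would be translating the English into these constraints
# and turning the solver's answer back into a sentence.
from z3 import Int, Solver, sat

alice_brothers = Int("alice_brothers")    # brothers Alice has
alice_sisters = Int("alice_sisters")      # sisters Alice has (not counting herself)
brother_sisters = Int("brother_sisters")  # sisters any one of her brothers has

s = Solver()
s.add(alice_brothers == 5)  # a red herring; the brother count doesn't matter
s.add(alice_sisters == 3)
# A brother's sisters are Alice's sisters plus Alice herself.
s.add(brother_sisters == alice_sisters + 1)

if s.check() == sat:
    print(s.model()[brother_sisters])  # prints 4
```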
Can I run any model on my computer locally?
For example, from https://huggingface.co/models
Yeah, check out LM Studio.
You can use any model that’ll fit into your RAM or VRAM, but if you want anything approaching “real time”, then it really needs to be VRAM. A 4060 Ti 16 GB is probably the best current-gen mid-to-low-end option.
You might be able to get Intel or AMD cards working, but it may be a nightmare. CUDA is very polished: anything GTX 900 series or later will run “straight out of the box”; I’ve even run Stable Diffusion on a 650GT without issue.
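If you’d rather script it than click around a GUI, here’s a rough sketch with llama-cpp-python; the GGUF filename is a placeholder for whatever quantized model you’ve downloaded, and n_gpu_layers is the knob that pushes layers into VRAM:

```python
# Rough sketch: load a quantized GGUF model with llama-cpp-python.
# The model path is a placeholder; point it at any GGUF that fits your VRAM.
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(
    model_path="models/some-model-Q4_K_M.gguf",  # placeholder filename
    n_gpu_layers=-1,  # -1 = offload every layer to the GPU if it fits
    n_ctx=4096,       # context window; smaller values save RAM/VRAM
)

question = ("Alice has five brothers and three sisters. "
            "How many sisters does Alice's brother have?")

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": question}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```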
Arm-based Macs are also very popular for running LLMs since they have a unified pool of RAM with extremely high speed and low latency, while (at the higher RAM specs) costing around 10x less than a GPU with the same amount.
https://hackaday.com/2025/01/08/running-ai-locally-without-spending-all-day-on-setup/ (lots of other choices in the comments of that post, too).
To my surprise, both ChatGPT and Gemini get the Alice question wrong and answer 3. Perplexity says 4.
This shouldn’t be surprising. AI models are physically incapable of performing mathematical operations; they only produce a sentence by selecting the next most-likely word.
If it produces the right answer, it’s by chance, and you could fairly easily “persuade” it to output the right answer, or any arbitrary answer for that matter.
Hence the Wolfram plugin for ChatGPT.
I’d use it to accompany me while watching University Challenge; no matter how badly I do, I’d probably win.
Where are we on the bell curve regarding this AI/LLM thing?
At first, search engines gave correct answers to wrongly understood questions; now we have a system giving incorrect answers to perfectly understood questions.
There is an ARM port of MS BitNet, and some of those models would fit entirely in RAM on a Pi.
Fascinating. The electricity required, and the 10 minutes to generate a response, are a good illustration of the resources LLMs require. I’d love to see the math, but I suspect Google and OpenAI are hemorrhaging cash to keep this train going.