Microsoft’s latest Phi4 LLM has 14 billion parameters that require about 11 GB of storage. Can you run it on a Raspberry Pi? Get serious. However, the Phi4-mini-reasoning model is a cut-down version with “only” 3.8 billion parameters that requires 3.2 GB. That’s more realistic and, in a recent video, [Gary Explains] tells you how to add this LLM to your Raspberry Pi arsenal.
The version [Gary] uses has four-bit quantization and, as you might expect, the performance isn’t going to be stellar. If you aren’t versed in all the LLM lingo, quantization refers to how the weights are stored (fewer bits per weight means a smaller but less precise model), and, in general, the more parameters a model has, the more things it can figure out.
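If you want to see where those sizes come from, here’s a quick back-of-the-envelope Python sketch of our own (not anything from the video) showing why bits-per-weight matters so much:

    # Raw weight storage is parameters * bits-per-weight / 8 bytes; real model
    # files come out somewhat larger because some tensors stay at higher
    # precision and the file carries metadata.
    def weights_gb(params, bits):
        return params * bits / 8 / 1e9   # decimal gigabytes

    phi4_mini = 3.8e9                    # parameters
    print(f"3.8B at 16 bits/weight: {weights_gb(phi4_mini, 16):.1f} GB")  # ~7.6 GB
    print(f"3.8B at  4 bits/weight: {weights_gb(phi4_mini, 4):.1f} GB")   # ~1.9 GB

The roughly 3.2 GB download lands between those two figures; the gap above the raw four-bit number is typically down to tensors (embeddings, for example) that are kept at higher precision, plus file metadata.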
As a benchmark, [Gary] likes to use what he calls “the Alice question.” In other words, he asks for an answer to this question: “Alice has five brothers and she also has three sisters. How many sisters does Alice’s brother have?” While it probably took you a second to think about it, you almost certainly came up with the correct answer. With this model, a Raspberry Pi can answer it, too.
The first run seems fairly speedy, but it is running on a PC with a GPU. He notes that the same question takes about 10 minutes to pop up on a Raspberry Pi 5 with four cores and 8 GB of RAM.
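If you’d like to reproduce the experiment, one way (not necessarily the runtime [Gary] uses) is to ask a locally served model over Ollama’s HTTP API; the phi4-mini-reasoning tag and the default port below are our assumptions, so adjust them for whatever you actually have installed:

    # Put the "Alice question" to a locally served model via Ollama's REST API.
    # Assumes an Ollama server on its default port with the model already pulled.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "phi4-mini-reasoning",   # assumed tag; check `ollama list`
            "prompt": ("Alice has five brothers and she also has three sisters. "
                       "How many sisters does Alice's brother have?"),
            "stream": False,
        },
        timeout=1200,   # on a Raspberry Pi 5, budget minutes rather than seconds
    )
    print(resp.json()["response"])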
We aren’t sure what you’d do with a very slow LLM, but it does work. Let us know what you’d use it for, if anything, in the comments.
There are some other small models if you don’t like Phi4.
There are more choices available for limited-memory systems. Qwen3 8B is remarkably competitive with that 14B, and even with 70B models, for its size.
There is also a 0.6B model that’s really quite surprising!
Can I run any model on my computer locally?
For example, https://huggingface.co/models
Yeah, check out LM Studio.
You can use any model that’ll fit into your RAM or VRAM, but if you want anything approaching “real time”, then it really needs to be in VRAM. A 4060 Ti 16GB is probably the best current-gen mid-to-low-end option.
You might be able to get Intel or AMD cards working, but it may be a nightmare. CUDA is very polished: anything GTX 900 series or later will run “straight out of the box”; I’ve even run Stable Diffusion on a 650GT without issue.
Arm-based Macs are also very popular for running LLMs since they have a unified pool of RAM with extremely high speed and low latency, while (for the higher RAM specs) costing around 10x less than a GPU with the same amount of memory.
https://hackaday.com/2025/01/08/running-ai-locally-without-spending-all-day-on-setup/ (lots of other choices in the comments of that post, too).
To my surprise, both ChatGPT and Gemini get the Alice question wrong and answer 3. Perplexity says 4.
This shouldn’t be surprising. AI models are physically incapable of performing mathematical operations; they only produce a sentence by selecting the next most-likely word.
If it produces the right answer, it’s chance, and you could fairly easily “persuade” it to output the right answer – or any arbitrary answer for that matter.
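To make that concrete, here’s a minimal sketch of greedy next-token selection using the Hugging Face transformers library and GPT-2 (chosen only because it’s small; it isn’t one of the models discussed above):

    # Greedy next-token prediction: the model scores every token in its
    # vocabulary and we simply take the highest-scoring one.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "Alice has five brothers and three sisters. Each brother has"
    inputs = tok(prompt, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits       # scores for the next token
    next_id = logits[0, -1].argmax().item()   # pick the single most likely token
    print(tok.decode(next_id))                # no arithmetic happens anywhere here

Nothing in there performs arithmetic on the question itself; whether the continuation comes out as “four”, “three”, or something else depends entirely on learned token statistics and the sampling settings.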
Hence the Wolfram plugin for ChatGPT.
I’d use it to accompany me while watching University Challenge; no matter how badly I do, I’d probably win.
Where are we on the bell curve regarding this AI/LLM thing?
First, search engines gave correct answers to wrongly understood questions; now we have a system giving incorrect answers to perfectly understood questions.
Here is something interesting:
I just asked Copilot the same question and it responded three. When I pointed out that the answer is four, because Alice is also his sister, it agreed I was correct. Asking ChatGPT, it took slightly longer to analyze the question, but it came back with the correct answer.
There is an ARM port of Microsoft’s BitNet, and some of those models would fit entirely in RAM on a Pi.
Fascinating. The electricity required to spend 10 minutes generating a response is a good illustration of the resources LLMs require. I’d love to see the math, but I suspect Google and OpenAI are hemorrhaging cash to keep this train going.
This has been my thought about AI from the beginning: is it worth the electric power (largely generated with air-polluting fossil fuels) to get a partially correct answer a little faster than a reasonably intelligent human can manage running on glucose, a very clean fuel?
That’s why people are working on making things more efficient.
https://venturebeat.com/ai/alibabas-zerosearch-lets-ai-learn-to-google-itself-slashing-training-costs-by-88-percent/
I’ve been running LLMs with 12B-14B models on a 16 GB Pi 5 + 1 TB NVMe. Set virtual memory (swap) to 16 GB for an effective 32 GB of RAM. The latest Ollama even does vision!
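If you want to try the same swap trick, here’s a rough sanity check to run before pulling a big model (a hypothetical sketch; psutil and the file path are assumptions, not details from the comment above):

    # Will the model file plus runtime overhead fit into RAM + swap?
    import os
    import psutil

    model_path = "/models/some-12b-q4.gguf"   # hypothetical file

    ram = psutil.virtual_memory().total
    swap = psutil.swap_memory().total
    model_bytes = os.path.getsize(model_path)

    print(f"RAM   : {ram / 2**30:.1f} GiB")
    print(f"Swap  : {swap / 2**30:.1f} GiB")
    print(f"Model : {model_bytes / 2**30:.1f} GiB")

    # Leave headroom: the OS, the runtime, and the KV cache all need memory
    # on top of the weights, and anything that spills to swap will be slow.
    if model_bytes * 1.5 < ram + swap:
        print("Should load, but expect heavy swapping once it spills past RAM.")
    else:
        print("Probably too big for this machine.")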
What if Alice is Alice Cooper?
In that case, he doesn’t have three sisters but only one, and no brothers. It took me two minutes to check Wikipedia, faster than the Pi’s working time.
‘she also had’
Although I only have a 5800XT, I do have an RTX Super.
Will Phi4 be able to compete with Gemini 2.5 Pro? I have that as my current LLM (the paid version, so I use it on my Pixel 7a as well).
I’ve seen lots of videos saying Gemini 2.5 Pro is head and shoulders above everything.
Even going head-to-head with ChatGPT 4, Gemini 2.5 Pro just crushes it on everything.
Phi-4 is smaller, and thus worse, than Gemini 2.5 Pro. Compare them on ArtificialAnalysis.ai.
It’s impressive that a Raspberry Pi 5 can even run a 3.8B parameter LLM like Phi-4 mini, even if it takes 10 minutes to answer a simple question. While not practical for real-time use, it’s a great demo of how far edge computing has come—and a fun way to experiment with LLMs on a budget.
The RPi 5 is rubbish compared to an average smartphone, which can run a small LLM locally in seconds, not minutes. You can try it in a web browser using Candle Phi WASM, ONNX Runtime Web, or MediaPipe.
Just trying this with Axelera AI cards on the Pi.
A timely post, considering I just today received my order of a couple of Orange Pi RV2 boards. RISC-V seems to be the future of vector instructions. I’m not sure what 2 TOPS gets me with little LLMs, but who doesn’t love having some new tech to poke and prod? They even have multiple LLM models available for it.