Qwen3 30B A3B Hits 13 tokens/s on 4 x Raspberry Pi 5
Setup
[🔀 TP-Link LS1008G Switch]
| | | |
| | | |_______ 🔸 raspberrypi2 (ROOT) 10.0.0.2
| | |_________ 🔹 raspberrypi1 (WORKER 1) 10.0.0.1
| |___________ 🔹 raspberrypi3 (WORKER 2) 10.0.0.3
|_____________ 🔹 raspberrypi4 (WORKER 3) 10.0.0.4
Device: 4 x Raspberry Pi 5 8GB
Distributed Llama version: 0.16.0
Model: qwen3_30b_a3b_q40
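The "30B A3B" in the model name refers to roughly 30B total parameters, of which about 3B are active per token. As a rough sketch, both numbers can be estimated from the hyperparameters dllama prints at startup (Dim, QDim, KvDim, VocabSize, nLayers, nExperts, nActiveExperts, MoeHiddenDim). The sketch makes several assumptions that the log does not state. Each expert is a SiLU-gated FFN with three Dim x MoeHiddenDim matrices (gate, up, down). Each layer has four attention projections and a Dim x nExperts router. The input embedding and output head are untied. Norm weights are ignored. The Q40 size assumes a Q4_0-style layout: blocks of 32 weights stored as 16 bytes of 4-bit values plus a 2-byte scale. `estimate_params` and `q40_bytes` are hypothetical helpers, not part of Distributed Llama.

```python
# Hyperparameters printed by dllama for qwen3_30b_a3b_q40.
DIM = 2048
Q_DIM = 4096
KV_DIM = 512
VOCAB = 151936
N_LAYERS = 48
N_EXPERTS = 128
N_ACTIVE_EXPERTS = 8
MOE_HIDDEN_DIM = 768


def estimate_params() -> tuple[int, int]:
    """Hypothetical helper: (total, active) parameter counts, norms ignored.

    Assumes SiLU-gated experts (gate, up, down), Q/K/V/O attention
    projections, a DIM x N_EXPERTS router and an untied embedding/output head.
    """
    expert = 3 * DIM * MOE_HIDDEN_DIM
    attention = DIM * Q_DIM + 2 * DIM * KV_DIM + Q_DIM * DIM
    router = DIM * N_EXPERTS
    embeddings = 2 * VOCAB * DIM  # input embedding + output head, counted in full
    total = N_LAYERS * (N_EXPERTS * expert + attention + router) + embeddings
    active = N_LAYERS * (N_ACTIVE_EXPERTS * expert + attention + router) + embeddings
    return total, active


def q40_bytes(n_weights: int) -> int:
    """Hypothetical helper: size under an assumed Q4_0-style layout
    (blocks of 32 weights = 16 bytes of 4-bit values + a 2-byte scale)."""
    n_blocks = (n_weights + 31) // 32
    return n_blocks * 18


total, active = estimate_params()
print(f"total  ~ {total / 1e9:.2f}B parameters")         # 30.53B
print(f"active ~ {active / 1e9:.2f}B parameters/token")  # 3.35B
print(f"Q40 weights ~ {q40_bytes(total) / 1e9:.2f} GB")  # 17.17 GB
```

Under these assumptions, the quantized weights come to roughly 17 GB. That is more than a single 8 GB Pi can hold, which suggests why the model is split across four devices. The real file may store some tensors (for example embeddings or norms) in a different format, so treat this as an order-of-magnitude estimate.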
Benchmark
| Device | Evaluation | Prediction |
|---|---|---|
| 4 x Raspberry Pi 5 8GB | 14.33 tok/s | 13.04 tok/s |
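The two throughput figures are the inverse of the average per-token latency that dllama reports at the end of the run (69.80 ms/tok for evaluation, 76.69 ms/tok for prediction). The short sketch below converts between the two. It also checks one assumption about how the evaluation figure is derived: that (Eval + Sync) / nTokens from the single 🔷 line (996 ms + 330 ms over 19 tokens) gives approximately the same latency. `tokens_per_second` is a hypothetical helper, not part of Distributed Llama.

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Hypothetical helper: convert average latency per token (ms) to tokens/s."""
    return 1000.0 / ms_per_token


# Average latencies from the dllama summary at the end of the log.
EVAL_MS_PER_TOK = 69.80
PRED_MS_PER_TOK = 76.69

# Assumption: evaluation latency = (Eval + Sync) / nTokens, taken from
# the log line "Eval 996 ms Sync 330 ms | ... | (19 tokens)".
eval_ms_from_log = (996 + 330) / 19

print(f"evaluation: {tokens_per_second(EVAL_MS_PER_TOK):.2f} tok/s")  # 14.33
print(f"prediction: {tokens_per_second(PRED_MS_PER_TOK):.2f} tok/s")  # 13.04
print(f"eval ms/tok from log line: {eval_ms_from_log:.2f}")           # 69.79
```

The small gap between 69.79 and 69.80 ms is consistent with the per-line timings being rounded to whole milliseconds.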
b4rtaz@raspberrypi2:~/distributed-llama $ ./dllama inference --prompt "<|im_start|>user
Please explain me where is Poland as I have 1 year<|im_end|>
<|im_start|>assistant
" --steps 128 --model models/qwen3_30b_a3b_q40/dllama_model_qwen3_30b_a3b_q40.m --tokenizer models/qwen3_30b_a3b_q40/dllama_tokenizer_qwen3_30b_a3b_q40.t --buffer-float-type q80 --nthreads 4 --max-seq-len 4096 --workers 10.0.0.1:9999 10.0.0.3:9999 10.0.0.4:9999
📄 AddBos: 0
📄 BosId: 151643 (<|endoftext|>)
📄 EosId: 151645 (<|im_end|>)
📄 RegularVocabSize: 151643
📄 SpecialVocabSize: 26
Tokenizer vocab size (151669) does not match the model vocab size (151936)
💡 Arch: Qwen3 MoE
💡 HiddenAct: Silu
💡 Dim: 2048
💡 HeadDim: 128
💡 QDim: 4096
💡 KvDim: 512
💡 HiddenDim: 6144
💡 VocabSize: 151936
💡 nLayers: 48
💡 nHeads: 32
💡 nKvHeads: 4
💡 OrigSeqLen: 262144
💡 nExperts: 128
💡 nActiveExperts: 8
💡 MoeHiddenDim: 768
💡 SeqLen: 4096
💡 NormEpsilon: 0.000001
💡 RopeType: Falcon
💡 RopeTheta: 10000000
📀 RequiredMemory: 5513 MB
⭕ Socket[0]: connecting to 10.0.0.1:9999 worker
⭕ Socket[0]: connected
⭕ Socket[1]: connecting to 10.0.0.3:9999 worker
⭕ Socket[1]: connected
⭕ Socket[2]: connecting to 10.0.0.4:9999 worker
⭕ Socket[2]: connected
⭕ Network is initialized
🧠 CPU: neon dotprod fp16
💿 Loading weights...
💿 Weights loaded
🚁 Network is in non-blocking mode
<|im_start|>user
Please explain me where is Poland as I have 1 year<|im_end|>
<|im_start|>assistant
🔷️ Eval 996 ms Sync 330 ms | Sent 12084 kB Recv 20085 kB | (19 tokens)
🔶 Pred 49 ms Sync 37 ms | Sent 636 kB Recv 1057 kB | Of
🔶 Pred 50 ms Sync 94 ms | Sent 636 kB Recv 1057 kB | course
🔶 Pred 60 ms Sync 37 ms | Sent 636 kB Recv 1057 kB | !
🔶 Pred 60 ms Sync 18 ms | Sent 636 kB Recv 1057 kB | Let
🔶 Pred 59 ms Sync 18 ms | Sent 636 kB Recv 1057 kB | me
🔶 Pred 49 ms Sync 27 ms | Sent 636 kB Recv 1057 kB | explain
🔶 Pred 49 ms Sync 18 ms | Sent 636 kB Recv 1057 kB | where
🔶 Pred 49 ms Sync 18 ms | Sent 636 kB Recv 1057 kB | Poland
🔶 Pred 49 ms Sync 18 ms | Sent 636 kB Recv 1057 kB | is
🔶 Pred 49 ms Sync 18 ms | Sent 636 kB Recv 1057 kB | ,
🔶 Pred 53 ms Sync 18 ms | Sent 636 kB Recv 1057 kB | in
...
🔶 Pred 70 ms Sync 15 ms | Sent 636 kB Recv 1057 kB | zech
🔶 Pred 53 ms Sync 24 ms | Sent 636 kB Recv 1057 kB | Republic
🔶 Pred 69 ms Sync 14 ms | Sent 636 kB Recv 1057 kB | **
🔶 Pred 59 ms Sync 16 ms | Sent 636 kB Recv 1057 kB | –
🔶 Pred 55 ms Sync 20 ms | Sent 636 kB Recv 1057 kB | to
🔶 Pred 64 ms Sync 16 ms | Sent 636 kB Recv 1057 kB | the
🔶 Pred 53 ms Sync 36 ms | Sent 636 kB Recv 1057 kB | south
🔶 Pred 62 ms Sync 18 ms | Sent 636 kB Recv 1057 kB |
🔶 Pred 61 ms Sync 16 ms | Sent 636 kB Recv 1057 kB | 3
Evaluation
nBatches: 32
nTokens: 19
tokens/s: 14.33 (69.80 ms/tok)
Prediction
nTokens: 109
tokens/s: 13.04 (76.69 ms/tok)
⭕ Network is closed
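Each 🔶 line reports, for one generated token, the compute time (Pred), the synchronization time across nodes (Sync), and the data sent and received over the network. Below is a minimal sketch for summarizing those lines. It assumes that wall time per token is Pred + Sync and that "kB" means 1000 bytes; the log defines neither. The sample uses only the first five prediction lines, copied verbatim, so its mean (96.4 ms) is pulled up by the 94 ms sync spike on the second token. Pasting all 🔶 lines into `LOG` should bring the mean close to the reported 76.69 ms/tok if the first assumption holds. `parse_pred_lines` is a hypothetical helper, not part of Distributed Llama.

```python
import re

# First five prediction lines, copied from the log above.
LOG = """\
🔶 Pred 49 ms Sync 37 ms | Sent 636 kB Recv 1057 kB | Of
🔶 Pred 50 ms Sync 94 ms | Sent 636 kB Recv 1057 kB | course
🔶 Pred 60 ms Sync 37 ms | Sent 636 kB Recv 1057 kB | !
🔶 Pred 60 ms Sync 18 ms | Sent 636 kB Recv 1057 kB | Let
🔶 Pred 59 ms Sync 18 ms | Sent 636 kB Recv 1057 kB | me
"""

PRED_RE = re.compile(r"Pred (\d+) ms Sync (\d+) ms \| Sent (\d+) kB Recv (\d+) kB")


def parse_pred_lines(log: str) -> list[tuple[int, ...]]:
    """Hypothetical helper: (pred_ms, sync_ms, sent_kb, recv_kb) per 🔶 line."""
    return [tuple(int(g) for g in m.groups()) for m in PRED_RE.finditer(log)]


rows = parse_pred_lines(LOG)

# Assumption: wall time per token = Pred + Sync.
ms_per_tok = sum(pred + sync for pred, sync, _, _ in rows) / len(rows)
kb_per_tok = sum(sent + recv for _, _, sent, recv in rows) / len(rows)

# Network rate at the reported prediction throughput.
# Assumptions: 1 kB = 1000 bytes; Sent/Recv are the root node's counters.
PRED_TOK_PER_S = 13.04
mbit_per_s = kb_per_tok * 1000 * 8 * PRED_TOK_PER_S / 1e6

print(f"{len(rows)} tokens, mean Pred+Sync = {ms_per_tok:.1f} ms")  # 96.4 ms
print(f"{kb_per_tok:.0f} kB per token -> {mbit_per_s:.1f} Mbit/s")  # 1693 kB, 176.6 Mbit/s
```

Under these assumptions, 636 kB sent plus 1057 kB received per token at 13.04 tok/s works out to roughly 177 Mbit/s. That is well within the Raspberry Pi 5's Gigabit Ethernet.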
