One curl … | sh. Metal on Apple Silicon, CUDA on NVIDIA. Persistent models in your home directory.
Lightweight Ollama setup — no Docker, no container overhead. macOS, Linux and Windows. Bundle the installer in your Electron app or use it standalone.
MIT License · Native install via brew / install.sh / Windows installer
A minimal, opinionated Ollama setup — ready for chat and RAG.
Pipe install.sh into sh. Installs Ollama, starts the service, pulls default models. Idempotent — safe to re-run.
Metal on Apple Silicon, CUDA on NVIDIA — auto-detected. 7B–13B models run in seconds, not minutes.
Drop the repo into your Electron app as a git submodule. Scripts have no CWD assumptions and re-run safely on every launch.
From zero to a working LLM in one command.
curl -fsSL https://raw.githubusercontent.com/chevp/nuna-llm-local/main/install.sh | sh
Installs Ollama via brew (macOS) or the official installer (Linux), starts the service, pulls mistral:7b-instruct-q4_K_M + nomic-embed-text.
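Under the hood, the macOS/Linux path is roughly the sketch below (illustrative only; the actual install.sh in the repo is the source of truth):
# Illustrative sketch of install.sh, not the real script
if ! command -v ollama >/dev/null 2>&1; then
  if [ "$(uname -s)" = "Darwin" ]; then
    brew install ollama && brew services start ollama
  else
    curl -fsSL https://ollama.com/install.sh | sh  # official Linux installer, sets up the systemd service
  fi
fi
ollama pull mistral:7b-instruct-q4_K_M
ollama pull nomic-embed-text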
irm https://raw.githubusercontent.com/chevp/nuna-llm-local/main/install.ps1 | iex
Downloads the Ollama Windows installer, runs it silently, then pulls the same default models. Accept the UAC prompt.
# Interactive shell
ollama run mistral:7b-instruct-q4_K_M
# Or hit the HTTP API
curl http://localhost:11434/api/generate \
  -d '{"model":"mistral:7b-instruct-q4_K_M","prompt":"Hello","stream":false}'
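For multi-turn conversations, Ollama's /api/chat endpoint takes a messages array instead of a single prompt:
# Multi-turn chat via the chat endpoint
curl http://localhost:11434/api/chat \
  -d '{"model":"mistral:7b-instruct-q4_K_M","messages":[{"role":"user","content":"Hello"}],"stream":false}'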
Embedding it in another project?
Add the repo as a git submodule and call the installer from your parent project (e.g. an Electron app's first-launch hook):
git submodule add https://github.com/chevp/nuna-llm-local vendor/nuna-llm-local
./vendor/nuna-llm-local/install.sh
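A first-launch hook can be a thin wrapper around the installer. This is a hypothetical sketch (the wrapper name and submodule path are yours to choose), leaning on the fact that install.sh is idempotent:
#!/bin/sh
# Hypothetical first-launch wrapper; adjust the submodule path to your layout.
# install.sh is safe to re-run, so calling it on every app start is fine.
sh "$(dirname "$0")/vendor/nuna-llm-local/install.sh"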
Two tiers: a snappy default set, and a heavyweight set for vision, 3D and shader work.
With Metal (Apple Silicon) or CUDA (NVIDIA) these run in seconds.
| Model | Disk | Use case |
|---|---|---|
| llama3.2:3b | ~2 GB | Fast dev loop, short answers |
| phi3:mini | ~2 GB | Very fast, small reasoning tasks |
| mistral:7b-instruct-q4_K_M (default) | ~4 GB | Well-rounded chat |
| llama3.1:8b-instruct-q4_K_M | ~5 GB | Slightly higher quality |
| llama3.1:13b-instruct-q4_K_M | ~8 GB | Best quality on 16+ GB unified memory / 12+ GB VRAM |
| nomic-embed-text | ~270 MB | Embeddings (RAG) |
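Any of these can be swapped in by name; for example, to trade quality for speed during development:
# Pull a smaller model and run a one-off prompt against it
ollama pull llama3.2:3b
ollama run llama3.2:3b "Summarize glTF in one sentence."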
For image input, glTF/JSON-graph reasoning, and GLSL/HLSL/WGSL pipelines. Realistic on Apple Silicon M-Pro/Max with 32 GB+ unified memory, or an NVIDIA GPU with ≥ 16 GB VRAM.
| Model | Disk | Use case |
|---|---|---|
| llama3.2-vision:11b | ~8 GB | Vision: reference images, screenshots, diagrams, UI mockups |
| llava:13b | ~8 GB | Vision: classic baseline, broad ecosystem support |
| qwen2.5-coder:7b | ~5 GB | Lightweight code: fast iteration on shader & pipeline snippets |
| deepseek-coder-v2:16b | ~9 GB | MoE code mid-tier: snappy, strong on graphics & build systems |
| qwen2.5-coder:32b | ~20 GB | Top open coder: GLSL/HLSL/WGSL, Three.js, WebGPU, Vulkan, glTF |
| deepseek-r1:32b | ~20 GB | Reasoning: vertex transforms, BRDF math, lighting derivations |
| mixtral:8x7b | ~26 GB | MoE generalist: broad knowledge, fast for its size |
| llama3.3:70b-instruct-q4_K_M | ~43 GB | Top-tier generalist: architecture & design discussions |
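As an illustration (assuming llama3.2-vision:11b has already been pulled), an image goes to /api/generate as base64 in the images array:
# Describe a local screenshot with a vision model
IMG_B64=$(base64 < screenshot.png | tr -d '\n')
curl http://localhost:11434/api/generate \
  -d "{\"model\":\"llama3.2-vision:11b\",\"prompt\":\"Describe this UI mockup\",\"images\":[\"$IMG_B64\"],\"stream\":false}"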
Want a basic RAG stack?
The default install already pulls a chat model and an embedding model. To re-pull or upgrade them:
./scripts/setup-rag.sh
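For the retrieval side, embeddings come from the same local API; a minimal call against the default embedding model looks like this:
# Embed a text chunk with nomic-embed-text (response carries the vector under "embedding")
curl http://localhost:11434/api/embeddings \
  -d '{"model":"nomic-embed-text","prompt":"glTF stores a scene as a JSON node graph."}'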
What to expect on each OS.
Metal acceleration via the Apple GPU — auto-detected by Ollama. 7B–13B-q4 models run in seconds.
Best models: 7B-q4 default, 13B-q4 on M-Pro/Max with 32 GB+ memory.
CUDA acceleration via the NVIDIA driver — no toolkit setup. Just install the driver and Ollama auto-detects the GPU.
Best models: any — GPU handles 13B+ easily.
Native Windows installer with NVIDIA GPU support out of the box. Service runs in the background after install.
Best models: same as Linux, whether CPU-only or NVIDIA-accelerated.
The five things that usually go wrong, and how to fix them.
Another Ollama instance is running — or a leftover Docker container from an older setup. Find and stop it:
# macOS
brew services list
# Linux
systemctl status ollama
# Or pick a different bind address
launchctl setenv OLLAMA_HOST 127.0.0.1:11435  # macOS
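If neither service manager owns the port, check what is actually bound to it:
# Show the process currently listening on the Ollama port
lsof -i :11434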
While a query is running, check the processor split:
ollama ps
# Should show PROCESSOR=100% GPU. If CPU, you're either on an Intel Mac
# (no Metal) or running an outdated Ollama:
brew upgrade ollama
Verify the NVIDIA driver first — no container toolkit needed:
nvidia-smi
# Then check what Ollama sees:
journalctl -u ollama -n 50 | grep -i -E "cuda|gpu"
# macOS
brew services list
brew services restart ollama
# Linux
sudo systemctl status ollama
journalctl -u ollama -e
Check disk space — a 7B-q4 model is ~4 GB. Models live in your home directory (~/.ollama/models/) or under /usr/share/ollama/.ollama/models/ on Linux/systemd.
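A quick check of free space and the size of the local model store:
# Free space on your home filesystem and current model footprint
df -h ~
du -sh ~/.ollama/models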