
nuna-llm-local

Ollama · Metal & CUDA · One-liner install

Run an LLM locally.
Native, fast, simple.

One curl … | sh. Metal on Apple Silicon, CUDA on NVIDIA. Models persist in your home directory.

Lightweight Ollama setup — no Docker, no container overhead. macOS, Linux and Windows. Bundle the installer in your Electron app or use it standalone.

Try it in 60 seconds · View on GitHub

MIT License · Native install via brew / install.sh / Windows installer

What you get

A minimal, opinionated Ollama setup — ready for chat and RAG.


One command

Pipe install.sh into sh. Installs Ollama, starts the service, pulls default models. Idempotent — safe to re-run.


GPU by default

Metal on Apple Silicon, CUDA on NVIDIA — auto-detected. 7B–13B models run in seconds, not minutes.


Submodule-friendly

Drop the repo into your Electron app as a git submodule. Scripts have no CWD assumptions and re-run safely on every launch.

Quick start

From zero to a working LLM in one command.

1

Install (macOS / Linux)

curl -fsSL https://raw.githubusercontent.com/chevp/nuna-llm-local/main/install.sh | sh

Installs Ollama via brew (macOS) or the official installer (Linux), starts the service, pulls mistral:7b-instruct-q4_K_M + nomic-embed-text.
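
To confirm the install worked, two standard Ollama commands show the CLI version and the models the installer pulled:

# Check the CLI is on PATH and the service responds
ollama --version
# Should list mistral:7b-instruct-q4_K_M and nomic-embed-text
ollama list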

1b

Install (Windows, PowerShell)

irm https://raw.githubusercontent.com/chevp/nuna-llm-local/main/install.ps1 | iex

Downloads the Ollama Windows installer, runs it silently, then pulls the same default models. Accept the UAC prompt.

2

Talk to it

# Interactive shell
ollama run mistral:7b-instruct-q4_K_M
# Or hit the HTTP API
curl http://localhost:11434/api/generate \
  -d '{"model":"mistral:7b-instruct-q4_K_M","prompt":"Hello","stream":false}'

Embedding it in another project?

Add the repo as a git submodule and call the installer from your parent project (e.g. an Electron app's first-launch hook):

git submodule add https://github.com/chevp/nuna-llm-local vendor/nuna-llm-local
./vendor/nuna-llm-local/install.sh
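
Since install.sh is idempotent, the parent project can simply call it on every launch; if you prefer to skip it when Ollama is already present, a minimal guard (a sketch, not part of the repo) looks like this:

# Hypothetical first-launch hook: only run the bundled installer when ollama is missing
if ! command -v ollama >/dev/null 2>&1; then
  ./vendor/nuna-llm-local/install.sh
fi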

Recommended models

Two tiers: a snappy default set, and a heavyweight set for vision, 3D and shader work.

Lightweight tier — fast everywhere

With Metal (Apple Silicon) or CUDA (NVIDIA) these run in seconds.

Model Disk Use case
llama3.2:3b ~2 GB Fast dev loop, short answers
phi3:mini ~2 GB Very fast, small reasoning tasks
mistral:7b-instruct-q4_K_M (default) ~4 GB Well-rounded chat
llama3.1:8b-instruct-q4_K_M ~5 GB Slightly higher quality
llama3.1:13b-instruct-q4_K_M ~8 GB Best quality on 16+ GB unified memory / 12+ GB VRAM
nomic-embed-text ~270 MB Embeddings (RAG)
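
Any model in the table can be pulled and swapped in by its tag, for example:

# Pull a smaller model and chat with it
ollama pull llama3.2:3b
ollama run llama3.2:3b
# Remove it again to free disk space
ollama rm llama3.2:3b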

Heavyweight tier — vision, 3D & shader work

For image input, glTF/JSON-graph reasoning, and GLSL/HLSL/WGSL pipelines. Realistic on Apple Silicon M-Pro/Max with 32 GB+ unified memory, or an NVIDIA GPU with ≥ 16 GB VRAM.

Model Disk Use case
llama3.2-vision:11b ~8 GB Vision: reference images, screenshots, diagrams, UI mockups
llava:13b ~8 GB Vision: classic baseline, broad ecosystem support
qwen2.5-coder:7b ~5 GB Lightweight code: fast iteration on shader & pipeline snippets
deepseek-coder-v2:16b ~9 GB MoE code mid-tier: snappy, strong on graphics & build systems
qwen2.5-coder:32b ~20 GB Top open coder: GLSL/HLSL/WGSL, Three.js, WebGPU, Vulkan, glTF
deepseek-r1:32b ~20 GB Reasoning: vertex transforms, BRDF math, lighting derivations
mixtral:8x7b ~26 GB MoE generalist: broad knowledge, fast for its size
llama3.3:70b-instruct-q4_K_M ~43 GB Top-tier generalist: architecture & design discussions

Want a basic RAG stack?

The default install already pulls a chat model and an embedding model. To re-pull or upgrade them:

./scripts/setup-rag.sh
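
With nomic-embed-text pulled, the local API can generate embedding vectors for your own documents; a minimal request looks like this:

# Returns a JSON object with an "embedding" array of floats
curl http://localhost:11434/api/embeddings \
  -d '{"model":"nomic-embed-text","prompt":"glTF stores a scene as a JSON node graph"}'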

Platform notes

What to expect on each OS.


macOS (Apple Silicon)

Metal acceleration via the Apple GPU — auto-detected by Ollama. 7B–13B-q4 models run in seconds.

Best models: 7B-q4 default, 13B-q4 on M-Pro/Max with 32 GB+ memory.


Linux

CUDA acceleration via the NVIDIA driver — no toolkit setup. Just install the driver and Ollama auto-detects the GPU.

Best models: any — GPU handles 13B+ easily.


Windows

Native Windows installer with NVIDIA GPU support out of the box. Service runs in the background after install.

Best models: with an NVIDIA GPU, same guidance as Linux; on CPU only, stick to the lightweight tier.

Troubleshooting

The five things that usually go wrong, and how to fix them.

Port 11434 already in use

Another Ollama instance is running — or a leftover Docker container from an older setup. Find and stop it:

# macOS
brew services list
# Linux
systemctl status ollama
# Or pick a different bind address
launchctl setenv OLLAMA_HOST 127.0.0.1:11435   # macOS
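
If you do move the service to another port, point the CLI and any API calls at it via the same OLLAMA_HOST variable:

# Use the non-default port for this shell session
export OLLAMA_HOST=127.0.0.1:11435
ollama list
# Or target it directly over HTTP
curl http://127.0.0.1:11435/api/tags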
GPU not used (slow on Apple Silicon)

While a query is running, check the processor split:

ollama ps
# Should show PROCESSOR=100% GPU. If CPU, you're either on Intel Mac
# (no Metal) or running an outdated Ollama:
brew upgrade ollama
GPU not detected on Linux

Verify the NVIDIA driver first — no container toolkit needed:

nvidia-smi
# Then check what Ollama sees:
journalctl -u ollama -n 50 | grep -i -E "cuda|gpu"
Service won't start
# macOS
brew services list
brew services restart ollama
# Linux
sudo systemctl status ollama
journalctl -u ollama -e
Model download stalls or fails

Check disk space — a 7B-q4 model is ~4 GB. Models live in your home directory (~/.ollama/models/) or under /usr/share/ollama/.ollama/models/ on Linux installs managed by systemd.
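
A quick way to check both free space and the size of the local model store:

# Free space on the filesystem holding your home directory
df -h ~
# Size of the model store (adjust the path on Linux/systemd installs)
du -sh ~/.ollama/models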