Ollama Tips
A few tips for working with local AI using Ollama
Contents
Run local models in Claude Code
Increase Context Window
Ollama lets you run Claude Code against local models, but Claude Code will likely need a larger context window than Ollama gives it by default on most hosts.
Ollama conservatively sets the default context window size based on the amount of VRAM you have; for example, with 48 GiB of VRAM it gives you a 32k context window.
Ollama suggests 64k for Claude Code, so you'll need to bump it up (see Increase Context Window).
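To check what a model supports before changing anything, you can inspect it with `ollama show` (the model name here is just an example; the exact output layout varies by Ollama version):

```shell
# Prints model details, including the architecture's maximum
# "context length" and any num_ctx parameter already configured
ollama show qwen3.5:latest
```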
Increase Context Window
You can increase the context window available to your model, but this will increase the amount of VRAM it requires to run.
In practice, if your context window/model is too large for the VRAM you have available, Ollama will fall back to the CPU and system memory; this allows the model to run, but with a performance penalty.
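You can check whether a loaded model has spilled out of VRAM with `ollama ps`, which reports how each running model is split between CPU and GPU:

```shell
# The PROCESSOR column shows e.g. "100% GPU" when the model fits
# entirely in VRAM, or a CPU/GPU percentage split when it doesn't
ollama ps
```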
Temporarily via the CLI
On Ubuntu:
If you have Ollama running as a service, you'll need to stop it first then start it in the CLI with a context length set:
sudo systemctl stop ollama.service
OLLAMA_CONTEXT_LENGTH=64000 ollama serve
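If you want the larger context to survive service restarts without creating a Modelfile, one option (assuming the standard systemd unit that the Ubuntu installer sets up) is to put the environment variable in a systemd override:

```shell
# Opens an editor for an override file for the Ollama unit
sudo systemctl edit ollama.service
# In the editor, add:
#   [Service]
#   Environment="OLLAMA_CONTEXT_LENGTH=64000"
# Then reload systemd and restart the service:
sudo systemctl daemon-reload
sudo systemctl restart ollama.service
```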
Permanently via a Modelfile
You'll need to create a Modelfile that defines the num_ctx parameter, then load it into Ollama.
For example, if I wanted to create a Modelfile for qwen3.5 with a 64k context, I could run this in the CLI:
cat > Modelfile << 'EOF'
FROM qwen3.5:latest
PARAMETER num_ctx 65536
EOF
Then load the Modelfile:
ollama create qwen3.5-64k -f Modelfile
Now I can run the model:
ollama run qwen3.5-64k
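To confirm the new model picked up the setting, `ollama show` should list num_ctx under its Parameters section:

```shell
# Look for "num_ctx 65536" in the Parameters section of the output
ollama show qwen3.5-64k
```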