# ollama
Helm chart for deploying Ollama model server.
## Installing the Chart
To install the chart, run the following:

```bash
cd GenAIInfra/helm-charts/common
export MODELNAME="llama3.2:1b"
helm install ollama-release ollama --set LLM_MODEL_ID=${MODELNAME}
```

By default, the ollama container will download the "llama3.2:1b" model, which is about 1.3GB.
## Verify
To verify the installation, run the following command and make sure all pods are running:

```bash
kubectl get pod
```

Then expose the ollama service for local access:

```bash
kubectl port-forward svc/ollama-release 11434:80
```
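If you prefer to wait for the pods to become ready before port-forwarding, a minimal sketch is shown below; the label selector is an assumption about how the chart labels its pods, so adjust it to match your release:

```bash
# Wait up to 5 minutes for all pods belonging to the release to report Ready.
# The label selector is an assumed convention; check `kubectl get pod --show-labels`.
kubectl wait --for=condition=Ready pod \
  -l app.kubernetes.io/instance=ollama-release \
  --timeout=300s
```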
Open another terminal and run the following command to verify the service is working:

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:1b",
  "prompt": "What is Deep Learning?",
  "options": {
    "num_predict": 40
  }
}'
```
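You can also confirm that the model finished downloading by listing the models known to the server via Ollama's `/api/tags` endpoint:

```bash
# List the models currently available on the server;
# "llama3.2:1b" should appear once the download has completed.
curl http://localhost:11434/api/tags
```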
## Values
| Key | Type | Default | Description |
| --- | ---- | ------- | ----------- |
| LLM_MODEL_ID | string | | The model ID to use. Must be one of the models listed in the [Ollama Library](https://ollama.com/library). |
| global.modelUseHostPath | string | | Cached models directory on the Kubernetes node; the service will not download the model if it is already cached here. The host path "modelUseHostPath" is mounted into the container as the /.ollama directory. Setting this to null/empty forces the container to download the model. Must not be set if "global.modelUsePVC" is also set. |
| global.modelUsePVC | string | | Name of the Persistent Volume Claim to use for the model cache. The Persistent Volume is mounted into the container as the /.ollama directory. Setting this to null/empty forces the container to download the model. Must not be set if "global.modelUseHostPath" is also set. |
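For example, to reuse models already cached on the Kubernetes node instead of downloading them at startup, you could point `global.modelUseHostPath` at the cache directory. The path below is only an illustration; use whatever directory actually holds your cached models:

```bash
# Mount an existing model cache from the node into the container at /.ollama.
# /mnt/opea-models is an example path, not a chart default.
helm install ollama-release ollama \
  --set LLM_MODEL_ID="llama3.2:1b" \
  --set global.modelUseHostPath=/mnt/opea-models
```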