# ollama
Helm chart for deploying Ollama model server.
## Installing the Chart
To install the chart, run the following:

```bash
cd GenAIInfra/helm-charts/common
export MODELNAME="llama3.2:1b"
helm install ollama-release ollama --set LLM_MODEL_ID=${MODELNAME}
```

By default, the ollama container will download the "llama3.2:1b" model, which is about 1.3GB.
## Verify
To verify the installation, run the following command and make sure all pods are running:

```bash
kubectl get pod
```

Then expose the ollama service for local access:

```bash
kubectl port-forward svc/ollama-release 11434:80
```
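If you prefer to wait for the pods to become ready before port-forwarding, a minimal sketch is shown below; the label selector is an assumption about how the chart labels its pods, so adjust it to match your release:

```bash
# Wait up to 5 minutes for all pods belonging to the release to report Ready.
# The label selector is an assumed convention; check `kubectl get pod --show-labels`.
kubectl wait --for=condition=Ready pod \
  -l app.kubernetes.io/instance=ollama-release \
  --timeout=300s
```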
Open another terminal and run the following command to verify the service is working:

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:1b",
  "prompt": "What is Deep Learning?",
  "options": {
    "num_predict": 40
  }
}'
```
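You can also confirm that the model finished downloading by listing the models known to the server via Ollama's `/api/tags` endpoint:

```bash
# List the models currently available on the server;
# "llama3.2:1b" should appear once the download has completed.
curl http://localhost:11434/api/tags
```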
## Values
| Key | Type | Default | Description |
| --- | ---- | ------- | ----------- |
| LLM_MODEL_ID | string | | The model ID to use. Must be one of the models listed in the [Ollama Library](https://ollama.com/library). |
| global.modelUseHostPath | string | | Cached models directory on the Kubernetes node; the service will not download the model if it is already cached here. The host path "modelUseHostPath" is mounted into the container as the /.ollama directory. Setting this to null/empty forces the container to download the model. Must not be set if "global.modelUsePVC" is also set. |
| global.modelUsePVC | string | | Name of the Persistent Volume Claim to use for the model cache. The Persistent Volume is mounted into the container as the /.ollama directory. Setting this to null/empty forces the container to download the model. Must not be set if "global.modelUseHostPath" is also set. |
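For example, to reuse models already cached on the Kubernetes node instead of downloading them at startup, you could point `global.modelUseHostPath` at the cache directory. The path below is only an illustration; use whatever directory actually holds your cached models:

```bash
# Mount an existing model cache from the node into the container at /.ollama.
# /mnt/opea-models is an example path, not a chart default.
helm install ollama-release ollama \
  --set LLM_MODEL_ID="llama3.2:1b" \
  --set global.modelUseHostPath=/mnt/opea-models
```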