Introduction

Ollama allows you to run open-source large language models, such as Llama 3, locally. It is a lightweight, extensible framework for building and running language models on your local machine: it bundles model weights, configuration, and data into a single package defined by a Modelfile, provides a simple API for creating, running, and managing models, and ships a library of pre-built models that can easily be used in a variety of applications. This makes it a good choice for deploying large language models locally on an AI PC.

Get Started

Setup

Follow these instructions to set up and run a local Ollama instance.

  • Download and install Ollama on one of the supported platforms (including Windows)

  • Fetch an available LLM model via ollama pull <name-of-model>. View the list of available models in the model library, then pull one to use locally, for example with the command ollama pull llama3 (see the example after this list)

  • This downloads the default tag of the model. Typically, the default tag points to the latest, smallest-parameter variant of the model.
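
For example, to pull a model and verify that it is available locally (llama3 is used here only as an example; substitute any model from the library):

    # Pull the default tag of llama3 (downloads the weights locally)
    ollama pull llama3

    # Pull a specific tag if you need a particular parameter size
    ollama pull llama3:8b

    # List the models now available on this machine
    ollama list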

Note: Special settings are necessary to pull models from behind a proxy.

  • Step 1: Modify the Ollama service configuration file.

    sudo vim /etc/systemd/system/ollama.service
    

    Add your proxy to the configuration file opened above:

    [Service]
    Environment="http_proxy=${your_proxy}"
    Environment="https_proxy=${your_proxy}"
    
  • Step 2: Restart the Ollama service.

    sudo systemctl daemon-reload
    sudo systemctl restart ollama
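
To verify that systemd picked up the proxy settings before pulling a model, you can inspect the service environment (an optional sanity check):

    # Show the environment variables the ollama service is running with
    systemctl show ollama --property=Environment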
    

Usage

Here are a few ways to interact with pulled local models:

In the terminal

All of your local models are automatically served on localhost:11434. Run ollama run <name-of-model> to start interacting via the command line directly.
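
For example, with the llama3 model pulled earlier, you can start an interactive session or pass a one-shot prompt (the prompt text is just an illustration):

# Start an interactive chat session in the terminal
ollama run llama3

# Or pass a single prompt and print the response to stdout
ollama run llama3 "Why is the sky blue?"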

API access

Send an application/json request to Ollama's API endpoint to interact with a model.

curl --noproxy "*" http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt":"Why is the sky blue?"
}'
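
By default, /api/generate streams the response as a sequence of JSON lines. If you prefer a single JSON object instead, set stream to false (a small variation on the request above):

curl --noproxy "*" http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'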

Build the Docker Image

cd GenAIComps/
docker build --no-cache -t opea/llm-ollama:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/ollama/langchain/Dockerfile .
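
After the build finishes, you can confirm that the image is available locally:

docker images opea/llm-ollama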

Run the Ollama Microservice

docker run --network host -e http_proxy=$http_proxy -e https_proxy=$https_proxy opea/llm-ollama:latest
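
Because the container shares the host network, the microservice talks to the Ollama server on localhost:11434. Before sending requests to the microservice, you can check that the Ollama server is reachable and see which models it is serving (an optional check):

curl --noproxy "*" http://localhost:11434/api/tags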

Consume the Ollama Microservice

curl http://127.0.0.1:9000/v1/chat/completions \
  -X POST \
  -d '{"model": "llama3", "query":"What is Deep Learning?","max_tokens":32,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
  -H 'Content-Type: application/json'
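
The request above uses "streaming": true, so the reply arrives as a token stream. If you want a single aggregated response instead, setting "streaming" to false should work, assuming the microservice treats the flag symmetrically (this variation is a sketch, not taken from the original instructions):

curl http://127.0.0.1:9000/v1/chat/completions \
  -X POST \
  -d '{"model": "llama3", "query":"What is Deep Learning?","max_tokens":32,"streaming":false}' \
  -H 'Content-Type: application/json'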