Single node on-prem deployment on Gaudi AI Accelerator¶
This section covers a single-node on-prem deployment of the CodeGen example. It shows how to deploy an end-to-end CodeGen solution with the Qwen2.5-Coder-32B-Instruct model running on Intel® Gaudi® AI Accelerators. To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the Getting Started section.
Overview¶
The CodeGen use case uses a single LLM microservice, with model serving handled by vLLM or TGI.
This solution demonstrates the use of the Qwen2.5-Coder-32B-Instruct model for code generation on Intel® Gaudi® AI Accelerators. The steps involve setting up Docker containers, providing text input as the prompt, and generating code. Although multiple versions of the UI can be deployed, this tutorial focuses solely on the default version.
Prerequisites¶
To run the UI in a web browser external to the host machine, such as a laptop, the following port needs to be forwarded when using SSH to log in to the host machine:
7778: CodeGen megaservice port
This port is used for BACKEND_SERVICE_ENDPOINT, which is defined in set_env.sh for this example inside the docker_compose folder. Specifically, for CodeGen, append the following to the ssh command:
-L 7778:localhost:7778
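For example, assuming a hypothetical user name and host address, the full SSH login command could look like this:
ssh -L 7778:localhost:7778 user@gaudi-host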
Set up a workspace and clone the GenAIExamples GitHub repo.
export WORKSPACE=<Path>
cd $WORKSPACE
git clone https://github.com/opea-project/GenAIExamples.git
Optional: It is recommended to use a stable release version by setting RELEASE_VERSION to a version number only (e.g. 1.0, 1.1, etc.) and checking out that version using its tag. Otherwise, the main branch with the latest updates will be used by default.
export RELEASE_VERSION=<Release_Version> # Set desired release version - number only
cd GenAIExamples
git checkout tags/v${RELEASE_VERSION}
cd ..
Set up a Hugging Face account and generate a user access token. The Qwen2.5-Coder-32B-Instruct model does not require special access, but the token can be used with other models that do.
Set the HUGGINGFACEHUB_API_TOKEN environment variable to the value of the Hugging Face token by executing the following command:
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
host_ip is not required to be set manually. It will be set in the set_env.sh script later.
For machines behind a firewall, set up the proxy environment variables:
export no_proxy=${your_no_proxy},$host_ip
export http_proxy=${your_http_proxy}
export https_proxy=${your_https_proxy}
Use Case Setup¶
CodeGen will utilize the following GenAIComps services and associated tools. The tools and models listed in the table can be configured via environment variables in either the set_env.sh script or the compose.yaml file.
| Use Case Components | Tools | Model | Service Type |
|---|---|---|---|
| LLM | vLLM, TGI | Qwen/Qwen2.5-Coder-32B-Instruct | OPEA Microservice |
| UI | NA | NA | Gateway Service |
Set the necessary environment variables to set up the use case. To swap out models, modify set_env.sh before running it. For example, the environment variable LLM_MODEL_ID can be changed to another model by specifying the Hugging Face model card ID.
To run the UI in a web browser on a laptop, modify BACKEND_SERVICE_ENDPOINT inside set_env.sh to use localhost or 127.0.0.1 instead of host_ip so that the backend properly receives data from the UI.
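As a sketch only (the exact lines and default values in set_env.sh may differ between releases, and the 7B model below is just an illustrative alternative), these edits could look like:
# Inside set_env.sh
export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-7B-Instruct"  # swap to another Hugging Face model card ID
export BACKEND_SERVICE_ENDPOINT="http://localhost:7778/v1/codegen"  # use localhost instead of host_ip when the UI runs in a remote browser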
Run the set_env.sh script.
cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose
source ./set_env.sh
Deploy the Use Case¶
Navigate to the docker compose directory for this hardware platform.
cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi
Run docker compose with the provided YAML file to start all the services mentioned above as containers. Either the vLLM or the TGI service can be used for CodeGen; run the command for the chosen backend:
docker compose --profile codegen-gaudi-vllm up -d
docker compose --profile codegen-gaudi-tgi up -d
Check Env Variables¶
After running docker compose, check for warning messages about environment variables that are NOT set. Address them if needed.
WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
Check that all the containers launched via docker compose are running, i.e., each container’s STATUS is Up and, in some cases, Healthy.
Run the following command to see this information:
docker ps -a
Sample output:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0040b340a392 opea/codegen-gradio-ui:latest "python codegen_ui_g…" 4 minutes ago Up 3 minutes 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp codegen-gaudi-ui-server
3d2c7deacf5b opea/codegen:latest "python codegen.py" 4 minutes ago Up 3 minutes 0.0.0.0:7778->7778/tcp, [::]:7778->7778/tcp codegen-gaudi-backend-server
ad59907292fe opea/dataprep:latest "sh -c 'python $( [ …" 4 minutes ago Up 4 minutes (healthy) 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-redis-server
2cb4e0a6562e opea/retriever:latest "python opea_retriev…" 4 minutes ago Up 4 minutes 0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp retriever-redis
f787f774890b opea/llm-textgen:latest "bash entrypoint.sh" 4 minutes ago Up About a minute 0.0.0.0:9000->9000/tcp, [::]:9000->9000/tcp llm-codegen-vllm-server
5880b86091a5 opea/embedding:latest "sh -c 'python $( [ …" 4 minutes ago Up 4 minutes 0.0.0.0:6000->6000/tcp, [::]:6000->6000/tcp tei-embedding-server
cd16e3c72f17 opea/llm-textgen:latest "bash entrypoint.sh" 4 minutes ago Up 4 minutes llm-textgen-server
cd412bca7245 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 4 minutes ago Up 4 minutes 0.0.0.0:6379->6379/tcp, [::]:6379->6379/tcp, 0.0.0.0:8001->8001/tcp, [::]:8001->8001/tcp redis-vector-db
8d4e77afc067 opea/vllm:latest "python3 -m vllm.ent…" 4 minutes ago Up 4 minutes (healthy) 0.0.0.0:8028->80/tcp, [::]:8028->80/tcp vllm-server
f7c1cb49b96b ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 "/bin/sh -c 'apt-get…" 4 minutes ago Up 4 minutes (healthy) 0.0.0.0:8090->80/tcp, [::]:8090->80/tcp tei-embedding-serving
Each docker container’s log can also be checked using:
docker logs <CONTAINER_ID OR CONTAINER_NAME>
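For example, using one of the container names from the sample output above:
docker logs codegen-gaudi-backend-server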
Validate Microservices¶
This section describes the various methods for interacting with the deployed microservices.
vLLM or TGI Service¶
curl http://${host_ip}:8028/v1/chat/completions \
-X POST \
-H 'Content-Type: application/json' \
-d '{"model": "Qwen/Qwen2.5-Coder-32B-Instruct", "messages": [{"role": "user", "content": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}], "max_tokens":32}'
Here is sample output:
{"generated_text":"\n\nIO iflow diagram:\n\n!\[IO flow diagram(s)\]\(TodoList.iflow.svg\)\n\n### TDD Kata walkthrough\n\n1. Start with a user story. We will add story tests later. In this case, we'll choose a story about adding a TODO:\n ```ruby\n as a user,\n i want to add a todo,\n so that i can get a todo list.\n\n conformance:\n - a new todo is added to the list\n - if the todo text is empty, raise an exception\n ```\n\n1. Write the first test:\n ```ruby\n feature Testing the addition of a todo to the list\n\n given a todo list empty list\n when a user adds a todo\n the todo should be added to the list\n\n inputs:\n when_values: [[\"A\"]]\n\n output validations:\n - todo_list contains { text:\"A\" }\n ```\n\n1. Write the first step implementation in any programming language you like. In this case, we will choose Ruby:\n ```ruby\n def add_"}
LLM Microservice¶
curl http://${host_ip}:9000/v1/chat/completions \
-X POST \
-H 'Content-Type: application/json' \
-d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"stream":true}'
The generated code is streamed back incrementally. The full output is too long to show here, but the stream ends with:
data: [DONE]
Dataprep Microservice¶
The following is a template only. Replace the filename placeholders with desired files.
curl http://${host_ip}:6007/v1/dataprep/ingest \
-X POST \
-H "Content-Type: multipart/form-data" \
-F "files=@./file1.pdf" \
-F "files=@./file2.txt" \
-F "index_name=my_API_document"
CodeGen Megaservice¶
Default:
curl http://${host_ip}:7778/v1/codegen -H "Content-Type: application/json" -d '{
"messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."
}'
The generated code is streamed back incrementally. The full output is too long to show here, but the stream ends with:
data: [DONE]
The CodeGen Megaservice can also be utilized with RAG and Agents activated:
curl http://${host_ip}:7778/v1/codegen \
-H "Content-Type: application/json" \
-d '{"agents_flag": "True", "index_name": "my_API_document", "messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
Launch UI¶
Gradio UI¶
To access the frontend, open the following URL in a web browser: http://${host_ip}:5173. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend by modifying the port mapping in the compose.yaml file as shown below:
codegen-gaudi-ui-server:
image: ${REGISTRY:-opea}/codegen-gradio-ui:${TAG:-latest}
...
ports:
- "YOUR_HOST_PORT:5173" # Change YOUR_HOST_PORT to the desired port
After making this change, restart the containers for the change to take effect.
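For example, if the vLLM profile was used to bring the services up, restart from the same docker compose directory:
docker compose --profile codegen-gaudi-vllm down
docker compose --profile codegen-gaudi-vllm up -d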
Stop the Services¶
Navigate to the docker compose directory for this hardware platform.
cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi
To stop and remove all the containers, run the command matching the profile used during deployment:
docker compose --profile codegen-gaudi-vllm down
docker compose --profile codegen-gaudi-tgi down