Single node on-prem deployment on Gaudi AI Accelerator
This section covers single-node on-prem deployment of the CodeTrans example. It will show how to deploy an end-to-end code translation service with the mistralai/Mistral-7B-Instruct-v0.3 model running on Intel® Gaudi® AI Accelerators. To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the Getting Started section.
Overview
The CodeTrans use case uses a single LLM microservice for code translation, with the model served by vLLM or TGI.
This solution demonstrates the use of the Mistral-7B-Instruct-v0.3 model on Intel® Gaudi® AI Accelerators to translate code between different programming languages. The steps involve setting up Docker containers, taking code in one programming language as input, and generating code in another programming language. The solution is deployed with a basic UI accessible through both a direct port and Nginx.
Prerequisites
To run the UI in a web browser external to the host machine, such as on a laptop, the following port needs to be forwarded when using SSH to log in to the host machine:

7777: the CodeTrans megaservice port. This port is used for BACKEND_SERVICE_ENDPOINT, defined in the set_env.sh script in the docker_compose folder for this example.

Specifically, for CodeTrans, append the following to the ssh command:

-L 7777:localhost:7777
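For example, a complete SSH command with port forwarding might look like the following (the user and host names are placeholders):

ssh -L 7777:localhost:7777 user@gaudi-host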
Set up a workspace and clone the GenAIExamples GitHub repo.
export WORKSPACE=<Path>
cd $WORKSPACE
git clone https://github.com/opea-project/GenAIExamples.git # GenAIExamples
Optional: It is recommended to use a stable release version by setting RELEASE_VERSION to a number only (e.g. 1.0, 1.1, etc.) and checking out that version using the tag. Otherwise, by default, the main branch with the latest updates will be used.
export RELEASE_VERSION=<Release_Version> # Set desired release version - number only
cd GenAIExamples
git checkout tags/v${RELEASE_VERSION}
cd ..
The example utilizes model weights from HuggingFace. Set up a HuggingFace account and apply for access to Mistral-7B-Instruct-v0.3, which is a gated model. To obtain access, visit the model site and click on Agree and access repository. Next, generate a user access token. Set the HUGGINGFACEHUB_API_TOKEN environment variable to the value of the Hugging Face token by executing the following command:
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
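To optionally confirm the token is valid before proceeding, it can be checked against the Hugging Face whoami endpoint; a quick sanity check, assuming curl is available:

# A valid token returns a JSON object containing the account name; an invalid one returns an error
curl -s -H "Authorization: Bearer ${HUGGINGFACEHUB_API_TOKEN}" https://huggingface.co/api/whoami-v2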
Set the host_ip environment variable, which is used to expose the microservice endpoints on the host's ports:
export host_ip=$(hostname -I | awk '{print $1}')
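Note that hostname -I lists all addresses and awk picks the first one, which may not be the desired interface on a multi-homed machine. If needed, verify the detected address and override it manually (the address below is a placeholder):

echo $host_ip                   # verify the detected address
# export host_ip=192.168.1.10   # hypothetical override for a specific interface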
Set up a desired port for Nginx:
# Example: NGINX_PORT=80
export NGINX_PORT=${your_nginx_port}
For machines behind a firewall, set up the proxy environment variables:
export no_proxy=${your_no_proxy},$host_ip
export http_proxy=${your_http_proxy}
export https_proxy=${your_https_proxy}
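For illustration, with a hypothetical corporate proxy at proxy.example.com:8080, the variables might look like:

export http_proxy=http://proxy.example.com:8080
export https_proxy=http://proxy.example.com:8080
export no_proxy=localhost,127.0.0.1,$host_ip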
Use Case Setup
CodeTrans will utilize the following GenAIComps services and associated tools. The tools and models listed in the table can be configured via environment variables in either the set_env.sh script or the compose.yaml file.
Use Case Components | Tools | Model | Service Type
---|---|---|---
LLM | vLLM or TGI | mistralai/Mistral-7B-Instruct-v0.3 | OPEA Microservice
UI | NA | NA | Gateway Service
Ingress | Nginx | NA | Gateway Service
Set the necessary environment variables to set up the use case. To swap out models, modify set_env.sh before running it. For example, the environment variable LLM_MODEL_ID can be changed to another model by specifying the HuggingFace model card ID.

To run the UI in a web browser on a laptop, modify BACKEND_SERVICE_ENDPOINT inside set_env.sh to use localhost or 127.0.0.1 instead of host_ip so that the backend properly receives data from the UI; a sketch of both edits follows.
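For example, the relevant lines in set_env.sh might be edited to look like the following. This is a sketch: the exact contents of the script may differ, and the endpoint value assumes the default megaservice port 7777 and the /v1/codetrans path used elsewhere in this example.

# Pin the model by its HuggingFace model card ID
export LLM_MODEL_ID="mistralai/Mistral-7B-Instruct-v0.3"
# Use localhost so a browser UI on a laptop can reach the backend through the forwarded port
export BACKEND_SERVICE_ENDPOINT="http://localhost:7777/v1/codetrans"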
Run the set_env.sh script.
cd $WORKSPACE/GenAIExamples/CodeTrans/docker_compose
source ./set_env.sh
Deploy the Use Case
Navigate to the docker compose directory for this hardware platform.
cd $WORKSPACE/GenAIExamples/CodeTrans/docker_compose/intel/hpu/gaudi
Run docker compose with one of the provided YAML files to start all the services mentioned above as containers. Either vLLM or TGI can serve the model for CodeTrans; choose one of the two commands below.

# Deploy with vLLM
docker compose -f compose.yaml up -d

# Or deploy with TGI
docker compose -f compose_tgi.yaml up -d
Check Env Variables
After running docker compose, check for warning messages about environment variables that are NOT set. Address them if needed.
ubuntu@gaudi-vm:~/GenAIExamples/CodeTrans/docker_compose/intel/hpu/gaudi$ docker compose -f ./compose.yaml up -d
WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
Check if all the containers launched via docker compose are running, i.e. each container's STATUS is Up and, in some cases, Healthy. Run this command to see this info:
docker ps -a
Sample output:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ca0cfb3edce5 opea/nginx:latest "/docker-entrypoint.…" 8 minutes ago Up 6 minutes 0.0.0.0:80->80/tcp, [::]:80->80/tcp codetrans-gaudi-nginx-server
d7ef9da3f7db opea/codetrans-ui:latest "docker-entrypoint.s…" 8 minutes ago Up 6 minutes 0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp codetrans-gaudi-ui-server
2cfc12e1c8f1 opea/codetrans:latest "python code_transla…" 8 minutes ago Up 6 minutes 0.0.0.0:7777->7777/tcp, [::]:7777->7777/tcp codetrans-gaudi-backend-server
c1db5a49003d opea/llm-textgen:latest "bash entrypoint.sh" 8 minutes ago Up 6 minutes 0.0.0.0:9000->9000/tcp, [::]:9000->9000/tcp codetrans-gaudi-llm-server
450f74cb65a4 opea/vllm:latest "python3 -m vllm.ent…" 8 minutes ago Up 8 minutes (healthy) 0.0.0.0:8008->80/tcp, [::]:8008->80/tcp codetrans-gaudi-vllm-service
Each docker container’s log can also be checked using:
docker logs <CONTAINER_ID OR CONTAINER_NAME>
Validate Microservices
This section covers the various methods for interacting with the deployed microservices.
vLLM or TGI Service
During the initial startup, this service will take a few minutes to download the model files and complete the warm-up process. Once this is finished, the service will be ready for use.
Try the command below to check whether the LLM serving is ready.
# vLLM service
docker logs codetrans-gaudi-vllm-service 2>&1 | grep complete
# If the service is ready, you will get the response like below.
INFO: Application startup complete.
# TGI service
docker logs codetrans-gaudi-tgi-service | grep Connected
# If the service is ready, you will get the response like below.
2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected
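To block a script until serving is ready, a simple polling loop can be used; a minimal sketch for the vLLM case, keyed on the startup message shown above:

# Poll the vLLM container logs until the startup message appears
until docker logs codetrans-gaudi-vllm-service 2>&1 | grep -q "Application startup complete"; do
  echo "Waiting for the vLLM service to finish warm-up..."
  sleep 10
done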
Then try the cURL command below to verify the vLLM or TGI service:
curl http://${host_ip}:8008/generate \
-X POST \
-d '{"inputs":" ### System: Please translate the following Golang codes into Python codes. ### Original codes: '\'''\'''\''Golang \npackage main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n '\'''\'''\'' ### Translated codes:","parameters":{"max_new_tokens":17, "do_sample": true}}' \
-H 'Content-Type: application/json'
The vLLM or TGI service generates text for the input prompt. Here is the expected result:
{"generated_text":"'''Python\nprint(\"Hello, World!\")"}
LLM Microservice
This service handles the core language model operations. Send a direct request to translate a simple “Hello World” program from Go to Python:
curl http://${host_ip}:9000/v1/chat/completions \
-X POST \
-d '{"query":" ### System: Please translate the following Golang codes into Python codes. ### Original codes: '\'''\'''\''Golang \npackage main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n '\'''\'''\'' ### Translated codes:"}' \
-H 'Content-Type: application/json'
Sample output:
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737123223,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737123223,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"``"}],"created":1737123223,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"`"}],"created":1737123224,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"Py"}],"created":1737123224,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"thon"}],"created":1737123224,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737123224,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"print"}],"created":1737123224,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"(\""}],"created":1737123224,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"Hello"}],"created":1737123224,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":","}],"created":1737123224,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" World"}],"created":1737123225,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"!"}],"created":1737123225,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\")"}],"created":1737123225,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737123225,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"``"}],"created":1737123225,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"length","index":0,"logprobs":null,"text":"`"}],"created":1737123225,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":{"completion_tokens":17,"prompt_tokens":58,"total_tokens":75,"completion_tokens_details":null,"prompt_tokens_details":null}}
data: [DONE]
CodeTrans MegaService
The CodeTrans megaservice orchestrates the entire translation process. Test it with a simple code translation request:
curl http://${host_ip}:7777/v1/codetrans \
-H "Content-Type: application/json" \
-d '{"language_from": "Golang","language_to": "Python","source_code": "package main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n}"}'
Sample output:
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737121307,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737121307,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" "}],"created":1737121307,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" Python"}],"created":1737121307,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737121307,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737121307,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" "}],"created":1737121308,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" print"}],"created":1737121308,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"(\""}],"created":1737121308,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"Hello"}],"created":1737121308,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":","}],"created":1737121308,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" World"}],"created":1737121308,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"!"}],"created":1737121308,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\")"}],"created":1737121308,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":"\n"}],"created":1737121309,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" "}],"created":1737121309,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"","index":0,"logprobs":null,"text":" ```"}],"created":1737121309,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":null}
data: {"id":"","choices":[{"finish_reason":"eos_token","index":0,"logprobs":null,"text":"</s>"}],"created":1737121309,"model":"mistralai/Mistral-7B-Instruct-v0.3","object":"text_completion","system_fingerprint":"2.4.0-sha-0a655a0-intel-cpu","usage":{"completion_tokens":18,"prompt_tokens":74,"total_tokens":92,"completion_tokens_details":null,"prompt_tokens_details":null}}
data: [DONE]
The megaservice streams each segment of the response. Each line contains JSON with a text field; concatenating the text values in order reconstructs the translated code. In this example, the final code is simply:
print("Hello, World!")
Nginx Service
The Nginx service acts as a reverse proxy and load balancer for the application. To verify it is properly routing requests, send the same translation request through Nginx:
curl http://${host_ip}:${NGINX_PORT}/v1/codetrans \
-H "Content-Type: application/json" \
-d '{"language_from": "Golang","language_to": "Python","source_code": "package main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n}"}'
The expected output is the same as the megaservice output.
Each of these endpoints should return a successful response with the translated Python code. If any of these tests fail, check the corresponding service logs for more details.
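For example, to inspect the most recent log lines from the megaservice container:

docker logs codetrans-gaudi-backend-server --tail 50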
Launch UI
Basic UI
To access the frontend user interface (UI), the primary method is through the Nginx reverse proxy service. Open the following URL in a web browser: http://${host_ip}:${NGINX_PORT}. This provides a stable and secure access point to the UI.
Alternatively, the UI can be accessed directly using its internal port. This method bypasses the Nginx proxy and can be used for testing or troubleshooting purposes. To access the UI directly, open the following URL in a web browser: http://${host_ip}:5173. By default, the UI runs on port 5173. A different host port can be used to access the frontend by modifying the FRONTEND_SERVICE_PORT environment variable. For reference, the port mapping in the compose.yaml file is shown below:
codetrans-gaudi-ui-server:
image: ${REGISTRY:-opea}/codetrans-ui:${TAG:-latest}
container_name: codetrans-gaudi-ui-server
depends_on:
- codetrans-gaudi-backend-server
ports:
- "${FRONTEND_SERVICE_PORT:-5173}:5173"
After making this change, restart the containers for the change to take effect.
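For example, to expose the UI on host port 8080 (a hypothetical choice), set the variable and recreate the containers:

export FRONTEND_SERVICE_PORT=8080
docker compose -f compose.yaml up -d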
Stop the Services
Navigate to the docker compose directory for this hardware platform.
cd $WORKSPACE/GenAIExamples/CodeTrans/docker_compose/intel/hpu/gaudi
To stop and remove all the containers, use the command that matches the file used for deployment:

# If deployed with vLLM
docker compose -f compose.yaml down

# If deployed with TGI
docker compose -f compose_tgi.yaml down