# Single node on-prem deployment on Gaudi AI Accelerator

This section covers single-node on-prem deployment of the CodeGen example. It shows how to deploy an end-to-end CodeGen solution with the `Qwen2.5-Coder-32B-Instruct` model running on Intel® Gaudi® AI Accelerators. To quickly learn about OPEA and set up the required hardware and software, follow the instructions in the [Getting Started](../../../getting-started/README.md) section.

## Overview

The CodeGen use case uses a single LLM microservice, with model serving handled by vLLM or TGI. This solution demonstrates the use of the `Qwen2.5-Coder-32B-Instruct` model for code generation on Intel® Gaudi® AI Accelerators. The steps involve setting up Docker containers, taking text input as the prompt, and generating code. Although multiple versions of the UI can be deployed, this tutorial focuses solely on the default version.

## Prerequisites

To run the UI on a web browser external to the host machine, such as a laptop, the following port(s) need to be forwarded when using SSH to log in to the host machine:

- 7778: CodeGen megaservice port

This port is used by `BACKEND_SERVICE_ENDPOINT`, defined in the `set_env.sh` for this example inside the `docker_compose` folder. Specifically, for CodeGen, append the following to the SSH command:

```bash
-L 7778:localhost:7778
```

Set up a workspace and clone the [GenAIExamples](https://github.com/opea-project/GenAIExamples) GitHub repo.

```bash
export WORKSPACE=    # set to the desired workspace path
cd $WORKSPACE
git clone https://github.com/opea-project/GenAIExamples.git
```

**Optional**: It is recommended to use a stable release by setting `RELEASE_VERSION` to a **number only** (e.g. 1.0, 1.1, etc.) and checking out that version using its tag. Otherwise, the main branch with the latest updates is used by default.

```bash
export RELEASE_VERSION=    # Set desired release version - number only
cd GenAIExamples
git checkout tags/v${RELEASE_VERSION}
cd ..
```

Set up a [HuggingFace](https://huggingface.co/) account and generate a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). The [Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) model does not need special access, but the token can be used with other models that require it.

Set the `HUGGINGFACEHUB_API_TOKEN` environment variable to the value of the Hugging Face token by executing the following command:

```bash
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
```

`host_ip` does not need to be set manually. It will be set by the `set_env.sh` script later.

For machines behind a firewall, set up the proxy environment variables:

```bash
export no_proxy=${your_no_proxy},$host_ip
export http_proxy=${your_http_proxy}
export https_proxy=${your_https_proxy}
```

## Use Case Setup

CodeGen will utilize the following GenAIComps services and associated tools. The tools and models listed in the table can be configured via environment variables in either the `set_env.sh` script or the `compose.yaml` file.

| Use Case Components | Tools     | Model                           | Service Type      |
| ------------------- | --------- | ------------------------------- | ----------------- |
| LLM                 | vLLM, TGI | Qwen/Qwen2.5-Coder-32B-Instruct | OPEA Microservice |
| UI                  |           | NA                              | Gateway Service   |

Set the necessary environment variables to set up the use case. To swap out models, modify `set_env.sh` before running it. For example, the environment variable `LLM_MODEL_ID` can be changed to another model by specifying the HuggingFace model card ID.
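As an illustration only, the relevant line inside `set_env.sh` would look something like the sketch below; `Qwen/Qwen2.5-Coder-7B-Instruct` is used purely as an example card ID, and the exact line may differ between releases:

```bash
# Inside set_env.sh (illustrative sketch): point LLM_MODEL_ID at a different
# HuggingFace model card ID before the script is sourced in the next step.
export LLM_MODEL_ID="Qwen/Qwen2.5-Coder-7B-Instruct"   # example card ID only
```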
To run the UI on a web browser on a laptop, modify `BACKEND_SERVICE_ENDPOINT` inside `set_env.sh` to use `localhost` or `127.0.0.1` instead of `host_ip` so that the backend properly receives data from the UI.

Run the `set_env.sh` script.

```bash
cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose
source ./set_env.sh
```

## Deploy the Use Case

Navigate to the `docker_compose` directory for this hardware platform.

```bash
cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi
```

Run `docker compose` with the provided YAML file to start all the services mentioned above as containers. Either the vLLM or the TGI service can be used to serve the model for CodeGen.

::::{tab-set}

:::{tab-item} vllm
:sync: vllm

```bash
docker compose --profile codegen-gaudi-vllm up -d
```

:::
:::{tab-item} TGI
:sync: TGI

```bash
docker compose --profile codegen-gaudi-tgi up -d
```

:::
::::

### Check Env Variables

After running `docker compose`, check for warning messages about environment variables that are **NOT** set. Address them if needed.

```
WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
```

The same warnings may be printed several times, once for each service that references the variable.
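These warnings are harmless on machines that are not behind a proxy. Otherwise, one quick way to confirm that the proxy variables are visible in the shell running `docker compose` is sketched below; the values themselves are site-specific:

```bash
# Show which proxy-related variables are exported in the current shell.
env | grep -i proxy

# If any are missing, export them as described in the Prerequisites section,
# then re-run the `docker compose --profile ... up -d` command above.
```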
Check if all the containers launched via `docker compose` are running, i.e. each container's `STATUS` is `Up` and, in some cases, `Healthy`. Run this command to see this info:

```bash
docker ps -a
```

Sample output:

```bash
CONTAINER ID   IMAGE                                                   COMMAND                  CREATED         STATUS                   PORTS                                                                                      NAMES
0040b340a392   opea/codegen-gradio-ui:latest                           "python codegen_ui_g…"   4 minutes ago   Up 3 minutes             0.0.0.0:5173->5173/tcp, [::]:5173->5173/tcp                                                codegen-gaudi-ui-server
3d2c7deacf5b   opea/codegen:latest                                     "python codegen.py"      4 minutes ago   Up 3 minutes             0.0.0.0:7778->7778/tcp, [::]:7778->7778/tcp                                                codegen-gaudi-backend-server
ad59907292fe   opea/dataprep:latest                                    "sh -c 'python $( [ …"   4 minutes ago   Up 4 minutes (healthy)   0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp                                                dataprep-redis-server
2cb4e0a6562e   opea/retriever:latest                                   "python opea_retriev…"   4 minutes ago   Up 4 minutes             0.0.0.0:7000->7000/tcp, [::]:7000->7000/tcp                                                retriever-redis
f787f774890b   opea/llm-textgen:latest                                 "bash entrypoint.sh"     4 minutes ago   Up About a minute        0.0.0.0:9000->9000/tcp, [::]:9000->9000/tcp                                                llm-codegen-vllm-server
5880b86091a5   opea/embedding:latest                                   "sh -c 'python $( [ …"   4 minutes ago   Up 4 minutes             0.0.0.0:6000->6000/tcp, [::]:6000->6000/tcp                                                tei-embedding-server
cd16e3c72f17   opea/llm-textgen:latest                                 "bash entrypoint.sh"     4 minutes ago   Up 4 minutes                                                                                                        llm-textgen-server
cd412bca7245   redis/redis-stack:7.2.0-v9                              "/entrypoint.sh"         4 minutes ago   Up 4 minutes             0.0.0.0:6379->6379/tcp, [::]:6379->6379/tcp, 0.0.0.0:8001->8001/tcp, [::]:8001->8001/tcp   redis-vector-db
8d4e77afc067   opea/vllm:latest                                        "python3 -m vllm.ent…"   4 minutes ago   Up 4 minutes (healthy)   0.0.0.0:8028->80/tcp, [::]:8028->80/tcp                                                    vllm-server
f7c1cb49b96b   ghcr.io/huggingface/text-embeddings-inference:cpu-1.5   "/bin/sh -c 'apt-get…"   4 minutes ago   Up 4 minutes (healthy)   0.0.0.0:8090->80/tcp, [::]:8090->80/tcp                                                    tei-embedding-serving
```

Each docker container's log can also be checked using:

```bash
docker logs <CONTAINER_NAME_OR_ID>
```

## Validate Microservices

This section goes through the various methods for interacting with the deployed microservices.

### vLLM or TGI Service

```bash
curl http://${host_ip}:8028/v1/chat/completions \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"model": "Qwen/Qwen2.5-Coder-32B-Instruct", "messages": [{"role": "user", "content": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}], "max_tokens":32}'
```

Here is sample output:

```bash
{"generated_text":"\n\nIO iflow diagram:\n\n!\[IO flow diagram(s)\]\(TodoList.iflow.svg\)\n\n### TDD Kata walkthrough\n\n1. Start with a user story. We will add story tests later. In this case, we'll choose a story about adding a TODO:\n ```ruby\n as a user,\n i want to add a todo,\n so that i can get a todo list.\n\n conformance:\n - a new todo is added to the list\n - if the todo text is empty, raise an exception\n ```\n\n1. Write the first test:\n ```ruby\n feature Testing the addition of a todo to the list\n\n given a todo list empty list\n when a user adds a todo\n the todo should be added to the list\n\n inputs:\n when_values: [[\"A\"]]\n\n output validations:\n - todo_list contains { text:\"A\" }\n ```\n\n1. Write the first step implementation in any programming language you like. In this case, we will choose Ruby:\n ```ruby\n def add_"}
```
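If the request above fails or times out, an optional sanity check is to ask the serving endpoint which model it has loaded. The sketch below assumes the vLLM profile (port 8028, as mapped in the sample `docker ps` output above); the TGI service exposes a different set of info endpoints, so adjust accordingly:

```bash
# List the model(s) being served on the OpenAI-compatible endpoint (vLLM profile).
curl http://${host_ip}:8028/v1/models
```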
### LLM Microservice

```bash
curl http://${host_ip}:9000/v1/chat/completions \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"stream":true}'
```

The output code is streamed back incrementally. It is too long to show here, but the last item will be:

```bash
data: [DONE]
```

### Dataprep Microservice

The following is a template only. Replace the filename placeholders with the desired files.

```bash
curl http://${host_ip}:6007/v1/dataprep/ingest \
  -X POST \
  -H "Content-Type: multipart/form-data" \
  -F "files=@./file1.pdf" \
  -F "files=@./file2.txt" \
  -F "index_name=my_API_document"
```

### CodeGen Megaservice

Default:

```bash
curl http://${host_ip}:7778/v1/codegen \
  -H "Content-Type: application/json" \
  -d '{"messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
```

The output code is streamed back incrementally. It is too long to show here, but the last item will be:

```bash
data: [DONE]
```

The CodeGen megaservice can also be used with RAG and agents activated:

```bash
curl http://${host_ip}:7778/v1/codegen \
  -H "Content-Type: application/json" \
  -d '{"agents_flag": "True", "index_name": "my_API_document", "messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
```

## Launch UI

### Gradio UI

To access the frontend, open the following URL in a web browser: `http://${host_ip}:5173`. By default, the UI runs on port 5173 internally. A different host port can be used to access the frontend by modifying the port mapping in the `compose.yaml` file as shown below:

```yaml
  codegen-gaudi-ui-server:
    image: ${REGISTRY:-opea}/codegen-gradio-ui:${TAG:-latest}
    ...
    ports:
      - "YOUR_HOST_PORT:5173" # Change YOUR_HOST_PORT to the desired port
```

After making this change, restart the containers for the change to take effect.

## Stop the Services

Navigate to the `docker_compose` directory for this hardware platform.

```bash
cd $WORKSPACE/GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi
```

To stop and remove all the containers, use the commands below:

::::{tab-set}

:::{tab-item} vllm
:sync: vllm

```bash
docker compose --profile codegen-gaudi-vllm down
```

:::
:::{tab-item} TGI
:sync: TGI

```bash
docker compose --profile codegen-gaudi-tgi down
```

:::
::::
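If the documents ingested through the dataprep service should also be discarded, the volumes created by the compose project can be removed as well. This is a minimal sketch, assuming the default volume configuration in `compose.yaml` and the vLLM profile; substitute the profile that was used to start the services, and note that this deletes the stored vector data:

```bash
# Stop the containers and also remove the volumes declared in compose.yaml
# (this deletes data such as the Redis vector store contents).
docker compose --profile codegen-gaudi-vllm down -v
```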