One-Click Deployment for GenAI Examples¶
This document provides a comprehensive guide to deploying, managing, and testing the GenAI Examples using the unified one-click interactive Python script. This script simplifies the entire workflow, from environment validation to service deployment on both Docker and Kubernetes.
Prerequisites¶
Before you begin, ensure your system meets the following requirements.
1. System Requirements¶
Hardware requirements can vary significantly depending on the example and models being used. The following are general guidelines, with more demanding examples like ChatQnA requiring more resources.
CPU: For optimal performance with larger models in production (e.g., on Xeon), use CPUs with more cores.
Memory: A minimum of 64GB RAM. For larger models, 128GB or more is recommended.
HPU (for Gaudi deployments): At least 2 HPU cards are recommended for ChatQnA deployment.
Disk Space: A minimum of 50GB of free disk space is required for Docker images, models, and data.
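The memory and disk guidelines above can be checked before launching a deployment. The following is a minimal sketch, not part of the deployment script: the helper names are hypothetical, the thresholds are copied from the list above, and the RAM query assumes a Linux host.

```python
import os
import shutil

GIB = 1024 ** 3


def meets_minimum(available_bytes: int, required_gib: int) -> bool:
    """Return True when available_bytes covers required_gib gibibytes."""
    return available_bytes >= required_gib * GIB


def total_memory_bytes() -> int:
    """Total physical RAM as reported by POSIX sysconf (assumes Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def free_disk_bytes(path: str = "/") -> int:
    """Free bytes on the filesystem that holds path."""
    return shutil.disk_usage(path).free


if __name__ == "__main__":
    # Thresholds taken from the guidelines above (64GB RAM, 50GB disk).
    print("RAM  >= 64 GiB:", meets_minimum(total_memory_bytes(), 64))
    print("Disk >= 50 GiB:", meets_minimum(free_disk_bytes("/"), 50))
```

The operating system usually reports slightly less than the nominal installed RAM, so a machine sold as 64GB may fail a strict 64 GiB comparison. Treat the result as a hint rather than a hard gate.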
2. Software Requirements¶
The deployment script and its underlying tools require the following software to be installed:
Python: Python 3.9+ is required.
Python Packages: The script depends on several Python packages. Install them using the provided requirements.txt file:
cd one_click_deploy
pip install -r requirements.txt
Docker Engine: Required for Docker-based deployments and for building container images.
Install Docker by following the official documentation.
Ensure the Docker daemon is running.
Docker Compose: The script uses the docker compose command. This is typically included with modern Docker installations or can be installed as a plugin.
Kubernetes Tools (for K8s deployments):
kubectl: The Kubernetes command-line tool.
helm: The package manager for Kubernetes.
Git: Required for cloning repositories during the image build process for some examples.
[!IMPORTANT] The check_env.sh script, which can be run as part of the one-click deployment, may require sudo privileges to perform actions like installing missing packages or configuring the CPU governor.
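The software requirements above can be verified quickly with the Python standard library. This sketch is illustrative and is not how check_env.sh works: the helper names and the grouping of tools by deployment mode are assumptions made for the example.

```python
import shutil
import sys

# Tool names come from the requirements list above. Grouping them by
# deployment mode is an assumption made for this example.
DOCKER_TOOLS = ["docker", "git"]
K8S_TOOLS = ["kubectl", "helm"]


def missing_tools(tools, which=shutil.which):
    """Return the tools from the given list that are not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_ok(version_info=sys.version_info, minimum=(3, 9)):
    """Return True when the interpreter is at least the minimum version."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print("Python 3.9+:", python_ok())
    print("Missing (Docker mode):", missing_tools(DOCKER_TOOLS))
    print("Missing (K8s mode):", missing_tools(K8S_TOOLS))
```

Finding docker on PATH does not prove that the Compose plugin is installed or that the daemon is running. Running docker compose version and docker info by hand covers those two cases.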
3. Environment Requirements¶
Hugging Face Hub Token: A Hugging Face token is required to download models from the Hub.
You can create a token from your Hugging Face account settings.
The script will prompt you for this token and can read it from the default cache location (~/.cache/huggingface/token) if available.
Network & Proxy Settings: If you are behind a corporate firewall, you will need to provide proxy settings. The script will interactively ask for:
HTTP_PROXY
HTTPS_PROXY
NO_PROXY (the script automatically adds the host IP and localhost to this list)
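The two environment inputs described above, the cached token and the NO_PROXY list, can be sketched as follows. This is an assumption-laden illustration rather than the script's actual code: read_cached_token and merge_no_proxy are hypothetical helpers, and appending 127.0.0.1 alongside localhost is inferred from the sample prompt shown later in the walkthrough.

```python
from pathlib import Path

# Default cache location named in the text above.
DEFAULT_TOKEN_PATH = Path.home() / ".cache" / "huggingface" / "token"


def read_cached_token(path=DEFAULT_TOKEN_PATH):
    """Return the token stored at path, or None if it is missing or empty."""
    try:
        token = Path(path).read_text(encoding="utf-8").strip()
    except OSError:
        return None
    return token or None


def merge_no_proxy(user_value: str, host_ip: str) -> str:
    """Append localhost entries and host_ip, keeping order and dropping duplicates."""
    entries = [entry.strip() for entry in user_value.split(",") if entry.strip()]
    for extra in ("localhost", "127.0.0.1", host_ip):
        if extra and extra not in entries:
            entries.append(extra)
    return ",".join(entries)
```

With an empty user value and a host IP of 10.0.1.5, merge_no_proxy yields localhost,127.0.0.1,10.0.1.5.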
Getting Started: The One-Click Script¶
The one_click_deploy.py script is the central entry point for all management tasks. It provides an interactive command-line interface to guide you through deployment, testing, and cleanup.
Running the Script¶
To start the interactive deployment process, run the following command from the one_click_deploy directory:
python3 one_click_deploy.py
The script will present you with a series of choices to configure your deployment.
Interactive Walkthrough: Deploying an Example¶
This section walks you through a typical deployment session for the ChatQnA example using Docker.
Launch the script:
python3 one_click_deploy.py
Choose an Example: The script will list all available examples from the configuration.
Please choose an example to manage:
[1] ChatQnA
[2] CodeTrans
[3] DocSum
[4] CodeGen
[5] AudioQnA
[1-5] (1): 1
Choose an Action: Select Deploy to start the installation process.
Please choose an action:
[1] Deploy
[2] Clear
[3] Test Connection
[1-3] (1): 1
Configure Deployment: The script will now ask for deployment-specific parameters.
Deployment Mode [docker/k8s] (docker): docker
Target Device [xeon/gaudi] (xeon): xeon
Hugging Face Token (cached found): ****************
HTTP Proxy []: http://your-proxy.com:8080
HTTPS Proxy []: http://your-proxy.com:8080
No Proxy hosts [localhost,127.0.0.1,10.0.1.5]:
Configure Example Parameters: Provide the model IDs and other parameters specific to the chosen example. Defaults are provided.
LLM Model ID (e.g., meta-llama/Meta-Llama-3-8B-Instruct) [meta-llama/Meta-Llama-3-8B-Instruct]:
Embedding Model ID (e.g., BAAI/bge-base-en-v1.5) [BAAI/bge-base-en-v1.5]:
Reranking Model ID (e.g., BAAI/bge-reranker-base) [BAAI/bge-reranker-base]:
Data Mount Directory (for Docker) [./data]:
Select Optional Steps: Choose whether to run pre-flight checks, build images, or run post-deployment tests.
Run environment check? [y/N]: y
Update images (build/push)?: n
Run connection tests after deployment? [y/N]: y
Confirm and Deploy: Review the summary and confirm to start the deployment.
======================================================================
== CONFIGURATION SUMMARY ==
======================================================================
📘 Deploy Mode: docker
📘 Target Device: xeon
...
Proceed with deployment? [Y/n]: y
The script will now execute all the selected steps: check the environment, configure services, and deploy using Docker Compose.
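Most prompts in the walkthrough show a default in parentheses or brackets, and pressing Enter accepts it. The following sketch shows one way such a prompt could resolve its input. It is an assumption about the behavior, not the script's implementation, and resolve_choice is a hypothetical name.

```python
def resolve_choice(raw: str, choices, default):
    """Map raw user input to one of choices; empty input selects the default."""
    value = raw.strip().lower()
    if not value:
        return default
    if value in choices:
        return value
    raise ValueError(f"expected one of {list(choices)}, got {raw!r}")


# Usage with an interactive prompt:
#   mode = resolve_choice(input("Deployment Mode [docker/k8s] (docker): "),
#                         ["docker", "k8s"], "docker")
```

Raising on unknown input keeps the sketch short. An interactive tool would more likely ask again instead of failing.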
Testing a Deployed Service¶
You can test a running deployment at any time by selecting the Test Connection action.
Choose the Test Connection action.
Specify how the service was deployed (docker or k8s) and on which device.
If testing a Kubernetes service, provide a local port for port-forwarding (e.g., 8080).
The script will then establish a connection and run a pre-defined test against the main service endpoint, reporting whether it passed or failed.
# Example test run for a Docker deployment
$ python3 one_click_deploy.py
# ... choose example and 'Test Connection' action ...
Deployment Mode [docker/k8s] (docker): docker
Target Device [xeon/gaudi] (xeon): xeon
...
======================================================================
== TESTING CONNECTION FOR CHATQNA ==
======================================================================
📘 [INFO] Testing POST http://10.0.1.5:8888/v1/chatqna
✅ [OK] Test '/v1/chatqna' PASSED.
📘 [INFO] Test Summary: Passed: 1, Failed: 0, Skipped: 0
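A comparable check can be run by hand with the Python standard library. The sketch below sends a JSON POST to the endpoint from the log above and treats any 2xx response as a pass. The request body is an assumption: the pre-defined test's actual payload is not shown in this guide, so {"messages": ...} is only a placeholder.

```python
import json
import urllib.error
import urllib.request


def build_request(url: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for the given endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def endpoint_passes(url: str, payload: dict, timeout: float = 30.0) -> bool:
    """Return True when the endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(build_request(url, payload), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers HTTP errors, refused connections, and timeouts.
        return False


# Usage (the payload shape is assumed, adjust it to the example's API):
#   endpoint_passes("http://10.0.1.5:8888/v1/chatqna",
#                   {"messages": "What is deep learning?"})
```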
Clearing a Deployment¶
To stop and remove a deployed example, use the Clear action. This is crucial for releasing resources and avoiding conflicts.
Choose the Clear action.
Select the deployment mode (docker or k8s) that you want to clear.
If clearing a Docker deployment, specify the device it was deployed on.
If clearing a Kubernetes deployment, you will be asked if you also want to delete the entire namespace.
The script will then run the appropriate commands (docker compose down -v or helm uninstall) to tear down the services.
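The choice between the two teardown paths can be sketched as a small function that returns the commands to run. This is an illustration only: teardown_commands is a hypothetical helper, the release and namespace defaults of chatqna are assumptions, and the optional kubectl delete namespace step stands in for the namespace deletion the script offers.

```python
def teardown_commands(mode, release="chatqna", namespace="chatqna",
                      delete_namespace=False):
    """Return the list of commands (as argument lists) for a teardown."""
    if mode == "docker":
        # -v also removes the volumes declared in the Compose file.
        return [["docker", "compose", "down", "-v"]]
    if mode == "k8s":
        commands = [["helm", "uninstall", release, "-n", namespace]]
        if delete_namespace:
            commands.append(["kubectl", "delete", "namespace", namespace])
        return commands
    raise ValueError(f"unknown deployment mode: {mode!r}")


# Usage: run each command in order, for example with
#   subprocess.run(cmd, check=True)
```

docker compose down acts on the Compose project in the current directory unless a file is passed with -f, so the Docker command needs to run from the directory that holds the example's Compose file.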
Verifying the Deployment¶
After the script finishes, you can manually verify that all services are running correctly.
For Docker Compose deployments:
Run docker ps to see all running containers. All service containers for the example should be in the Up state.
docker ps
To check the logs of a specific service (e.g., the backend):
# Find the container name with 'docker ps' first
docker logs chatqna-xeon-backend-server
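The Docker check above can also be scripted. The sketch below lists every container with docker ps -a and a Go-template format, then reports those whose status does not start with Up. The helper names are hypothetical, and the sample container names in the usage comment are for illustration only.

```python
import subprocess


def list_containers() -> str:
    """Return 'name<TAB>status' lines for all containers, including stopped ones."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def containers_not_up(ps_output: str):
    """Return the names of containers whose status does not start with 'Up'."""
    not_up = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        if not status.strip().startswith("Up"):
            not_up.append(name.strip())
    return not_up


# Usage:
#   print(containers_not_up(list_containers()))
# For the input "chatqna-xeon-backend-server\tUp 2 minutes\nexample-db\tExited (1) 5 seconds ago"
# the function reports ["example-db"].
```

Using -a matters here: plain docker ps hides containers that have already exited, which are usually the ones worth inspecting.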
For Kubernetes deployments:
Run kubectl get pods -n <namespace> to verify that all pods are in the Running or Completed state. The default namespace is typically the name of the example (e.g., chatqna).
# Check pods and services in the 'chatqna' namespace
kubectl get pods -n chatqna
kubectl get svc -n chatqna
To check the logs of a specific pod:
# Find the pod name with 'kubectl get pods' first
kubectl logs chatqna-backend-server-svc-xxxxxxxx-yyyyy -n chatqna
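The Kubernetes check can be automated in the same spirit by reading the JSON form of the pod list. In the sketch below, unhealthy_pods is a hypothetical helper. The Completed status shown by kubectl corresponds to the pod phase Succeeded, which is why that phase is treated as healthy.

```python
import json
import subprocess

# 'Completed' in kubectl's STATUS column corresponds to phase 'Succeeded'.
HEALTHY_PHASES = {"Running", "Succeeded"}


def get_pods(namespace: str) -> dict:
    """Return the parsed output of 'kubectl get pods -o json' for a namespace."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def unhealthy_pods(pod_list: dict):
    """Return (name, phase) pairs for pods outside the healthy phases."""
    problems = []
    for item in pod_list.get("items", []):
        phase = item.get("status", {}).get("phase", "Unknown")
        if phase not in HEALTHY_PHASES:
            problems.append((item["metadata"]["name"], phase))
    return problems


# Usage:
#   print(unhealthy_pods(get_pods("chatqna")))
```

A pod in the Running phase can still have containers that are not ready, so an empty result here narrows the search but does not replace a look at kubectl describe pod.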
Troubleshooting¶
If you encounter issues, refer to the following common problems and solutions.
Environment Check Fails:
Problem: The script reports that required commands like docker or helm are missing.
Solution: Manually install the missing software using your system’s package manager. Review the output of the environment check and the log file (one_click_deploy/deployment.log) for details.
Docker Image Pull Errors:
Problem: Docker fails to pull images, often with an authentication required or timeout error.
Solution:
Check HF Token: Ensure the Hugging Face token you provided is correct and has the necessary permissions.
Check Proxy Settings: If you are behind a firewall, ensure your http_proxy and https_proxy settings are correct.
Check Disk Space: A no space left on device error indicates you need to free up disk space.
Kubernetes Pods Stuck in Pending or ImagePullBackOff:
Problem: Pods fail to start.
Solution:
Use kubectl describe pod <pod-name> -n <namespace> to get detailed events.
ImagePullBackOff: This often means the Kubernetes cluster cannot access the container image. Check that the image registry and tag are correct and that your cluster has credentials to pull from it (the Helm chart should handle this if using a public registry, but the HF token is still crucial for models).
Pending: This can be due to insufficient resources (CPU/memory/HPU) on your cluster nodes. Check the pod description for messages about resource constraints.
Connection Test Fails:
Problem: The script reports that it cannot connect to the service endpoint.
Solution:
Verify that all containers/pods are running correctly using the steps in the Verifying the Deployment section.
Check the logs of the main service container/pod for any startup errors.
Ensure no firewalls are blocking the connection between your machine and the service port (for Docker) or the kubectl port-forward connection (for K8s).