Example HybridRAG deployments on an Intel® Gaudi® Platform¶
This example covers a single-node on-premises deployment of the HybridRAG example using OPEA components. There are several ways to enable HybridRAG; this example focuses on deploying the HybridRAG pipeline to Intel® Gaudi® AI Accelerators using Docker Compose.
Note This example requires access to a properly installed Intel® Gaudi® platform with a functional Docker service configured to use the habanalabs-container-runtime. Please consult the Intel® Gaudi® Software Installation Guide for more information.
HybridRAG Quick Start Deployment¶
This section describes how to quickly deploy and test the HybridRAG service manually on an Intel® Gaudi® platform. The basic steps are:
1. Access the Code
2. Generate a HuggingFace Access Token
3. Configure the Deployment Environment
4. Deploy the Services Using Docker Compose
5. Check the Deployment Status
6. Test the Pipeline
7. Cleanup the Deployment
Access the Code¶
Clone the GenAIExamples repository and access the HybridRAG Intel® Gaudi® platform Docker Compose files and supporting scripts:
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/HybridRAG/docker_compose/intel/hpu/gaudi/
Check out a released version, such as v1.4:
git checkout v1.4
Generate a HuggingFace Access Token¶
Some HuggingFace resources, such as certain models, are only accessible if you have an access token. If you do not already have one, first create an account by following the steps provided at HuggingFace, then generate a user access token.
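Once generated, export the token in the shell used for deployment so the model-serving containers can authenticate to HuggingFace. A minimal sketch, assuming the deployment reads the token from the HF_TOKEN variable (check set_env.sh for the exact variable name your release expects):
export HF_TOKEN="hf_..."   # placeholder; substitute your own HuggingFace access token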
Configure the Deployment Environment¶
To set up environment variables for deploying the HybridRAG services, source the set_env.sh script in this directory:
source ./set_env.sh
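After sourcing the script, it is worth spot-checking that key variables are populated before deploying. A quick hedged check (host_ip is used by later steps in this guide; other variable names may differ by release):
echo "host_ip=${host_ip}"
grep host_ip .env   # the script also records variables in the .env file used by Docker Compose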
Deploy the Services Using Docker Compose¶
To deploy the HybridRAG services, execute the docker compose up command with the appropriate arguments. For a default deployment, execute:
docker compose up -d
The HybridRAG Docker images should automatically be downloaded from the OPEA registry and deployed on the Intel® Gaudi® platform:
[+] Running 9/9
✔ Container redis-vector-db Healthy 6.4s
✔ Container vllm-service Started 0.4s
✔ Container tei-embedding-server Started 0.9s
✔ Container neo4j-apoc Healthy 11.4s
✔ Container tei-reranking-server Started 0.8s
✔ Container retriever-redis-server Started 1.0s
✔ Container dataprep-redis-server Started 6.5s
✔ Container text2cypher-gaudi-container Started 12.2s
✔ Container hybridrag-xeon-backend-server Started 12.4s
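If a container stays in a starting state, its logs usually explain why; the model server in particular can take a while to warm up. For example, assuming the Compose service names match the container names shown above:
docker compose logs -f vllm-service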
To rebuild the docker image for the hybridrag-xeon-backend-server container:
cd GenAIExamples/HybridRAG
docker build --no-cache -t opea/hybridrag:latest -f Dockerfile .
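After rebuilding, re-run Docker Compose from the deployment directory so the backend container is recreated from the new image (this assumes the compose file references the opea/hybridrag:latest tag built above):
cd docker_compose/intel/hpu/gaudi/
docker compose up -d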
Check the Deployment Status¶
After running docker compose, check that all the containers launched via Docker Compose have started:
docker ps -a
For the default deployment, the following 9 containers should have started:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a9286abd0015 opea/hybridrag:latest "python hybridrag.py" 15 hours ago Up 15 hours 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp hybridrag-xeon-backend-server
8477b154dc72 opea/text2cypher-gaudi:latest "/bin/sh -c 'bash ru…" 15 hours ago Up 15 hours 0.0.0.0:11801->9097/tcp, [::]:11801->9097/tcp text2cypher-gaudi-container
688e01a431fa opea/dataprep:latest "sh -c 'python $( [ …" 15 hours ago Up 15 hours 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-redis-server
54f574fe54bb opea/retriever:latest "python opea_retriev…" 15 hours ago Up 15 hours 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server
5028eb66617c ghcr.io/huggingface/text-embeddings-inference:cpu-1.6 "text-embeddings-rou…" 15 hours ago Up 15 hours 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp tei-reranking-server
a9dbf8a13365 opea/vllm:latest "python3 -m vllm.ent…" 15 hours ago Up 15 hours (healthy) 0.0.0.0:9009->80/tcp, [::]:9009->80/tcp vllm-service
43f44830f47b neo4j:latest "tini -g -- /startup…" 15 hours ago Up 15 hours (healthy) 0.0.0.0:7474->7474/tcp, :::7474->7474/tcp, 7473/tcp, 0.0.0.0:7687->7687/tcp, :::7687->7687/tcp neo4j-apoc
867feabb6f11 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 15 hours ago Up 15 hours (healthy) 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db
23cd7f16453b ghcr.io/huggingface/text-embeddings-inference:cpu-1.6 "text-embeddings-rou…" 15 hours ago Up 15 hours 0.0.0.0:6006->80/tcp, [::]:6006->80/tcp tei-embedding-server
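Rather than scanning the full table, the same check can be scripted with docker ps filters; for example, to confirm that the health-checked services are reporting healthy:
docker ps --filter "name=vllm-service" --filter "name=neo4j-apoc" --filter "name=redis-vector-db" \
    --format "{{.Names}}: {{.Status}}"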
Test the Pipeline¶
Once the HybridRAG services are running, run data ingestion. The following command ingests unstructured data:
cd GenAIExamples/HybridRAG/tests
curl -X POST -H "Content-Type: multipart/form-data" \
-F "files=@./Diabetes.txt" \
-F "files=@./Acne_Vulgaris.txt" \
-F "chunk_size=300" \
-F "chunk_overlap=20" \
http://${host_ip}:6007/v1/dataprep/ingest
The data files (Diabetes.txt and Acne_Vulgaris.txt) are samples downloaded from Wikipedia and are provided to facilitate pipeline testing. Users are encouraged to download their own datasets and update the command above with the appropriate file names, as shown below.
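For example, to ingest a single custom document (my_dataset.txt is a hypothetical file name; the chunking parameters are the same as above):
curl -X POST -H "Content-Type: multipart/form-data" \
    -F "files=@./my_dataset.txt" \
    -F "chunk_size=300" \
    -F "chunk_overlap=20" \
    http://${host_ip}:6007/v1/dataprep/ingest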
As for the structured data, the application is pre-seeded with structured data and schema by default. To create a knowledge graph with custom data and schema, set the cypher_insert environment variable prior to application deployment.
export cypher_insert='
LOAD CSV WITH HEADERS FROM "https://docs.google.com/spreadsheets/d/e/2PACX-1vQCEUxVlMZwwI2sn2T1aulBrRzJYVpsM9no8AEsYOOklCDTljoUIBHItGnqmAez62wwLpbvKMr7YoHI/pub?gid=0&single=true&output=csv" AS rows
MERGE (d:disease {name:rows.Disease})
MERGE (dt:diet {name:rows.Diet})
MERGE (d)-[:HOME_REMEDY]->(dt)
MERGE (m:medication {name:rows.Medication})
MERGE (d)-[:TREATMENT]->(m)
MERGE (s:symptoms {name:rows.Symptom})
MERGE (d)-[:MANIFESTATION]->(s)
MERGE (p:precaution {name:rows.Precaution})
MERGE (d)-[:PREVENTION]->(p)
'
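Once the graph is populated, it can be sanity-checked against the schema created by cypher_insert, either through the Neo4j browser (exposed on port 7474, as shown in the docker ps output above) or with cypher-shell inside the container. A sketch, assuming the Neo4j credentials are exported as NEO4J_USERNAME and NEO4J_PASSWORD (hypothetical names; check set_env.sh for the ones your deployment uses):
docker exec neo4j-apoc cypher-shell -u "${NEO4J_USERNAME}" -p "${NEO4J_PASSWORD}" \
    "MATCH (d:disease)-[:MANIFESTATION]->(s:symptoms) RETURN d.name, s.name LIMIT 10"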
If the graph database is already populated, you can skip the knowledge graph generation by setting the refresh_db environment variable:
export refresh_db='False'
Now test the pipeline using the following command:
curl -s -X POST -d '{"messages": "what are the symptoms for Diabetes?"}' \
-H 'Content-Type: application/json' \
"${host_ip}:8888/v1/hybridrag"
To collect per request latency for the pipeline, run the following:
curl -o /dev/null -s -w "Total Time: %{time_total}s\n" \
-X POST \
-d '{"messages": "what are the symptoms for Diabetes?"}' \
-H 'Content-Type: application/json' \
"${host_ip}:8888/v1/hybridrag"
Note The value of host_ip was set using the set_env.sh script and can be found in the .env file.
Cleanup the Deployment¶
To stop the containers associated with the deployment, execute the following command:
docker compose -f compose.yaml down
All the HybridRAG containers will be stopped and then removed when the down command completes.
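If you also want to discard state persisted by the deployment (for example, the Redis vector store or the Neo4j graph), the --volumes flag removes any named volumes declared in the compose file (assuming your compose file declares such volumes):
docker compose -f compose.yaml down --volumes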