Getting Started with OPEA
This guide walks through deploying the ChatQnA application from OPEA GenAIExamples on several cloud platforms: Amazon Web Services (AWS), Google Cloud Platform (GCP), IBM Cloud, Microsoft Azure, Oracle Cloud Infrastructure, and Intel® Tiber™ AI Cloud, so you can choose the platform that best fits your requirements. For additional deployment targets, see the ChatQnA example.
Understanding OPEA’s Core Components
Before moving forward, it’s important to familiarize yourself with two key elements of OPEA: GenAIComps and GenAIExamples.
GenAIComps is a collection of microservice components that form a service-based toolkit. This includes a variety of services such as llm (large language models), embedding, and reranking, among others.
GenAIExamples provides practical and deployable solutions to help users implement these services effectively. Examples include ChatQnA and DocSum, which leverage the microservices for specific applications.
Prerequisites
Create and Configure a Virtual Server
Step 1: Create Your Virtual Server
Open the AWS console and search for EC2 in the search bar.
Select Launch instance to start creating a virtual server.
Under Name and tags, name your virtual server in the Name field.
Under Quick Start, choose Ubuntu (ami-id: ami-04dd23e62ed049936) as the base OS.
In Instance type, select an instance for your Intel processor.
Note: We recommend an m7i.4xlarge or larger instance for an Intel® 4th Gen Xeon® Scalable Processor. For more information on virtual servers on AWS, visit the AWS and Intel page.
Create a new key pair for SSH access by naming it, or select an existing key pair from the dropdown list.
Under Network Settings:
Choose an existing security group, or
Select Create security group and enable Allow SSH traffic and Allow HTTP traffic.
In Storage, set the size to 100 GiB.
Select Launch instance to launch your virtual server. A Success banner confirms the launch.
Step 2: Connect and Configure Your Virtual Server
Select Connect, and connect using your preferred connection method.
Search for Security Groups in the search bar and select the security group used when creating the instance.
Select Edit inbound rules on the right side of the window.
To add a rule, select Add rule and enter the following:
Type: Custom TCP
Port Range: 80
Source: 0.0.0.0/0
Note: To learn more, see editing inbound or outbound rules from AWS documentation.
Select Save rules to commit your changes.
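The inbound rule above can also be added from a terminal. The following is a minimal sketch, assuming the AWS CLI is installed and configured; the security group ID is a hypothetical placeholder, and the DRY_RUN switch is an illustration added here so the command is only printed by default.

```shell
# Sketch: open inbound TCP port 80 on a security group with the AWS CLI.
# The security group ID below is a hypothetical placeholder; use your own.
SG_ID="${SG_ID:-sg-0123456789abcdef0}"
# DRY_RUN=1 (the default here) only prints the command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

open_http_port() {
  # $1: security group ID
  set -- aws ec2 authorize-security-group-ingress \
    --group-id "$1" --protocol tcp --port 80 --cidr 0.0.0.0/0
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

open_http_port "$SG_ID"
```

Set DRY_RUN=0 and SG_ID to the group used by your instance to apply the rule. A source of 0.0.0.0/0 exposes port 80 to any address, so consider a narrower CIDR if the UI does not need to be public.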
Navigate to the GCP console and click the “Create a VM” button.
Provide a name for the VM.
Select Ubuntu 24.04 LTS from Marketplace as the base OS.
Select an Instance type that is based on Intel hardware.
Note: We recommend selecting a c4-standard-32 or larger instance with an Intel(R) 4th Gen Xeon(R) Scalable Processor. The minimum supported c3 instance type is c3-standard-8 with 32 GB of memory. For more information, visit virtual servers on GCP.
Under Firewall settings, select “Allow HTTP traffic” to access the ChatQnA UI web portal.
Change the Boot disk to 100 GiB and click “Create”.
Use any preferred SSH method such as “Open in browser window” to connect to the instance.
Navigate to IBM Cloud. Click the “Create resource” button at the top right of the screen. Select “Compute” from the available options, then select “Virtual Server for VPC”.
Select a location for the instance. Assign a name to it.
Under Stock Images, select Ubuntu 24.04 (ibm-ubuntu-24-04-6-minimal-amd64-1).
Select a virtual server.
Note: We recommend selecting a 3-series instance with an Intel(R) 4th Gen Xeon(R) Scalable Processor, such as bx3d-16x80 or above. For more information on virtual servers on IBM Cloud, visit Intel® solutions on IBM Cloud®.
Add an SSH key to the instance. If you do not have one, create it first.
Click on “Create virtual server”.
Once the instance is running, create and attach a “Floating IP” to the instance. For more information, visit this site.
Under “Infrastructure” in the left pane, go to Network/Security groups, open the security group attached to your instance, and select Rules. Select “Create”.
Enable inbound traffic for port 80. For more information on editing inbound/outbound rules, click here.
SSH into the instance using the floating IP (ssh -i <key> ubuntu@<floating-ip>).
Navigate to Microsoft Azure and select the “Skip” button on the bottom right to land on the service offerings page. Search for “Virtual Machines” in the search bar and select it. Click the “Create” button and select “Azure Virtual Machine”.
Select an existing “Resource group” from the drop-down, or click “Create” for a new resource group and give it a name. If you have issues, refer to cannot create resource groups.
Provide a name for the VM and select Ubuntu 24.04 LTS as the base OS.
Select x64 in VM architecture.
Select an Instance type that is based on Intel hardware.
Note: We recommend selecting a Standard_D16ds_v5 instance or larger with an Intel(R) 3rd/4th Gen Xeon(R) Scalable Processor. You can find this family of instances in the (US) West US Region. For more information, visit virtual machines on Azure.
Select Password as Authentication type and create username and password for your instance.
Choose the Allow selected ports in Inbound port rule section and select HTTP.
Click the “Next: Disk” button and set the OS disk size to 128 GiB.
Click on “Review + Create” to launch the VM.
Click Go to resource -> Connect -> Connect -> SSH using Azure CLI. Accept the terms and then select “Configure + connect”
Note: If you have issues connecting to the instance with SSH, you could instead access the same via the Bastion host with your username and password.
Log in to the Oracle Cloud Console, then navigate to Compute Instances. Click the “Create Instance” button.
Provide a name to the VM and select the placement in the availability domains.
In the Image and Shape section, click “Change Image” > “Ubuntu” and then select Canonical Ubuntu 24.04. Submit using the “Select Image” button at the bottom.
Click “Change Shape” > “Bare Metal Machine”, then select BM.Standard3.64. Submit using the “Select Shape” button at the bottom.
Select the VCN and the public subnet that the server needs to reside in. If a new VCN or subnet needs to be created, select “Create new virtual cloud network” and “Create new public subnet” to create a subnet that is exposed to the internet.
Next, save a private key or upload an existing public key.
Specify a boot volume size of 100 GiB with 30 VPU units of performance.
Click Create to launch the instance.
Note the public IP address of the machine once it is launched.
Once the instance is launched, click on the subnet in the Primary VNIC section. Then click on “Default Security List for vcn-xxxxxxxx-xxxx” and click “Add Ingress Rules”. Add the following information:
Source CIDR: 0.0.0.0/0
Source Port Range: All
Destination Port Range: 80
Click on “Save”
Connect using ssh (ssh -i <private_key> ubuntu@<public_ip_address>).
Sign up to create an account or log in to Intel® Tiber™ AI Cloud. Check if you have sufficient cloud credits and purchase or redeem a coupon if needed. Go to the “Compute” tab on the left and click on “Instances”. In the center of the screen, click on the “Launch instance” button.
Select your instance configuration, instance type, and machine image which will be Ubuntu.
Note: To run an 8B-parameter model on a CPU, we recommend the VM-SPR-LRG instance, powered by 4th Generation Intel® Xeon® Scalable processors, with at least 64 GB of memory and 64 GB of disk.
Fill out the rest of the form, such as giving your instance a name and answering any additional questions.
Add your public key for SSH. You can select a key you have previously uploaded or upload a key. The “Upload Key” button also provides instructions on how to create a new SSH key.
Click “Launch instance” to start your machine.
Go back to the “Compute” tab and under “Instances”, note down the private IP address of your new VM.
If you wish to make the UI accessible to others, proceed to the next step to create a load balancer. Otherwise, skip to Step 10 which will explain how to connect to your VM with port forwarding.
Create a load balancer. This can be found in Compute->Load Balancers. Click on “Launch Load Balancer”. Ignore any messages about signing up for access and close any pop-up windows if any. Fill out the form with the following info:
Name: Name for your load balancer
Source IP: The private IP address of your VM in Step 6
Listener Port: 80
Instance Port: 80
Monitor Type: HTTP
Mode: Round Robin
Instances: Select the name of the VM you created
Note: The port used is 80 because this is the NGINX port for the GenAI Examples.
Click “Launch”.
Go back to Compute->Load Balancers to see your new load balancer. Note down the virtual IP address. This is what you will use to access the UI of your GenAI Example on a web browser.
Connect to your VM using ssh and forward port 80 if needed (ssh -i <private_key> -J guest@<proxy_jump_ip_address> -L 80:localhost:80 ubuntu@<private_ip_address_of_vm>). If you are using a load balancer, you do not need to include -L 80:localhost:80.
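The two connection variants described above (with and without port forwarding) can be assembled as follows. This is a minimal sketch: the helper name build_ssh_cmd and its arguments are hypothetical, and it only prints the command so you can inspect it before running it.

```shell
# Sketch: assemble the SSH command for a VM reached through a jump host.
# build_ssh_cmd is a hypothetical helper; all values are supplied by the caller.
build_ssh_cmd() {
  # $1: private key path, $2: jump host IP, $3: private IP of the VM
  # $4: "forward" to add local forwarding of port 80, anything else to omit it
  forward=""
  if [ "$4" = "forward" ]; then
    forward=" -L 80:localhost:80"
  fi
  echo "ssh -i $1 -J guest@$2${forward} ubuntu@$3"
}

# Print both variants with placeholder values.
build_ssh_cmd "<private_key>" "<proxy_jump_ip_address>" "<private_ip_address_of_vm>" forward
build_ssh_cmd "<private_key>" "<proxy_jump_ip_address>" "<private_ip_address_of_vm>" none
```

On many systems, binding local port 80 requires elevated privileges. If the forward fails for that reason, a higher local port such as -L 8080:localhost:80 is a common alternative, with the UI then reachable at localhost:8080.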
Deploy the ChatQnA Solution
Use the commands below to install Docker:
wget https://raw.githubusercontent.com/opea-project/GenAIExamples/refs/heads/main/ChatQnA/docker_compose/install_docker.sh
chmod +x install_docker.sh
./install_docker.sh
Configure Docker to run as a non-root user by following these instructions
Clone the repo. We recommend checking out a specific release version (for example, 1.0, 1.1, or 1.2):
export RELEASE_VERSION=<your-release-version>
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples
git checkout tags/v${RELEASE_VERSION}
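Because the checkout prepends "v" to RELEASE_VERSION, a value such as "v1.2" or an empty string produces a tag that does not exist. The sketch below shows one way to validate the value first; release_tag is a hypothetical helper, not part of the OPEA repository.

```shell
# Sketch: turn a release version such as "1.2" into the tag name "v1.2".
# Rejects empty values and anything that is not dot-separated digits
# containing at least one dot.
release_tag() {
  case "$1" in
    ""|*[!0-9.]*|.*|*.|*..*)
      echo "invalid release version: '$1'" >&2
      return 1
      ;;
    *.*)
      echo "v$1"
      ;;
    *)
      echo "invalid release version: '$1'" >&2
      return 1
      ;;
  esac
}

# Example (run inside the cloned GenAIExamples directory):
#   git tag --list 'v*'      # list the available release tags
#   tag="$(release_tag "$RELEASE_VERSION")" && git checkout "tags/${tag}"
```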
Set the required environment variables:
export host_ip="localhost"
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
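Leaving the placeholder token in place is an easy mistake to make at this step. The following sketch checks for it before continuing; check_value is a hypothetical helper introduced only for illustration.

```shell
# Sketch: fail early when a variable is empty or still holds its placeholder.
check_value() {
  # $1: variable name (used in the message), $2: current value, $3: placeholder
  if [ -z "$2" ] || [ "$2" = "$3" ]; then
    echo "$1 is not set to a real value" >&2
    return 1
  fi
  return 0
}

# Example:
#   check_value HUGGINGFACEHUB_API_TOKEN "${HUGGINGFACEHUB_API_TOKEN:-}" "Your_Huggingface_API_Token"
#   check_value host_ip "${host_ip:-}" ""
```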
Set up proxies if you are behind a firewall:
export no_proxy=${your_no_proxy},$host_ip
export http_proxy=${your_http_proxy}
export https_proxy=${your_http_proxy}
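If your_no_proxy is empty, the no_proxy line above yields a value with a leading comma (",localhost"). A minimal sketch of one way to avoid that, using standard shell parameter expansion; join_no_proxy is a hypothetical helper name.

```shell
# Sketch: append a host to a no_proxy list without a leading comma
# when the existing list is empty.
join_no_proxy() {
  # $1: existing no_proxy list (may be empty), $2: host to append
  echo "${1:+$1,}$2"
}

# Example:
#   export no_proxy="$(join_no_proxy "${your_no_proxy:-}" "$host_ip")"
```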
Set up any other use-case-specific environment variables in set_env.sh before running it. For example, this is where you can change the model(s) to run with.
cd ChatQnA/docker_compose/intel/cpu/xeon/
source set_env.sh
Now we can start the services:
docker compose -f compose.yaml up -d
Note: It takes a few minutes for the services to start. Check the logs for the services to ensure that ChatQnA is running before proceeding further.
For example, to check the logs for the vllm-service:
docker logs vllm-service 2>&1 | grep -i complete
Proceed only after the service logs include Application startup complete., as in this excerpt:
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:80 (Press CTRL+C to quit)
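Rather than re-running the log check by hand, the wait can be scripted. This is a minimal sketch of a polling loop; wait_for_log is a hypothetical helper, and the attempt count and delay in the example are arbitrary values to adjust for your model and hardware.

```shell
# Sketch: poll a log-producing command until a pattern appears
# or the attempts run out.
wait_for_log() {
  # $1: pattern, $2: maximum attempts, $3: seconds between attempts
  # remaining arguments: the command that prints the logs
  pattern="$1"
  attempts="$2"
  delay="$3"
  shift 3
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Merge stderr into stdout so messages on either stream are searched.
    if "$@" 2>&1 | grep -q "$pattern"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example: check every 10 seconds, for up to 60 attempts.
#   wait_for_log "Application startup complete" 60 10 docker logs vllm-service \
#     && echo "vllm-service is ready"
```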
Run docker ps -a as an additional check to verify that all the services are running as shown. Note that the version of the Docker images matches the RELEASE_VERSION you specified.
| CONTAINER ID | IMAGE | COMMAND | CREATED | STATUS | PORTS | NAMES |
|--------------|--------------------------------------------------------|------------------------|--------------|-------------|------------------------------------------------------------------------------------------|------------------------------|
| d992b34fda27 | opea/nginx:1.2 | "/docker-entrypoint.…" | 6 days ago | Up 6 days | 0.0.0.0:80->80/tcp, :::80->80/tcp | chatqna-xeon-nginx-server |
| 2d297d595650 | opea/chatqna-ui:1.2 | "docker-entrypoint.s…" | 6 days ago | Up 6 days | 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp | chatqna-xeon-ui-server |
| 0b9b2be1feef | opea/chatqna-without-rerank:1.2 | "python chatqna.py -…" | 6 days ago | Up 6 days | 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp | chatqna-xeon-backend-server |
| b64ba100723c | opea/dataprep:1.2 | "sh -c 'python $( [ …" | 6 days ago | Up 6 days | 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp | dataprep-redis-server |
| a9b93207988d | opea/retriever:1.2 | "python opea_retriev…" | 6 days ago | Up 6 days | 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp | retriever-redis-server |
| 4bf653d83cba | redis/redis-stack:7.2.0-v9 | "/entrypoint.sh" | 6 days ago | Up 6 days | 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp | redis-vector-db |
| b2774012be67 | ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 | "text-embeddings-rou…" | 6 days ago | Up 6 days | 0.0.0.0:6006->80/tcp, [::]:6006->80/tcp | tei-embedding-server |
| 6407712b6f9b | opea/vllm:1.2 | "python3 -m vllm.ent…" | 6 days ago | Up 6 days | 0.0.0.0:9009->80/tcp, [::]:9009->80/tcp | vllm-service |
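Comparing the table by eye gets tedious, so the check can be sketched as a small script that reports which expected containers are not running. missing_containers is a hypothetical helper; the container names in the example are taken from the NAMES column above.

```shell
# Sketch: read running container names on stdin (one per line) and print
# the expected names, given as arguments, that are not among them.
missing_containers() {
  running="$(cat)"
  missing=""
  for name in "$@"; do
    # -x: match the whole line, -F: treat the name as a fixed string
    if ! printf '%s\n' "$running" | grep -qxF -- "$name"; then
      missing="${missing}${missing:+ }${name}"
    fi
  done
  echo "$missing"
}

# Example: an empty result means every listed container is running.
#   docker ps --format '{{.Names}}' | missing_containers \
#     chatqna-xeon-nginx-server chatqna-xeon-ui-server chatqna-xeon-backend-server \
#     dataprep-redis-server retriever-redis-server redis-vector-db \
#     tei-embedding-server vllm-service
```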
Interact with ChatQnA
You can interact with ChatQnA via a browser interface:
To view the ChatQnA interface, open a browser and navigate to the UI by inserting your public-facing IP address in the following: `http://{public_ip}:80`.
Note: For users running on ITAC, open a browser to localhost:80 if you are using port forwarding OR the virtual IP address of your load balancer.
We can go ahead and ask a sample question, say ‘What is OPEA?’.
A snapshot of the interface looks as follows:
Because information about OPEA was not in the model's training data, the model hallucinates a response. We can upload a document (PDF) with the relevant information and observe how the response changes.
Note: This example leverages the OPEA document for its RAG-based content. You can download the OPEA document and upload it using the UI.
We observe that the response is now relevant and is based on the uploaded PDF. See the ChatQnA example to learn how you can customize it with your own content.
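The same question can also be sent from a terminal instead of the browser. The sketch below rests on two assumptions that are not stated in this guide: that the backend container (published on port 8888 in the container list) accepts POST requests at /v1/chatqna, and that it expects a JSON body with a "messages" field. chatqna_payload is a hypothetical helper that only builds the request body.

```shell
# Sketch: build the JSON body for a ChatQnA question.
# Assumption: the backend expects {"messages": "<question>"}.
# Only backslashes and double quotes are escaped; newlines and other
# control characters in the question are not handled.
chatqna_payload() {
  # $1: question text
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"messages": "%s"}\n' "$escaped"
}

# Example (assumed endpoint path /v1/chatqna on the backend port 8888):
#   curl "http://${host_ip}:8888/v1/chatqna" \
#     -H "Content-Type: application/json" \
#     -d "$(chatqna_payload "What is OPEA?")"
```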
What’s Next
Try GenAIExamples in detail, starting with more examples.
Try GenAIComps to build microservices.
Get Involved
Do you have ideas and skills to build out GenAI components, microservices, and solutions? Would you like to be a part of this evolving technology in its early stages? Welcome!
Register for our mailing list:
Subscribe to the working group mailing lists that interest you
Go to the Community Section of the OPEA repo for Contribution Guidelines and step by step instructions.
Attend any of our community events and hackathons. https://wiki.lfaidata.foundation/display/DL/OPEA+Community+Events
Current GenAI Examples
Simple chatbot that uses retrieval augmented generation (RAG) architecture. ChatQnA
Code generation, from enabling non-programmers to generate code to improving productivity with code completion of complex applications. CodeGen
Make your applications more flexible by porting to different languages. CodeTrans
Create summaries of news articles, research papers, technical documents, etc. to streamline content systems. DocSum
Mimic human behavior by iteratively searching, selecting, and synthesizing information across large bodies of content. SearchQnA
Provide critical content to your customers by automatically generating Frequently Asked Questions (FAQ) resources. FaqGen
Provide text descriptions from pictures, enabling your users to inquire directly about products, services, sites, etc. VisualQnA
Reduce language barriers through customizable text translation systems. Translation