Usage guide for genai-microservices-connector(GMC)

genai-microservices-connector(GMC) can be used to compose and adjust GenAI pipelines dynamically. It can leverage the microservices provided by GenAIComps and external services to compose GenAI pipelines.

Below are sample use cases:

Use GMC to compose a chatQnA Pipeline

A sample for chatQnA can be found at config/samples/ChatQnA/chatQnA_dataprep_xeon.yaml

Deploy chatQnA GMC custom resource

kubectl create ns chatqa
kubectl apply -f $(pwd)/config/samples/ChatQnA/chatQnA_dataprep_xeon.yaml
# To use Gaudi device
#kubectl apply -f $(pwd)/config/samples/ChatQnA/chatQnA_dataprep_gaudi.yaml
# To use Nvidia GPU
#kubectl apply -f $(pwd)/config/samples/ChatQnA/chatQnA_nv.yaml

GMC will reconcile chatQnA custom resource and get all related components/services ready

kubectl get service -n chatqa

Check GMC chatQnA custom resource to get access URL for the pipeline

$kubectl get gmconnectors.gmc.opea.io -n chatqa
NAME     URL                                                      READY     AGE
chatqa   http://router-service.chatqa.svc.cluster.local:8080      10/0/10     3m

the READY 10/0/10 means there are 10(the 2nd 10) services deployed by the GMC and 10(the 1st 10) are ready, so the 10 of 10 means the pipeline is all set. the 0 in the middle means there are no external services used, all the resources are managed by GMC inside the clusters.`

you can get the resources via kubectl commands

$ kubectl get pods -n chatqa
NAME                                            READY   STATUS    RESTARTS   AGE
data-prep-svc-deployment-68f7c5dcb9-8fbh8       1/1     Running   0          2m41s
embedding-svc-deployment-775bd5dc49-j4ltr       1/1     Running   0          2m43s
llm-svc-deployment-59f756fb56-4xckz             1/1     Running   0          2m41s
redis-vector-db-deployment-587844d666-hbchr     1/1     Running   0          2m42s
reranking-svc-deployment-846c89f79f-gv7b9       1/1     Running   0          2m42s
retriever-svc-deployment-5c44f7d46-m4qgq        1/1     Running   0          2m43s
router-service-deployment-7f6c5f4796-tzchw      1/1     Running   0          2m41s
tei-embedding-svc-deployment-54b58d57cb-9mwvk   1/1     Running   0          2m43s
tei-reranking-svc-deployment-54c5dd5795-b6wcb   1/1     Running   0          2m42s
tgi-service-m-deployment-5ff67f4db7-b7ztj       1/1     Running   0          2m41s

you can also get the detailed information of these resource by checking the pipeline’s status, this will list all the configmap, deployment and service and their status as below:

$ kubectl get gmc -n chatqa chatqa -o json | jq '.status.annotations' | yq -P
ConfigMap:v1:data-prep-config:chatqa: provisioned
ConfigMap:v1:embedding-usvc-config:chatqa: provisioned
ConfigMap:v1:llm-uservice-config:chatqa: provisioned
ConfigMap:v1:reranking-usvc-config:chatqa: provisioned
ConfigMap:v1:retriever-usvc-config:chatqa: provisioned
ConfigMap:v1:tei-config:chatqa: provisioned
ConfigMap:v1:teirerank-config:chatqa: provisioned
ConfigMap:v1:tgi-config:chatqa: provisioned
Deployment:apps/v1:data-prep-svc-deployment:chatqa: |
  Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
  Conditions:
    Type: Available
    Status: True
    Reason: MinimumReplicasAvailable
    Message: Deployment has minimum availability.
    Type: Progressing
    Status: True
    Reason: NewReplicaSetAvailable
    Message: ReplicaSet "data-prep-svc-deployment-7c7c648846" has successfully progressed.
Deployment:apps/v1:embedding-svc-deployment:chatqa: |
  Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
  Conditions:
    Type: Available
    Status: True
    Reason: MinimumReplicasAvailable
    Message: Deployment has minimum availability.
    Type: Progressing
    Status: True
    Reason: NewReplicaSetAvailable
    Message: ReplicaSet "embedding-svc-deployment-775bd5dc49" has successfully progressed.
Deployment:apps/v1:llm-svc-deployment:chatqa: |
  Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
  Conditions:
    Type: Available
    Status: True
    Reason: MinimumReplicasAvailable
    Message: Deployment has minimum availability.
    Type: Progressing
    Status: True
    Reason: NewReplicaSetAvailable
    Message: ReplicaSet "llm-svc-deployment-59f756fb56" has successfully progressed.
Deployment:apps/v1:redis-vector-db-deployment:chatqa: |
  Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
  Conditions:
    Type: Available
    Status: True
    Reason: MinimumReplicasAvailable
    Message: Deployment has minimum availability.
    Type: Progressing
    Status: True
    Reason: NewReplicaSetAvailable
    Message: ReplicaSet "redis-vector-db-deployment-587844d666" has successfully progressed.
Deployment:apps/v1:reranking-svc-deployment:chatqa: |
  Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
  Conditions:
    Type: Available
    Status: True
    Reason: MinimumReplicasAvailable
    Message: Deployment has minimum availability.
    Type: Progressing
    Status: True
    Reason: NewReplicaSetAvailable
    Message: ReplicaSet "reranking-svc-deployment-846c89f79f" has successfully progressed.
Deployment:apps/v1:retriever-svc-deployment:chatqa: |
  Replicas: 1 desired | 1 updated | 2 total | 1 available | 1 unavailable
  Conditions:
    Type: Available
    Status: True
    Reason: MinimumReplicasAvailable
    Message: Deployment has minimum availability.
    Type: Progressing
    Status: True
    Reason: ReplicaSetUpdated
    Message: ReplicaSet "retriever-svc-deployment-95b967c9d" is progressing.
Deployment:apps/v1:router-service-deployment:chatqa: |
  Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
  Conditions:
    Type: Available
    Status: True
    Reason: MinimumReplicasAvailable
    Message: Deployment has minimum availability.
    Type: Progressing
    Status: True
    Reason: NewReplicaSetAvailable
    Message: ReplicaSet "router-service-deployment-79f54548f4" has successfully progressed.
Deployment:apps/v1:tei-embedding-svc-deployment:chatqa: |
  Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
  Conditions:
    Type: Available
    Status: True
    Reason: MinimumReplicasAvailable
    Message: Deployment has minimum availability.
    Type: Progressing
    Status: True
    Reason: NewReplicaSetAvailable
    Message: ReplicaSet "tei-embedding-svc-deployment-54b58d57cb" has successfully progressed.
Deployment:apps/v1:tei-reranking-svc-deployment:chatqa: |
  Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
  Conditions:
    Type: Available
    Status: True
    Reason: MinimumReplicasAvailable
    Message: Deployment has minimum availability.
    Type: Progressing
    Status: True
    Reason: NewReplicaSetAvailable
    Message: ReplicaSet "tei-reranking-svc-deployment-54c5dd5795" has successfully progressed.
Deployment:apps/v1:tgi-service-m-deployment:chatqa: |
  Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
  Conditions:
    Type: Available
    Status: True
    Reason: MinimumReplicasAvailable
    Message: Deployment has minimum availability.
    Type: Progressing
    Status: True
    Reason: NewReplicaSetAvailable
    Message: ReplicaSet "tgi-service-m-deployment-5fcff459f5" has successfully progressed.
Service:v1:data-prep-svc:chatqa: http://data-prep-svc.chatqa.svc.cluster.local:6007/v1/dataprep
Service:v1:embedding-svc:chatqa: http://embedding-svc.chatqa.svc.cluster.local:6000/v1/embeddings
Service:v1:llm-svc:chatqa: http://llm-svc.chatqa.svc.cluster.local:9000/v1/chat/completions
Service:v1:redis-vector-db:chatqa: http://redis-vector-db.chatqa.svc.cluster.local:6379
Service:v1:reranking-svc:chatqa: http://reranking-svc.chatqa.svc.cluster.local:8000/v1/reranking
Service:v1:retriever-svc:chatqa: http://retriever-svc.chatqa.svc.cluster.local:7000/v1/retrieval
Service:v1:router-service:chatqa: http://router-service.chatqa.svc.cluster.local:8080
Service:v1:tei-embedding-svc:chatqa: http://tei-embedding-svc.chatqa.svc.cluster.local:80
Service:v1:tei-reranking-svc:chatqa: http://tei-reranking-svc.chatqa.svc.cluster.local:80/rerank
Service:v1:tgi-service-m:chatqa: http://tgi-service-m.chatqa.svc.cluster.local:80/generate

NOTE: if you upgrade from pre 0.9 to 0.9 or later, you might encounter below issue

if the router-service and it’s deployment are not initialized, which is mandatory for every pipeline, you also need to upgrade the gmc-router.yaml to the latest version which is mentioned in the GMC README

Deploy one client pod for testing the chatQnA application

kubectl create deployment client-test -n chatqa --image=python:3.8.13 -- sleep infinity

Access the pipeline using the above URL from the client pod

export CLIENT_POD=$(kubectl get pod -n chatqa  -l app=client-test -o jsonpath={.items..metadata.name})
export accessUrl=$(kubectl get gmc -n chatqa -o jsonpath="{.items[?(@.metadata.name=='chatqa')].status.accessUrl}")
kubectl exec "$CLIENT_POD" -n chatqa -- curl $accessUrl  -X POST  -d '{"text":"What is the revenue of Nike in 2023?","parameters":{"max_new_tokens":17, "do_sample": true}}' -H 'Content-Type: application/json'

Use GMC to adjust the chatQnA Pipeline

Modify chatQnA custom resource to change to another LLM model

- name: Tgi
  internalService:
    serviceName: tgi-svc
    config:
      LLM_MODEL_ID: Llama-2-7b-chat-hf

Check the tgi-svc-deployment has been changed to use the new LLM Model

kubectl get deployment tgi-svc-deployment -n chatqa -o jsonpath="{.spec.template.spec.containers[*].env[?(@.name=='LLM_MODEL_ID')].value}"

Access the updated pipeline using the above URL from the client pod

kubectl exec "$CLIENT_POD" -n chatqa -- curl $accessUrl  -X POST  -d '{"text":"What is the revenue of Nike in 2023?","parameters":{"max_new_tokens":17, "do_sample": true}}' -H 'Content-Type: application/json'

Remove one step of the pipeline If you want to adjust the steps of the pipeline, for example, if you want to delete the data preparation step from chatQnA, you can simply delete this part from the yaml file config/samples/chatQnA_dataprep_xeon.yaml

      - name: DataPrep
        internalService:
          serviceName: data-prep-svc
          config:
            endpoint: /v1/dataprep
            REDIS_URL: redis-vector-db
            TEI_ENDPOINT: tei-embedding-svc
          isDownstreamService: true

and re-apply the yaml file

kubectl apply -f $(pwd)/config/samples/chatQnA_dataprep_xeon.yaml

you would see the dataprep is deleted

$ kubectl get gmc -n chatqa chatqa
NAME     URL                                                   READY   AGE
chatqa   http://router-service.chatqa.svc.cluster.local:8080   9/0/9   3m37s

But please be noted, you have to make sure the step is eligible to be deleted without affecting the pipeline function.

Use GMC to delete the chatQnA Pipeline

you can delete all the resources by deleting the gmc custom resource

$ kubectl delete gmc -n chatqa chatqa
gmconnector.gmc.opea.io "chatqa" deleted

$ kubectl get gmc -n chatqa
No resources found in chatqa namespace.

$ kubectl get all -n chatqa
No resources found in chatqa namespace.

Use GMC and Istio to compose an OPEA Pipeline with authentication and authorization enabled

The critical steps of authentication and authorization are vital to maintaining the integrity and safety of our GenAI workload. Please check the readme file for more details.