# SearchQnA

Helm chart for deploying the SearchQnA service.

SearchQnA depends on several other Helm charts (microservices), including tgi, llm-uservice, and web-retriever; they are pulled in as chart dependencies in the steps below.
## Installing the Chart

To install the chart, run the following:

```bash
cd GenAIInfra/helm-charts/
./update_dependency.sh
helm dependency update searchqna
export MODELDIR="/mnt/opea-models"
export MODEL="Intel/neural-chat-7b-v3-3"
export HFTOKEN="insert-your-huggingface-token-here"
export GOOGLE_API_KEY="insert-your-google-api-key-here"
export GOOGLE_CSE_ID="insert-your-google-search-engine-id-here"
# To run on Xeon
helm install searchqna searchqna --set global.modelUseHostPath=${MODELDIR} --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set web-retriever.GOOGLE_API_KEY=${GOOGLE_API_KEY} --set web-retriever.GOOGLE_CSE_ID=${GOOGLE_CSE_ID} --set tgi.LLM_MODEL_ID=${MODEL} --set llm-uservice.LLM_MODEL_ID=${MODEL}
# To run on Gaudi
# helm install searchqna searchqna --set global.modelUseHostPath=${MODELDIR} --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set web-retriever.GOOGLE_API_KEY=${GOOGLE_API_KEY} --set web-retriever.GOOGLE_CSE_ID=${GOOGLE_CSE_ID} --set tgi.LLM_MODEL_ID=${MODEL} --set llm-uservice.LLM_MODEL_ID=${MODEL} -f gaudi-values.yaml
```
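As an alternative to the long chain of `--set` flags, the same overrides can be collected in a values file and passed with `-f`, just like `gaudi-values.yaml` above. This is a minimal sketch; `my-values.yaml` is an example file name, not something shipped with the chart:

```bash
# Sketch: write the overrides to a values file (keys mirror the --set flags
# above) and install from it. The exported variables are expanded at write time.
cat > my-values.yaml <<EOF
global:
  modelUseHostPath: ${MODELDIR}
  HUGGINGFACEHUB_API_TOKEN: ${HFTOKEN}
web-retriever:
  GOOGLE_API_KEY: ${GOOGLE_API_KEY}
  GOOGLE_CSE_ID: ${GOOGLE_CSE_ID}
tgi:
  LLM_MODEL_ID: ${MODEL}
llm-uservice:
  LLM_MODEL_ID: ${MODEL}
EOF
helm install searchqna searchqna -f my-values.yaml
```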
## IMPORTANT NOTE

Make sure your `MODELDIR` exists on the node where your workload is scheduled so the downloaded model can be cached for the next use. Otherwise, set `global.modelUseHostPath` to `null` if you don't want to cache the model. By default, this workload downloads the models `Intel/neural-chat-7b-v3-3`, `BAAI/bge-base-en-v1.5`, and `BAAI/bge-reranker-base` for inference, embedding, and reranking respectively.
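For example, a sketch of the `null` override mentioned above, which skips host-path caching entirely:

```bash
# Don't cache models on the host; they are re-downloaded when pods restart.
helm install searchqna searchqna --set global.modelUseHostPath=null \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --set web-retriever.GOOGLE_API_KEY=${GOOGLE_API_KEY} \
  --set web-retriever.GOOGLE_CSE_ID=${GOOGLE_CSE_ID}
```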
## Verify

To verify the installation, run the command `kubectl get pod` to make sure all pods are running.
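If you prefer to block until everything is up, something like the following should work; the 20-minute timeout is an assumption, since the first model download can take a while:

```bash
# Wait for every pod in the current namespace to report Ready.
kubectl wait --for=condition=Ready pod --all --timeout=20m
```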
### Verify the workload through curl command

Run the command `kubectl port-forward svc/searchqna 3008:3008` to expose the service for access.

Open another terminal and run the following command to verify the service is working:

```bash
curl http://localhost:3008/v1/searchqna \
  -X POST \
  -d '{"messages": "What is the latest news? Give me also the source link.", "stream": "True"}' \
  -H 'Content-Type: application/json'
```
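For a quick scripted smoke test that only checks the HTTP status code, a sketch like this can be used; it assumes the port-forward above is still running, and `"stream": "False"` merely mirrors the `"True"` example above:

```bash
# Expect 200 when the pipeline answers successfully.
status=$(curl -s -o /dev/null -w '%{http_code}' \
  -X POST http://localhost:3008/v1/searchqna \
  -H 'Content-Type: application/json' \
  -d '{"messages": "ping", "stream": "False"}')
echo "HTTP status: ${status}"
```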
### Verify the workload through UI

The UI has already been installed via the Helm chart. To access it, use the external IP of one of your Kubernetes nodes along with the NGINX port. You can find the NGINX port using the following command:

```bash
export port=$(kubectl get service searchqna-nginx --output='jsonpath={.spec.ports[0].nodePort}')
echo $port
```

Open a browser and access `http://<k8s-node-ip-address>:${port}` to try the UI.
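If you don't know a node address offhand, the following sketch prints the InternalIP of each node; on some clusters you may want the ExternalIP instead:

```bash
# List node names with their InternalIP; use one as <k8s-node-ip-address>.
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'
```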
## Values

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `image.repository` | string | | |
| `service.port` | string | | |
| `tgi.LLM_MODEL_ID` | string | | inference model |
| `llm-uservice.LLM_MODEL_ID` | string | | should be the same as `tgi.LLM_MODEL_ID` |
| `global.monitoring` | bool | | Enable usage metrics for the service components. See ../monitoring.md before enabling! |
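For example, to turn on monitoring for an existing release, a sketch like the following can be used; per ../monitoring.md, it assumes a Prometheus stack is already set up in the cluster:

```bash
# Re-render the release with usage metrics enabled, keeping other values.
helm upgrade searchqna searchqna --reuse-values --set global.monitoring=true
```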