# 🌟 Embedding Microservice with OpenVINO Model Server

This guide walks you through starting, deploying, and consuming the **OVMS Embeddings Microservice**. 🚀 OpenVINO Model Server (OVMS) is Intel's highly optimized serving solution, which employs the OpenVINO Runtime for fast inference on CPU.

---

## 📦 1. Start Microservice with `docker run`

### 🔹 1.1 Start Embedding Service with OVMS

1. **Prepare the model in the model repository**: This step exports the model from the HuggingFace Hub to the local model repository. At the same time, the model is converted to IR format and optionally quantized, which speeds up starting the service and avoids downloading the model from the Internet each time the container starts.

   ```bash
   pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2025/0/demos/common/export_models/requirements.txt
   curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2025/0/demos/common/export_models/export_model.py -o export_model.py
   mkdir models
   python export_model.py embeddings --source_model BAAI/bge-large-en-v1.5 --weight-format int8 --config_file_path models/config_embeddings.json --model_repository_path models --target_device CPU
   ```

2. **Start the OVMS service**: Run the following command to start the OVMS container serving the exported model.

   ```bash
   your_port=8090
   docker run -p $your_port:8000 -v ./models:/models --name ovms-embedding-serving \
     openvino/model_server:2025.0 --port 8000 --config_path /models/config_embeddings.json
   ```

3. **Test the OVMS service**: Run the following command to check if the service is up and running.

   ```bash
   curl http://localhost:$your_port/v3/embeddings \
     -X POST \
     -H 'Content-Type: application/json' \
     -d '{ "model": "BAAI/bge-large-en-v1.5", "input":"What is Deep Learning?" }'
   ```

### 🔹 1.2 Build Docker Image and Run Docker with CLI

1. Build the Docker image for the embedding microservice:

   ```bash
   cd ../../../
   docker build -t opea/embedding:latest \
     --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy \
     -f comps/embeddings/src/Dockerfile .
   ```

2. Run the embedding microservice and connect it to the OVMS service:

   ```bash
   docker run -d --name="embedding-ovms-server" \
     -p 6000:6000 \
     --ipc=host \
     -e OVMS_EMBEDDING_ENDPOINT=$OVMS_EMBEDDING_ENDPOINT \
     -e MODEL_ID=$MODEL_ID \
     -e EMBEDDING_COMPONENT_NAME="OPEA_OVMS_EMBEDDING" \
     opea/embedding:latest
   ```

## 📦 2. Start Microservice with `docker compose`

Deploy both the OVMS Embedding Service and the Embedding Microservice using Docker Compose.

🔹 Steps:

1. Set environment variables:

   ```bash
   export host_ip=${your_ip_address}
   export MODEL_ID="BAAI/bge-large-en-v1.5"
   export OVMS_EMBEDDER_PORT=8090
   export EMBEDDER_PORT=6000
   export OVMS_EMBEDDING_ENDPOINT="http://${host_ip}:${OVMS_EMBEDDER_PORT}"
   ```

2. Navigate to the Docker Compose directory:

   ```bash
   cd comps/embeddings/deployment/docker_compose/
   ```

3. Start the services:

   ```bash
   docker compose up ovms-embedding-server -d
   ```

## 📦 3. Consume Embedding Service

### 🔹 3.1 Check Service Status

Verify the embedding service is running:

```bash
curl http://localhost:6000/v1/health_check \
  -X GET \
  -H 'Content-Type: application/json'
```

### 🔹 3.2 Use the Embedding Service API

The API is compatible with the [OpenAI API](https://platform.openai.com/docs/api-reference/embeddings).

1. Single Text Input

   ```bash
   curl http://localhost:6000/v1/embeddings \
     -X POST \
     -d '{"input":"Hello, world!"}' \
     -H 'Content-Type: application/json'
   ```

2. Multiple Text Inputs with Parameters

   ```bash
   curl http://localhost:6000/v1/embeddings \
     -X POST \
     -d '{"input":["Hello, world!","How are you?"], "dimensions":100}' \
     -H 'Content-Type: application/json'
   ```
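Because the endpoint follows the OpenAI embeddings schema, the vectors are expected under `data[].embedding` in the response. Below is a minimal sketch for sanity-checking the output shape, assuming the microservice is reachable on `localhost:6000` and `jq` is installed; the field names are taken from the OpenAI schema referenced above, not verified against this service.

```bash
# Count the returned vectors and the length of the first one.
# Assumes the embedding microservice from section 1.2 or 2 is listening on localhost:6000
# and that the response follows the OpenAI embeddings schema (data[].embedding).
curl -s http://localhost:6000/v1/embeddings \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"input":["Hello, world!","How are you?"]}' |
  jq '{vectors: (.data | length), first_vector_length: (.data[0].embedding | length)}'
```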
## ✨ Tips for Better Understanding:

1. Port Mapping: Ensure the ports are correctly mapped to avoid conflicts with other services.
2. Model Selection: Choose a model appropriate for your use case, such as "BAAI/bge-large-en-v1.5" or "BAAI/bge-base-en-v1.5". The model must be exported to the model repository and set in the `MODEL_ID` environment variable in the deployment of the OPEA API wrapper.
3. Model Repository Volume: The `-v ./models:/models` flag ensures the model repository is correctly mounted into the OVMS container.
4. Select the correct configuration JSON file: The model repository can host multiple models. Choose which models are served by selecting the right configuration file; in the example above it is `config_embeddings.json`.
5. Upload the models to a persistent volume claim in Kubernetes: The model repository with its configuration JSON file is mounted in the OVMS containers when deployed via the [helm chart](../../third_parties/ovms/deployment/kubernetes/README.md).
6. Learn more about the [OVMS embeddings API](https://docs.openvino.ai/2025/model-server/ovms_docs_rest_api_embeddings.html).
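To confirm which models the running OVMS container actually loaded from the selected configuration file (see tip 4), you can query the model server's config status endpoint. This is a sketch, assuming the container from section 1.1 is published on port 8090 and that the standard OVMS `/v1/config` REST endpoint is available in this setup.

```bash
# List the models and their states as reported by the running OVMS container.
# Assumes the port mapping from section 1.1 ($your_port=8090); adjust if you changed it.
curl -s http://localhost:8090/v1/config | jq .

# When config_embeddings.json was loaded correctly, the output is expected to include
# an entry for "BAAI/bge-large-en-v1.5" with state "AVAILABLE".
```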