🌟 Embedding Microservice with OpenVINO Model Server

This guide walks you through starting, deploying, and consuming the OVMS Embeddings Microservice. 🚀 OpenVINO Model Server (OVMS) is Intel's highly optimized serving solution, which uses the OpenVINO Runtime for fast inference on CPU.


📦 1. Start Microservice with docker run

🔹 1.1 Start Embedding Service with OVMS

  1. Prepare the model in the model repository. This step exports the model from the HuggingFace Hub to the local model repository; at the same time the model is converted to OpenVINO IR format and optionally quantized.
    This speeds up starting the service and avoids downloading the model from the Internet each time the container starts (a quick way to verify the export is sketched at the end of this section).

    pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2025/0/demos/common/export_models/requirements.txt
    curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2025/0/demos/common/export_models/export_model.py -o export_model.py
    mkdir models
    python export_model.py embeddings --source_model BAAI/bge-large-en-v1.5 --weight-format int8 --config_file_path models/config_embeddings.json --model_repository_path models --target_device CPU
    
  2. Deploy the OVMS service: Run the following command to start the OVMS container with the exported model.

    your_port=8090
    docker run -p $your_port:8000 -v ./models:/models --name ovms-embedding-serving \
    openvino/model_server:2025.0 --port 8000 --config_path /models/config_embeddings.json

  3. Test the OVMS service: Run the following command to check if the service is up and running (a Python alternative follows below).

    curl http://localhost:$your_port/v3/embeddings \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "BAAI/bge-large-en-v1.5",
    "input":"What is Deep Learning?"
    }'
    
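Because the /v3/embeddings endpoint follows the OpenAI API, the same request can also be sent from Python. Below is a minimal sketch, assuming the openai Python package is installed and the service is published on port 8090 as above:

# query_ovms.py - send the embeddings request above through the OpenAI Python client
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8090/v3",  # OVMS exposes its OpenAI-compatible API under /v3
    api_key="unused",                     # no API key is required by the default OVMS setup
)

response = client.embeddings.create(
    model="BAAI/bge-large-en-v1.5",
    input="What is Deep Learning?",
)
print("embedding length:", len(response.data[0].embedding))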

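If the container fails to start or the request above returns an error, it is worth confirming that step 1 actually populated the model repository. A minimal sketch that only reads the generated files (the script name is just illustrative):

# verify_export.py - sanity-check the model repository produced by export_model.py
import json
from pathlib import Path

repo = Path("models")

# print the configuration the OVMS container will load
config = json.loads((repo / "config_embeddings.json").read_text())
print(json.dumps(config, indent=2))

# list every file that was exported into the repository
for path in sorted(repo.rglob("*")):
    if path.is_file():
        print(path.relative_to(repo))
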
🔹 1.2 Build the Docker Image and Run with the Docker CLI

  1. Build the Docker image for the embedding microservice:

    cd ../../../
    docker build -t opea/embedding:latest \
    --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy \
    -f comps/embeddings/src/Dockerfile .
    
  2. Run the embedding microservice and connect it to the OVMS service. Make sure OVMS_EMBEDDING_ENDPOINT (for example, http://${host_ip}:8090 from section 1.1) and MODEL_ID (for example, BAAI/bge-large-en-v1.5) are exported first:

    docker run -d --name="embedding-ovms-server" \
    -p 6000:6000 \
    --ipc=host \
    -e OVMS_EMBEDDING_ENDPOINT=$OVMS_EMBEDDING_ENDPOINT \
    -e MODEL_ID=$MODEL_ID \
    -e EMBEDDING_COMPONENT_NAME="OPEA_OVMS_EMBEDDING" \
    opea/embedding:latest
    

📦 2. Start Microservice with docker compose

Deploy both the OVMS Embedding Service and the Embedding Microservice using Docker Compose.

🔹 Steps:

  1. Set environment variables:

    export host_ip=${your_ip_address}
    export MODEL_ID="BAAI/bge-large-en-v1.5"
    export OVMS_EMBEDDER_PORT=8090
    export EMBEDDER_PORT=6000
    export OVMS_EMBEDDING_ENDPOINT="http://${host_ip}:${OVMS_EMBEDDER_PORT}"
    
  2. Navigate to the Docker Compose directory:

    cd comps/embeddings/deployment/docker_compose/
    
  3. Start the services:

    docker compose up ovms-embedding-server -d
    
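The containers need a moment to load the model, so it can help to wait for the wrapper's health endpoint before sending traffic. A minimal sketch using only the Python standard library, assuming EMBEDDER_PORT=6000 as exported above:

# wait_ready.py - poll the embedding microservice until its health check succeeds
import time
import urllib.request

URL = "http://localhost:6000/v1/health_check"

for _ in range(30):
    try:
        with urllib.request.urlopen(URL, timeout=2) as resp:
            if resp.status == 200:
                print("embedding microservice is ready")
                break
    except OSError:
        pass  # container still starting up
    time.sleep(2)
else:
    raise SystemExit("embedding microservice did not become ready in time")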

📦 3. Consume Embedding Service

🔹 3.1 Check Service Status

Verify the embedding service is running:

curl http://localhost:6000/v1/health_check \
-X GET \
-H 'Content-Type: application/json'

🔹 3.2 Use the Embedding Service API

The API is compatible with the OpenAI API.

  1. Single Text Input

    curl http://localhost:6000/v1/embeddings \
    -X POST \
    -d '{"input":"Hello, world!"}' \
    -H 'Content-Type: application/json'
    
  2. Multiple Text Inputs with Parameters

    curl http://localhost:6000/v1/embeddings \
    -X POST \
    -d '{"input":["Hello, world!","How are you?"], "dimensions":100}' \
    -H 'Content-Type: application/json'
    
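The same two requests can be issued from Python without extra dependencies. A minimal sketch using the standard library, assuming the microservice is reachable on port 6000:

# embed_client.py - call the OpenAI-compatible /v1/embeddings endpoint of the microservice
import json
import urllib.request

def embed(payload: dict) -> dict:
    req = urllib.request.Request(
        "http://localhost:6000/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

single = embed({"input": "Hello, world!"})
batch = embed({"input": ["Hello, world!", "How are you?"], "dimensions": 100})
print(len(single["data"][0]["embedding"]), "dimensions for the single input")
print(len(batch["data"]), "embeddings returned for the batch")

Since the response follows the OpenAI embeddings schema, the vectors are available under data[*].embedding in both cases.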

✨ Tips for Better Understanding:

  1. Port Mapping: Ensure the ports are correctly mapped to avoid conflicts with other services.

  2. Model Selection: Choose a model appropriate for your use case, such as BAAI/bge-large-en-v1.5 or BAAI/bge-base-en-v1.5. The model must be exported to the model repository, and its name must be set in the MODEL_ID environment variable when deploying the OPEA API wrapper.

  3. Model Repository Volume: The -v ./models:/models flag ensures the model repository is correctly mounted into the OVMS container.

  4. Select the Correct Configuration JSON File: The model repository can host multiple models. Choose which models are served by selecting the appropriate configuration file; in the example above it is config_embeddings.json.

  5. Upload the Models to a Persistent Volume Claim in Kubernetes: When deploying via the Helm chart, the model repository, including the configuration JSON file, is mounted into the OVMS containers from the persistent volume claim.

  6. Learn more about the OVMS embeddings API in the OpenVINO Model Server documentation.