Embedding Server
1. Introduction
This service exposes an OpenAI-compatible RESTful API for extracting text features (embeddings). It is designed to run on Intel Xeon processors to accelerate embedding model serving. The current local model is BGE-large-zh-v1.5.
2. Quick Start
2.1 Build Docker image
docker build -t embedding:latest -f ./docker/Dockerfile .
2.2 Launch server
docker run -itd -p 8000:8000 embedding:latest
2.3 Client test
Query the RESTful API with curl:
curl -X POST http://127.0.0.1:8000/v1/embeddings -H "Content-Type: application/json" -d '{ "model": "/home/user/bge-large-zh-v1.5/", "input": "hello world"}'
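The same request can be built and sent from Python with the standard library plus `requests` (a sketch, not part of the service itself; it assumes the server from step 2.2 is listening on port 8000):

```python
import json

# Build the same JSON payload the curl command above sends.
payload = {
    "model": "/home/user/bge-large-zh-v1.5/",
    "input": "hello world",
}
body = json.dumps(payload)
print(body)

# To actually send it, with the server running (requires the `requests` package):
# import requests
# resp = requests.post(
#     "http://127.0.0.1:8000/v1/embeddings",
#     headers={"Content-Type": "application/json"},
#     data=body,
# )
# print(resp.json())
```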
Generate embeddings from Python:
from openai import OpenAI

DEFAULT_MODEL = "/home/user/bge-large-zh-v1.5/"
# The OpenAI client appends /embeddings to base_url, so include the /v1 prefix.
SERVICE_URL = "http://127.0.0.1:8000/v1"
INPUT_STR = "Hello world!"

client = OpenAI(api_key="fake", base_url=SERVICE_URL)
emb = client.embeddings.create(
    model=DEFAULT_MODEL,
    input=INPUT_STR,
)
print(emb.data[0].embedding[:5])  # first few dimensions of the embedding vector
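Returned vectors are typically compared with cosine similarity. A minimal pure-Python helper (no extra dependencies; the two small vectors below are placeholders for real embeddings):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical vectors score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # → 1.0
```

In practice you would pass `emb.data[0].embedding` from two separate `embeddings.create` calls.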