Embedding Server

1. Introduction

This service exposes an OpenAI-compatible RESTful API for extracting text features (embeddings). It is designed to run on Intel Xeon processors to accelerate embedding model serving. The current local model is BGE-large-zh-v1.5.

2. Quick Start

2.1 Build Docker image

docker build -t embedding:latest -f ./docker/Dockerfile .
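If the build runs behind a proxy, Docker's predefined proxy build arguments can be passed through (a sketch; the values depend on your environment):

docker build -t embedding:latest -f ./docker/Dockerfile --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy .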

2.2 Launch server

docker run -itd -p 8000:8000 embedding:latest
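To confirm the server came up, standard Docker commands can be used to list the container and follow its logs:

docker ps
docker logs -f <container_id>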

2.3 Client test

  • RESTful API via curl

curl -X POST http://127.0.0.1:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "/home/user/bge-large-zh-v1.5/", "input": "hello world"}'
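Since the API follows the OpenAI embeddings schema, a successful response should have roughly this shape (vector elided):

{
  "object": "list",
  "data": [{"object": "embedding", "embedding": [...], "index": 0}],
  "model": "/home/user/bge-large-zh-v1.5/",
  "usage": {...}
}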
  • Generate embeddings from Python

from openai import OpenAI

DEFAULT_MODEL = "/home/user/bge-large-zh-v1.5/"
# The OpenAI client appends endpoint paths such as /embeddings to base_url,
# so the URL must include the /v1 prefix.
SERVICE_URL = "http://127.0.0.1:8000/v1"
INPUT_STR = "Hello world!"

# api_key is a placeholder required by the client library; the local service does not use it.
client = OpenAI(api_key="fake", base_url=SERVICE_URL)
emb = client.embeddings.create(
    model=DEFAULT_MODEL,
    input=INPUT_STR,
)
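
The response follows the OpenAI embeddings schema, so the vector itself can be read from the first data entry, for example:

print(len(emb.data[0].embedding))  # embedding dimension (1024 for BGE-large-zh-v1.5)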