LM-Eval Microservice

This microservice is designed for lm-eval: it hosts a separate LLM server against which lm-eval tasks can be evaluated.

CPU service

Build the CPU Docker image

docker build -f Dockerfile.cpu -t opea/lm-eval:latest .
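
If you build behind a corporate proxy, you may need to forward the proxy settings as build arguments. This is a sketch that assumes the Dockerfile honors the standard http_proxy/https_proxy build args, as most OPEA Dockerfiles do; drop the flags if yours does not:

docker build -f Dockerfile.cpu --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy -t opea/lm-eval:latest .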

Start the server

  • set the environment variables MODEL, MODEL_ARGS, and DEVICE, then start the server

docker run -p 9006:9006 --ipc=host -e MODEL="hf" -e MODEL_ARGS="pretrained=Intel/neural-chat-7b-v3-3" -e DEVICE="cpu" opea/lm-eval:latest
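
Before running an evaluation, you can confirm that the container started and watch the model load. The commands below use only standard Docker tooling; substitute the container ID that docker ps reports:

docker ps --filter "ancestor=opea/lm-eval:latest"

docker logs -f <container_id>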

Evaluate the model

  • install GenAIEval, then point base_url at the running lm-eval server and tokenizer at the served model's tokenizer

git clone https://github.com/opea-project/GenAIEval
cd GenAIEval
pip install -e .

cd evaluation/lm_evaluation_harness/examples

python main.py \
    --model genai-hf \
    --model_args "base_url=http://{your_ip}:9006,tokenizer=Intel/neural-chat-7b-v3-3" \
    --tasks "lambada_openai" \
    --batch_size 2
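
Since main.py lives under lm_evaluation_harness and its task names match the upstream lm-evaluation-harness, it presumably forwards --tasks to the harness, which accepts a comma-separated task list. Assuming that pass-through holds, several tasks can be evaluated in one run (hellaswag below is just an illustrative second task):

python main.py \
    --model genai-hf \
    --model_args "base_url=http://{your_ip}:9006,tokenizer=Intel/neural-chat-7b-v3-3" \
    --tasks "lambada_openai,hellaswag" \
    --batch_size 2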