Fine-tuning Microservice¶
The fine-tuning microservice adapts a model to a specific task or dataset to improve its performance on that task. We currently support instruction tuning for LLMs, as well as finetuning for reranking and embedding models.
🚀1. Start Microservice with Python (Option 1)¶
1.1 Install Requirements¶
python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
python -m pip install intel-extension-for-pytorch
python -m pip install oneccl_bind_pt --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
pip install -r requirements.txt
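Optionally, verify that the CPU stack installed cleanly before continuing; this quick import check is only a sanity test, not part of the required setup:
# optional sanity check: confirm torch, IPEX, and the oneCCL bindings import cleanly
python -c "import torch; import intel_extension_for_pytorch as ipex; import oneccl_bindings_for_pytorch; print(torch.__version__, ipex.__version__)"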
1.2 Start Finetuning Service with Python Script¶
1.2.1 Start Ray Cluster¶
The OneCCL and Intel MPI libraries should be dynamically linked on every node before Ray starts:
source $(python -c "import oneccl_bindings_for_pytorch as torch_ccl; print(torch_ccl.cwd)")/env/setvars.sh
Start Ray locally using the following command.
ray start --head
For a multi-node cluster, start additional Ray worker nodes with the command below.
ray start --address='${head_node_ip}:6379'
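To confirm that the head node and any workers have joined, check the cluster state with Ray's standard status command:
# verify that all nodes have joined the Ray cluster
ray status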
1.2.2 Start Finetuning Service¶
export HF_TOKEN=${your_huggingface_token}
# export FINETUNING_COMPONENT_NAME="which component you want to run"
# export FINETUNING_COMPONENT_NAME="OPEA_FINETUNING" or export FINETUNING_COMPONENT_NAME="XTUNE_FINETUNING"
python opea_finetuning_microservice.py
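Once the microservice is up, a quick reachability check is to call the job-listing endpoint described later in this document (this assumes the service listens on the default port 8015 used in the Docker instructions):
# sanity check: the finetuning service should respond on port 8015
curl http://localhost:8015/v1/fine_tuning/jobs -X GET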
🚀2. Start Microservice with Docker (Option 2)¶
2.1 Setup on CPU¶
2.1.1 Build Docker Image¶
Build the Docker image with the command below:
export HF_TOKEN=${your_huggingface_token}
cd ../../
docker build -t opea/finetuning:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy --build-arg HF_TOKEN=$HF_TOKEN -f comps/finetuning/src/Dockerfile .
2.1.2 Run Docker with CLI¶
Start the Docker container with the command below:
docker run -d --name="finetuning-server" -p 8015:8015 --runtime=runc --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy opea/finetuning:latest
Or use Docker Compose with the command below:
cd ../deployment/docker_compose
docker compose -f compose.yaml up finetuning -d
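Either way, standard Docker commands can confirm that the service came up; note that the container name finetuning-server comes from the docker run command above, while the Compose service is simply named finetuning:
# check container status and startup logs
docker ps --filter name=finetuning-server
docker logs finetuning-server
# or, if started via compose:
docker compose -f compose.yaml logs finetuning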
2.2 Setup on Gaudi2¶
2.2.1 Build Docker Image¶
Build the Docker image with the command below:
cd ../../
docker build -t opea/finetuning-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/finetuning/src/Dockerfile.intel_hpu .
2.2.2 Run Docker with CLI¶
Start the Docker container with the command below:
export HF_TOKEN=${your_huggingface_token}
docker run --runtime=habana -e HABANA_VISIBLE_DEVICES=all -p 8015:8015 -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host -e https_proxy=$https_proxy -e http_proxy=$http_proxy -e no_proxy=$no_proxy -e HF_TOKEN=$HF_TOKEN opea/finetuning-gaudi:latest
Or use Docker Compose with the command below:
export HF_TOKEN=${your_huggingface_token}
cd ../deployment/docker_compose
docker compose -f compose.yaml up finetuning-gaudi -d
2.3 Setup Xtune on Arc A770¶
Please follow the doc to install Xtune on Arc A770.
🚀3. Consume Finetuning Service¶
3.1 Upload a training file¶
Download a training file, such as alpaca_data.json for instruction tuning, and upload it to the server with the command below:
# upload a training file
curl http://${your_ip}:8015/v1/files -X POST -H "Content-Type: multipart/form-data" -F "file=@./alpaca_data.json" -F purpose="fine-tune"
For reranking and embedding model finetuning, the training file toy_finetune_data.jsonl is a toy example.
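For reference, alpaca_data.json stores instruction-tuning records as instruction/input/output triples, and each line of toy_finetune_data.jsonl pairs a query with positive (pos) and negative (neg) passages. The lines below only illustrate those shapes; consult the actual files for the authoritative content:
{"instruction": "Give three tips for staying healthy.", "input": "", "output": "1. Eat a balanced diet. 2. Exercise regularly. 3. Get enough sleep."}
{"query": "what does a cat eat", "pos": ["Cats are obligate carnivores and eat mostly meat."], "neg": ["Dogs are omnivores and can eat a varied diet."]}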
3.2 Create fine-tuning job¶
3.2.1 Instruction Tuning¶
After a training file like alpaca_data.json is uploaded, use the following command to launch a finetuning job using meta-llama/Llama-2-7b-chat-hf as the base model:
# create a finetuning job
curl http://${your_ip}:8015/v1/fine_tuning/jobs \
-X POST \
-H "Content-Type: application/json" \
-d '{
"training_file": "alpaca_data.json",
"model": "meta-llama/Llama-2-7b-chat-hf"
}'
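The General section can also carry an explicit lora_config for parameter-efficient finetuning. The keys shown below (r, lora_alpha, lora_dropout) follow common PEFT LoRA settings and are an illustrative sketch only; verify the exact schema against finetune_config (section 4) before relying on them:
# illustrative only: finetuning job with an explicit LoRA configuration (verify keys against finetune_config)
curl http://${your_ip}:8015/v1/fine_tuning/jobs \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "training_file": "alpaca_data.json",
    "model": "meta-llama/Llama-2-7b-chat-hf",
    "General":{
      "lora_config":{
        "r":8,
        "lora_alpha":32,
        "lora_dropout":0.1
      }
    }
  }'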
3.2.2 Reranking Model Training¶
Use the following command to launch a finetuning job for a reranking model, such as BAAI/bge-reranker-large:
# create a finetuning job
curl http://${your_ip}:8015/v1/fine_tuning/jobs \
-X POST \
-H "Content-Type: application/json" \
-d '{
"training_file": "toy_finetune_data.jsonl",
"model": "BAAI/bge-reranker-large",
"General":{
"task":"rerank",
"lora_config":null
}
}'
3.2.3 Embedding Model Training¶
Use the following command to launch a finetuning job for an embedding model, such as BAAI/bge-base-en-v1.5:
# create a finetuning job
curl http://${your_ip}:8015/v1/fine_tuning/jobs \
-X POST \
-H "Content-Type: application/json" \
-d '{
"training_file": "toy_finetune_data.jsonl",
"model": "BAAI/bge-base-en-v1.5",
"General":{
"task":"embedding",
"lora_config":null
}
}'
# If training on Gaudi2, set --padding "max_length" and make --query_max_len equal to --passage_max_len so shapes stay static during training. For example:
curl http://${your_ip}:8015/v1/fine_tuning/jobs \
-X POST \
-H "Content-Type: application/json" \
-d '{
"training_file": "toy_finetune_data.jsonl",
"model": "BAAI/bge-base-en-v1.5",
"General":{
"task":"embedding",
"lora_config":null
},
"Dataset":{
"query_max_len":128,
"passage_max_len":128,
"padding":"max_length"
}
}'
3.2.4 LLM Pretraining¶
Use the following command to launch an LLM pretraining job with a base model such as meta-llama/Llama-2-7b-hf:
# create a finetuning job
curl http://${your_ip}:8015/v1/fine_tuning/jobs \
-X POST \
-H "Content-Type: application/json" \
-d '{
"training_file": "test_data.json",
"model": "meta-llama/Llama-2-7b-hf",
"General":{
"task":"pretraining",
"lora_config":null
}
}'
Below is an example of the pretraining dataset format:
{"text": "A girl with a blue tank top sitting watching three dogs."}
{"text": "A boy with a blue tank top sitting watching three dogs."}
3.2.5 Direct Preference Optimization (DPO)¶
Use the following command to launch an LLM Direct Preference Optimization (DPO) job with a base model such as meta-llama/Llama-2-7b-hf:
# create a finetuning job
curl http://${your_ip}:8015/v1/fine_tuning/jobs \
-X POST \
-H "Content-Type: application/json" \
-d '{
"training_file": "test_data.jsonl",
"model": "meta-llama/Llama-2-7b-hf",
"General":{
"task":"dpo"
}
}'
Below is an example of the DPO dataset format:
{"system": "", "question": "You will be given a definition of a task first, then some input of the task.\nThis task is about using the specified sentence and converting the sentence to Resource Description Framework (RDF) triplets of the form (subject, predicate object). The RDF triplets generated must be such that the triplets accurately capture the structure and semantics of the input sentence. The input is a sentence and the output is a list of triplets of the form [subject, predicate, object] that capture the relationships present in the sentence. When a sentence has more than 1 RDF triplet possible, the output must contain all of them.\n\nAFC Ajax (amateurs)'s ground is Sportpark De Toekomst where Ajax Youth Academy also play.\nOutput:", "chosen": "[\n [\"AFC Ajax (amateurs)\", \"has ground\", \"Sportpark De Toekomst\"],\n [\"Ajax Youth Academy\", \"plays at\", \"Sportpark De Toekomst\"]\n]", "rejected": " Sure, I'd be happy to help! Here are the RDF triplets for the input sentence:\n\n[AFC Ajax (amateurs), hasGround, Sportpark De Toekomst]\n[Ajax Youth Academy, playsAt, Sportpark De Toekomst]\n\nExplanation:\n\n* AFC Ajax (amateurs) is the subject of the first triplet, and hasGround is the predicate that describes the relationship between AFC Ajax (amateurs) and Sportpark De Toekomst.\n* Ajax Youth Academy is the subject of the second triplet, and playsAt is the predicate that describes the relationship between Ajax Youth Academy and Sportpark De Toekomst.\n\nNote that there may be other possible RDF triplets that could be derived from the input sentence, but the above triplets capture the main relationships present in the sentence."}
{"system": "You are an AI assistant. You will be given a task. You must generate a detailed and long answer.", "question": "Generate an approximately fifteen-word sentence that describes all this data: Midsummer House eatType restaurant; Midsummer House food Chinese; Midsummer House priceRange moderate; Midsummer House customer rating 3 out of 5; Midsummer House near All Bar One", "chosen": "Midsummer House is a moderately priced Chinese restaurant with a 3/5 customer rating, located near All Bar One.", "rejected": " Sure! Here's a sentence that describes all the data you provided:\n\n\"Midsummer House is a moderately priced Chinese restaurant with a customer rating of 3 out of 5, located near All Bar One, offering a variety of delicious dishes.\""}
3.2.6 Reasoning Model Finetuning¶
Use the following command to launch a finetuning job for a reasoning model from the Qwen/Qwen2.5 family (the example below uses Qwen/Qwen2.5-3B):
# create a finetuning job
curl http://${your_ip}:8015/v1/fine_tuning/jobs \
-X POST \
-H "Content-Type: application/json" \
-d '{
"training_file": "medical_o1_sft.json",
"model": "Qwen/Qwen2.5-3B",
"Dataset": {
"max_length":2048
},
"Training":{
"epochs":1,
"batch_size":1,
"gradient_accumulation_steps":8
},
"General": {
"task":"reasoning"
}
}'
Below is an example of the format of the reasoning dataset FreedomIntelligence/medical-o1-reasoning-SFT:
{
"Question": "Given the symptoms of sudden weakness in the left arm and leg, recent long-distance travel, and the presence of swollen and tender right lower leg, what specific cardiac abnormality is most likely to be found upon further evaluation that could explain these findings?",
"Complex_CoT": "Okay, let's see what's going on here. We've got sudden weakness in the person's left arm and leg - and that screams something neuro-related, maybe a stroke?\n\nBut wait, there's more. The right lower leg is swollen and tender, which is like waving a big flag for deep vein thrombosis, especially after a long flight or sitting around a lot.\n\nSo, now I'm thinking, how could a clot in the leg end up causing issues like weakness or stroke symptoms?\n\nOh, right! There's this thing called a paradoxical embolism. It can happen if there's some kind of short circuit in the heart - like a hole that shouldn't be there.\n\nLet's put this together: if a blood clot from the leg somehow travels to the left side of the heart, it could shoot off to the brain and cause that sudden weakness by blocking blood flow there.\n\nHmm, but how would the clot get from the right side of the heart to the left without going through the lungs and getting filtered out?\n\nHere's where our cardiac anomaly comes in: a patent foramen ovale or PFO. That's like a sneaky little shortcut in the heart between the right and left atria.\n\nAnd it's actually pretty common, found in about a quarter of adults, which definitely makes it the top suspect here.\n\nSo with all these pieces - long travel, leg clot, sudden weakness - a PFO fits the bill perfectly, letting a clot cross over and cause all this.\n\nEverything fits together pretty neatly, so I'd bet PFO is the heart issue waiting to be discovered. Yeah, that really clicks into place!",
"Response": "The specific cardiac abnormality most likely to be found in this scenario is a patent foramen ovale (PFO). This condition could allow a blood clot from the venous system, such as one from a deep vein thrombosis in the leg, to bypass the lungs and pass directly into the arterial circulation. This can occur when the clot moves from the right atrium to the left atrium through the PFO. Once in the arterial system, the clot can travel to the brain, potentially causing an embolic stroke, which would explain the sudden weakness in the left arm and leg. The connection between the recent travel, which increases the risk of deep vein thrombosis, and the neurological symptoms suggests the presence of a PFO facilitating a paradoxical embolism."
}
3.3 Manage fine-tuning job¶
The commands below show how to list finetuning jobs, retrieve a finetuning job, cancel a finetuning job, and list the checkpoints of a finetuning job.
# list finetuning jobs
curl http://${your_ip}:8015/v1/fine_tuning/jobs -X GET
# retrieve one finetuning job
curl http://localhost:8015/v1/fine_tuning/jobs/retrieve -X POST -H "Content-Type: application/json" -d '{"fine_tuning_job_id": ${fine_tuning_job_id}}'
# cancel one finetuning job
curl http://localhost:8015/v1/fine_tuning/jobs/cancel -X POST -H "Content-Type: application/json" -d '{"fine_tuning_job_id": ${fine_tuning_job_id}}'
# list checkpoints of a finetuning job
curl http://${your_ip}:8015/v1/finetune/list_checkpoints -X POST -H "Content-Type: application/json" -d '{"fine_tuning_job_id": ${fine_tuning_job_id}}'
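For long-running jobs it can be convenient to poll the retrieve endpoint until training finishes. The sketch below assumes the job object exposes OpenAI-style "status" values and that jq is installed; adjust the field name if the actual response differs:
# poll a finetuning job until it reaches a terminal state (assumes OpenAI-style "status" values; requires jq)
fine_tuning_job_id="ft-xxxx"   # hypothetical placeholder: use the id returned when the job was created
while true; do
  status=$(curl -s http://localhost:8015/v1/fine_tuning/jobs/retrieve \
    -X POST -H "Content-Type: application/json" \
    -d "{\"fine_tuning_job_id\": \"${fine_tuning_job_id}\"}" | jq -r '.status')
  echo "job ${fine_tuning_job_id}: ${status}"
  case "${status}" in succeeded|failed|cancelled) break ;; esac
  sleep 30
done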
3.4 Leverage fine-tuned model¶
After a finetuning job is done, a finetuned model can be chosen from the listed checkpoints and then used in other microservices. For example, a finetuned reranking model can be used in the reranks microservice by assigning its path to the environment variable RERANK_MODEL_ID, a finetuned embedding model can be used in the embeddings microservice by assigning its path to the environment variable model, and an LLM after instruction tuning can be used in the llms microservice by assigning its path to the environment variable your_hf_llm_model.
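A minimal sketch of that wiring is shown below; the checkpoint paths are purely illustrative and should be replaced with paths reported by the list-checkpoints API:
# illustrative only: point downstream microservices at finetuned checkpoint paths
export RERANK_MODEL_ID=/path/to/finetuned/reranker/checkpoint         # reranks microservice
export model=/path/to/finetuned/embedding/checkpoint                  # embeddings microservice
export your_hf_llm_model=/path/to/instruction_tuned/llm/checkpoint    # llms microservice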
3.5 Xtune¶
Once you follow 2.3 Setup Xtune on Arc A770
, you can access Xtune in a web browser at http://localhost:7860/
Please see the Xtune doc for details.
🚀4. Descriptions of Finetuning Parameters¶
We utilize the OpenAI finetuning parameters and extend them with more customizable parameters; see the definitions in finetune_config.
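As an illustration of how those sections compose, the request below combines the General, Dataset, and Training keys already used in the examples above; any parameter not shown in this document should be checked against finetune_config before use:
# finetuning job that overrides dataset and training parameters (keys taken from the examples above)
curl http://${your_ip}:8015/v1/fine_tuning/jobs \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "training_file": "alpaca_data.json",
    "model": "meta-llama/Llama-2-7b-chat-hf",
    "General":{
      "lora_config":null
    },
    "Dataset":{
      "max_length":2048
    },
    "Training":{
      "epochs":1,
      "batch_size":1,
      "gradient_accumulation_steps":8
    }
  }'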