Fine-tuning Microservice
The fine-tuning microservice adapts a model to a specific task or dataset to improve its performance on that task. We currently support instruction tuning and pretraining for LLMs, as well as finetuning for reranking and embedding models.
🚀1. Start Microservice with Python (Option 1)
1.1 Install Requirements
python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
python -m pip install intel-extension-for-pytorch
python -m pip install oneccl_bind_pt --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
pip install -r requirements.txt
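As an optional sanity check, verify that the CPU wheels and the Intel extensions import cleanly:
# optional sanity check for the installed packages
python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.__version__, ipex.__version__)"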
1.2 Start Finetuning Service with Python Script
1.2.1 Start Ray Cluster
OneCCL and Intel MPI libraries should be dynamically linked on every node before Ray starts:
source $(python -c "import oneccl_bindings_for_pytorch as torch_ccl; print(torch_ccl.cwd)")/env/setvars.sh
Start Ray locally using the following command.
ray start --head
For a multi-node cluster, start additional Ray worker nodes with the command below.
ray start --address="${head_node_ip}:6379"
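To confirm that the head node (and any worker nodes) have joined the cluster, you can run:
# show cluster nodes and available resources
ray status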
1.2.2 Start Finetuning Service
export HF_TOKEN=${your_huggingface_token}
python finetuning_service.py
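Once the service is up, it listens on port 8015. As a quick check, you can hit the job-listing endpoint used in section 3.3:
# should return the (initially empty) list of finetuning jobs
curl http://localhost:8015/v1/fine_tuning/jobs -X GET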
🚀2. Start Microservice with Docker (Option 2)
2.1 Setup on CPU
2.1.1 Build Docker Image
Build the Docker image with the command below:
export HF_TOKEN=${your_huggingface_token}
cd ../../
docker build -t opea/finetuning:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy --build-arg HF_TOKEN=$HF_TOKEN -f comps/finetuning/Dockerfile .
2.1.2 Run Docker with CLI
Start the Docker container with the command below:
docker run -d --name="finetuning-server" -p 8015:8015 --runtime=runc --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy opea/finetuning:latest
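To verify that the container came up correctly, inspect its logs and probe the published port:
# check container logs and confirm the service answers on port 8015
docker logs finetuning-server
curl http://localhost:8015/v1/fine_tuning/jobs -X GET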
2.2 Setup on Gaudi2
2.2.1 Build Docker Image
Build the Docker image with the command below:
cd ../../
docker build -t opea/finetuning-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/finetuning/Dockerfile.intel_hpu .
2.2.2 Run Docker with CLI
Start the Docker container with the command below:
export HF_TOKEN=${your_huggingface_token}
docker run --runtime=habana -e HABANA_VISIBLE_DEVICES=all -p 8015:8015 -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host -e https_proxy=$https_proxy -e http_proxy=$http_proxy -e no_proxy=$no_proxy -e HF_TOKEN=$HF_TOKEN opea/finetuning-gaudi:latest
🚀3. Consume Finetuning Service
3.1 Upload a training file
Download a training file such as alpaca_data.json (the instruction-tuning dataset released with the Stanford Alpaca project) and upload it to the server with the command below:
# upload a training file
curl http://${your_ip}:8015/v1/files -X POST -H "Content-Type: multipart/form-data" -F "file=@./alpaca_data.json" -F purpose="fine-tune"
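On success, the service returns an OpenAI-style file object. The exact fields may differ by version, but the response should look roughly like:
{
  "id": "file-xxxx",
  "object": "file",
  "purpose": "fine-tune",
  "filename": "alpaca_data.json"
}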
For reranking and embedding model finetuning, the training file toy_finetune_data.jsonl is provided as a toy example.
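Each line of that JSONL file is a standalone JSON object containing a query plus positive and negative passages, in the BGE-style finetuning format; a minimal sketch of one line (contents are illustrative):
{"query": "what is a panda?", "pos": ["The giant panda is a bear species endemic to China."], "neg": ["Paris is the capital of France."]}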
3.2 Create fine-tuning job
3.2.1 Instruction Tuning
After a training file like alpaca_data.json is uploaded, use the following command to launch a finetuning job with meta-llama/Llama-2-7b-chat-hf as the base model:
# create a finetuning job
curl http://${your_ip}:8015/v1/fine_tuning/jobs \
-X POST \
-H "Content-Type: application/json" \
-d '{
"training_file": "alpaca_data.json",
"model": "meta-llama/Llama-2-7b-chat-hf"
}'
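The service responds with an OpenAI-style fine-tuning job object; its id field is the fine_tuning_job_id used in section 3.3. A representative (not exact) response:
{
  "id": "ft-xxxx",
  "object": "fine_tuning.job",
  "model": "meta-llama/Llama-2-7b-chat-hf",
  "training_file": "alpaca_data.json",
  "status": "running"
}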
3.2.2 Reranking Model Training
Use the following command to launch a finetuning job for a reranking model such as BAAI/bge-reranker-large:
# create a finetuning job
curl http://${your_ip}:8015/v1/fine_tuning/jobs \
-X POST \
-H "Content-Type: application/json" \
-d '{
"training_file": "toy_finetune_data.jsonl",
"model": "BAAI/bge-reranker-large",
"General":{
"task":"rerank",
"lora_config":null
}
}'
3.2.3 Embedding Model Training
Use the following command to launch a finetuning job for an embedding model such as BAAI/bge-base-en-v1.5:
# create a finetuning job
curl http://${your_ip}:8015/v1/fine_tuning/jobs \
-X POST \
-H "Content-Type: application/json" \
-d '{
"training_file": "toy_finetune_data.jsonl",
"model": "BAAI/bge-base-en-v1.5",
"General":{
"task":"embedding",
"lora_config":null
}
}'
# If training on Gaudi2, set "padding" to "max_length" and make "query_max_len" equal to "passage_max_len" so that shapes stay static during training. For example:
curl http://${your_ip}:8015/v1/fine_tuning/jobs \
-X POST \
-H "Content-Type: application/json" \
-d '{
"training_file": "toy_finetune_data.jsonl",
"model": "BAAI/bge-base-en-v1.5",
"General":{
"task":"embedding",
"lora_config":null
},
"Dataset":{
"query_max_len":128,
"passage_max_len":128,
"padding":"max_length"
}
}'
3.2.4 LLM Pretraining
Use the following command to launch an LLM pretraining job with a base model such as meta-llama/Llama-2-7b-hf:
# create a finetuning job
curl http://${your_ip}:8015/v1/fine_tuning/jobs \
-X POST \
-H "Content-Type: application/json" \
-d '{
"training_file": "test_data.json",
"model": "meta-llama/Llama-2-7b-hf",
"General":{
"task":"pretraining",
"lora_config":null
}
}'
Below is an example of the pretraining dataset format:
{"text": "A girl with a blue tank top sitting watching three dogs."}
{"text": "A boy with a blue tank top sitting watching three dogs."}
3.3 Manage fine-tuning job
The commands below show how to list finetuning jobs, retrieve a single finetuning job, cancel a finetuning job, and list the checkpoints of a finetuning job.
# list finetuning jobs
curl http://${your_ip}:8015/v1/fine_tuning/jobs -X GET
# retrieve one finetuning job
curl http://${your_ip}:8015/v1/fine_tuning/jobs/retrieve -X POST -H "Content-Type: application/json" -d "{\"fine_tuning_job_id\": \"${fine_tuning_job_id}\"}"
# cancel one finetuning job
curl http://${your_ip}:8015/v1/fine_tuning/jobs/cancel -X POST -H "Content-Type: application/json" -d "{\"fine_tuning_job_id\": \"${fine_tuning_job_id}\"}"
# list checkpoints of a finetuning job
curl http://${your_ip}:8015/v1/finetune/list_checkpoints -X POST -H "Content-Type: application/json" -d "{\"fine_tuning_job_id\": \"${fine_tuning_job_id}\"}"
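For unattended runs, you can poll the retrieve endpoint until the job leaves the running state. A minimal sketch, assuming the job id is stored in FT_JOB_ID, jq is installed, and the job object exposes a status field:
# poll the job status every 30 seconds until it is no longer "running"
while true; do
  status=$(curl -s http://${your_ip}:8015/v1/fine_tuning/jobs/retrieve \
    -X POST -H "Content-Type: application/json" \
    -d "{\"fine_tuning_job_id\": \"${FT_JOB_ID}\"}" | jq -r '.status')
  echo "job status: ${status}"
  [ "${status}" != "running" ] && break
  sleep 30
done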
3.4 Leverage fine-tuned model
After the fine-tuning job is done, the fine-tuned model can be chosen from the listed checkpoints and then used in other microservices. For example, a fine-tuned reranking model can be used in the reranks microservice by assigning its path to the environment variable RERANK_MODEL_ID, a fine-tuned embedding model can be used in the embeddings microservice by assigning its path to the environment variable model, and an instruction-tuned LLM can be used in the llms microservice by assigning its path to the environment variable your_hf_llm_model.
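A sketch of what that could look like, using illustrative checkpoint paths in place of the ones returned by the list_checkpoints endpoint:
# point downstream microservices at the fine-tuned checkpoints (paths are illustrative)
export RERANK_MODEL_ID="/path/to/finetuned/reranking/checkpoint"
export model="/path/to/finetuned/embedding/checkpoint"
export your_hf_llm_model="/path/to/finetuned/llm/checkpoint"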
🚀4. Descriptions of Finetuning Parameters
We utilize the OpenAI finetuning parameters and extend them with additional customizable parameters; see the definitions in finetune_config.
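For instance, common training hyperparameters can be overridden directly in the request body. A sketch, assuming finetune_config exposes a Training section with epochs, batch_size, and learning_rate fields (check finetune_config for the exact names):
# create a finetuning job with overridden training hyperparameters (field names are assumptions)
curl http://${your_ip}:8015/v1/fine_tuning/jobs \
-X POST \
-H "Content-Type: application/json" \
-d '{
"training_file": "alpaca_data.json",
"model": "meta-llama/Llama-2-7b-chat-hf",
"Training":{
"epochs":2,
"batch_size":8,
"learning_rate":1e-5
}
}'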