GPT-SoVITS Microservice
GPT-SoVITS provides zero-shot voice cloning and text-to-speech for multiple languages, including English, Japanese, Korean, Cantonese, and Chinese.
This microservice is validated on Xeon/CUDA. HPU support is under development.
Build the Image
docker build -t opea/gpt-sovits:latest --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy -f comps/tts/src/integrations/dependency/gpt-sovits/Dockerfile .
Start the Service
docker run -itd --name gpt-sovits-service -p 9880:9880 -e http_proxy=$http_proxy -e https_proxy=$https_proxy opea/gpt-sovits:latest
Test
Chinese only
curl localhost:9880/ -XPOST -d '{
"text": "先帝创业未半而中道崩殂,今天下三分,益州疲弊,此诚危急存亡之秋也。",
"text_language": "zh"
}' --output out.wav
English only
curl localhost:9880/ -XPOST -d '{
"text": "Discuss the evolution of text-to-speech (TTS) technology from its early beginnings to the present day. Highlight the advancements in natural language processing that have contributed to more realistic and human-like speech synthesis. Also, explore the various applications of TTS in education, accessibility, and customer service, and predict future trends in this field. Write a comprehensive overview of text-to-speech (TTS) technology.",
"text_language": "en"
}' --output out.wav
Auto detection of languages
curl localhost:9880/ -XPOST -d '{
"text": "Hi 你好,这里是一个 cross-lingual 的例子。",
"text_language": "auto"
}' --output out.wav
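The three requests above differ only in their JSON body, so they can also be issued from a script. The following is a minimal sketch using only the Python standard library, assuming the service is reachable at `localhost:9880` as in the curl examples. `build_tts_request` and `synthesize` are hypothetical helper names, and sending a `Content-Type: application/json` header is an assumption that the curl commands above do not make.

```python
import json
import urllib.request

# Assumption: same host and port as the curl examples.
BASE_URL = "http://localhost:9880"


def build_tts_request(text, text_language="auto", base_url=BASE_URL):
    """Build a POST request with the same JSON fields as the curl examples."""
    # ensure_ascii=False keeps Chinese text readable in the request body.
    body = json.dumps({"text": text, "text_language": text_language}, ensure_ascii=False)
    return urllib.request.Request(
        base_url + "/",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption, see lead-in
        method="POST",
    )


def synthesize(text, text_language="auto", out_path="out.wav"):
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(build_tts_request(text, text_language)) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With the container running, `synthesize("Hi 你好,这里是一个 cross-lingual 的例子。", "auto")` corresponds to the auto-detection request above and writes the response to `out.wav`.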
Change reference audio
This microservice supports zero-shot voice cloning. For example, you can change the reference audio from the default female voice to a male voice:
wget https://github.com/OpenTalker/SadTalker/raw/main/examples/driven_audio/chinese_poem1.wav
docker cp chinese_poem1.wav gpt-sovits-service:/home/user/chinese_poem1.wav
curl localhost:9880/change_refer -d '{
"refer_wav_path": "/home/user/chinese_poem1.wav",
"prompt_text": "窗前明月光,疑是地上霜,举头望明月,低头思故乡。",
"prompt_language": "zh"
}'
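The same reference change can be scripted. This is a minimal sketch under the same assumptions as the curl command above: the service listens on `localhost:9880`, and `refer_wav_path` is a path inside the container (the file placed there by `docker cp`), not a path on the host. `build_change_refer_request` and `change_refer` are hypothetical helper names, and the format of the response body is not specified in this document, so it is returned as raw text.

```python
import json
import urllib.request


def build_change_refer_request(refer_wav_path, prompt_text, prompt_language,
                               base_url="http://localhost:9880"):
    """Build a POST request for /change_refer with the fields used by the curl example."""
    payload = {
        "refer_wav_path": refer_wav_path,    # path inside the container
        "prompt_text": prompt_text,          # transcript of the reference audio
        "prompt_language": prompt_language,  # language of the transcript, e.g. "zh"
    }
    return urllib.request.Request(
        base_url + "/change_refer",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption, not in the curl example
        method="POST",
    )


def change_refer(refer_wav_path, prompt_text, prompt_language):
    """Send the request and return the response body as text."""
    req = build_change_refer_request(refer_wav_path, prompt_text, prompt_language)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Later synthesis requests are then expected to use the new reference voice, as described above.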
OpenAI protocol-compatible request
curl localhost:9880/v1/audio/speech -XPOST -d '{"input":"你好呀,你是谁. Hello, who are you?"}' -H 'Content-Type: application/json' --output speech.mp3
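A scripted version of this request is sketched below. It assumes the endpoint and the `input` field shown in the curl command; `build_speech_request` and `sniff_audio_format` are hypothetical helpers. Because this document does not state which audio encoding the endpoint returns beyond the `.mp3` file name, the sketch also includes a small heuristic that inspects the first bytes of the response to distinguish a WAV container from MPEG audio.

```python
import json
import urllib.request


def build_speech_request(text, base_url="http://localhost:9880"):
    """Build a POST request for /v1/audio/speech with the 'input' field from the curl example."""
    return urllib.request.Request(
        base_url + "/v1/audio/speech",
        data=json.dumps({"input": text}, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def sniff_audio_format(data):
    """Guess the audio container from leading bytes. This is a heuristic, not a full parser."""
    # WAV files start with "RIFF", a 4-byte size, then "WAVE".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # MP3 files often begin with an ID3v2 tag.
    if data[:3] == b"ID3":
        return "mp3"
    # Otherwise look for an MPEG audio frame sync: 11 set bits at the start.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"


def speech_to_file(text, out_path="speech.mp3"):
    """Send the request, save the audio, and return the guessed format."""
    with urllib.request.urlopen(build_speech_request(text)) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return sniff_audio_format(audio)
```

If `speech_to_file` returns a format that does not match the file extension, renaming the output file is likely enough for most audio players.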