# ASR Microservice
The ASR (Automatic Speech Recognition) microservice converts speech to text. When building a talking bot with an LLM, users need to convert audio input (their own speech, or audio from other sources) into text so the LLM can tokenize it and generate an answer. This microservice handles that conversion stage.
## Architecture
ASR Server: This microservice is responsible for converting speech audio into text. It receives an audio file as input and returns the transcribed text, enabling downstream applications such as conversational bots to process spoken language. The ASR server supports deployment on both CPU and HPU platforms.
Whisper Server: This microservice is responsible for converting speech audio into text using the Whisper model. It exposes an API endpoint that accepts audio files and returns the transcribed text, supporting both CPU and HPU deployments. The Whisper server acts as the backend for ASR functionality in the overall architecture.
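The exact request format depends on the image you deploy. The following is a minimal sketch in Python, assuming an OpenAI-style `/v1/audio/transcriptions` endpoint served on `localhost:9099`; the host, port, path, and model name here are assumptions, so adjust them to your deployment:

```python
import requests

# Hypothetical endpoint; the actual host, port, and path depend on your deployment.
ASR_ENDPOINT = "http://localhost:9099/v1/audio/transcriptions"

# Send a local audio file as multipart form data, OpenAI-style.
with open("sample.wav", "rb") as f:
    response = requests.post(
        ASR_ENDPOINT,
        files={"file": ("sample.wav", f, "audio/wav")},
        data={"model": "openai/whisper-small"},  # model name is an assumption
    )

response.raise_for_status()
print(response.json())  # an OpenAI-compatible server returns JSON with a "text" field
```

An OpenAI-compatible server would respond with a JSON body whose `text` field holds the transcription, which downstream components (such as the LLM in a talking-bot pipeline) can consume directly.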
## Deployment Options
For detailed, step-by-step instructions on how to deploy the ASR microservice using Docker Compose on different Intel platforms, please refer to the deployment guide. The guide contains all necessary steps, including building images, configuring the environment, and running the service.
| Platform | Deployment Method | Link |
| --- | --- | --- |
| Intel Xeon/Gaudi2 | Docker Compose | Deployment Guide |
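After the Compose stack is up, one way to confirm that the ASR service is accepting requests is to post a short audio clip and check for a transcription in the response. Below is a minimal verification sketch, reusing the assumed endpoint from the architecture section above (again, host, port, and path are assumptions):

```python
import requests

# Assumed endpoint; substitute the host/port configured in your compose setup.
ASR_ENDPOINT = "http://localhost:9099/v1/audio/transcriptions"

def check_asr(audio_path: str) -> bool:
    """Return True if the ASR service answers with a transcription."""
    try:
        with open(audio_path, "rb") as f:
            resp = requests.post(
                ASR_ENDPOINT,
                files={"file": (audio_path, f, "audio/wav")},
                timeout=30,
            )
        resp.raise_for_status()
        # An OpenAI-compatible response carries the transcription in "text".
        return "text" in resp.json()
    except (requests.RequestException, ValueError):
        return False

if __name__ == "__main__":
    print("ASR service reachable:", check_asr("sample.wav"))
```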
## Validated Configurations
The following configurations have been validated for the ASR microservice.
| Deploy Method | Core Models | Platform |
| --- | --- | --- |
| Docker Compose | Whisper | Intel Xeon/Gaudi2 |