Whisper-large-v3-turbo: ASR

Whisper-large-v3-turbo is a speech recognition model in OpenAI’s Whisper series that pairs the accuracy of the large-scale models with optimized inference speed. It is based on Whisper-large-v3 and is designed for faster response times and lower computational cost while retaining strong multilingual recognition and robustness. It supports speech-to-text transcription, real-time captioning, and speech translation, and is suited to deployment on high-performance servers and cloud platforms that need stable, efficient speech processing.

The source model can be found here

Performance Reference

Table columns: Device, Language, Precision, Audio Duration, RTF, File Size

Supported Languages
Chinese
English
Japanese
Korean
French
Thai

Note: The RTF (real-time factor, processing time divided by audio duration) values in the Performance Reference section are reported per language at the listed audio duration. Because the model uses fixed input dimensions (non-dynamic input), audio shorter than the reference length still incurs the cost of a full fixed-size input, so the RTF may increase slightly for shorter audio.
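The effect described in the note can be illustrated with a small calculation. The sketch below is not part of the AidASR SDK; the function names, the 30-second window length, and the 3-second per-window processing time are all hypothetical values chosen for illustration. It assumes that a fixed-input model pads short audio to one full window, so processing time stays constant while the audio duration in the denominator shrinks.

```python
import math


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def fixed_window_rtf(audio_seconds: float,
                     window_seconds: float,
                     seconds_per_window: float) -> float:
    """Estimate RTF for a model with a fixed input length (illustrative only).

    Assumption: audio is split into fixed-size windows, and a clip shorter
    than one window is padded to a full window, so it costs the same to
    process as a full-length clip.
    """
    windows = max(1, math.ceil(audio_seconds / window_seconds))
    return real_time_factor(windows * seconds_per_window, audio_seconds)


# Hypothetical numbers: 30 s window, 3 s of compute per window.
reference_rtf = fixed_window_rtf(30.0, window_seconds=30.0, seconds_per_window=3.0)
short_clip_rtf = fixed_window_rtf(10.0, window_seconds=30.0, seconds_per_window=3.0)

print(f"reference (30 s) RTF: {reference_rtf:.2f}")
print(f"short clip (10 s) RTF: {short_clip_rtf:.2f}")
```

With these made-up numbers the 10-second clip reports a higher RTF than the 30-second reference even though the per-window compute time is identical, which is the behavior the note describes.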

Inference with AidASR SDK

To be released

License

Source Model: MIT

Deployable Model: APLUX-MODEL-FARM-LICENSE