Whisper-large-v3-turbo: ASR
Tags: ASR, W8A16

Whisper-large-v3-turbo is a speech recognition model in OpenAI’s Whisper series that pairs the accuracy of a large-scale model with faster inference. It is derived from Whisper-large-v3, with the decoder reduced from 32 layers to 4, which shortens response times and lowers compute requirements while largely preserving multilingual recognition accuracy and robustness. The model supports speech-to-text transcription and real-time captioning; OpenAI notes that the turbo variant was not trained for speech translation. It is suited to deployment on high-performance servers and cloud platforms, where it can deliver stable and efficient speech processing for advanced applications.

The source model can be found here

Performance Reference

Device | Language | Precision | Audio Duration | RTF | File Size
Supported Languages
Chinese
English
Japanese
Korean
French
Thai

Note: The RTF (real-time factor) values in the Performance Reference section are reported per language at the listed audio duration. Because the model uses fixed input dimensions (non-dynamic input), inference time stays roughly constant however much speech the input contains, so the RTF, which is inference time divided by audio duration, may increase slightly when the audio is shorter than the reference length.
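The effect described in the note can be illustrated with a small calculation. This is a minimal sketch rather than a measurement: the 30-second input window and the 3-second inference time are hypothetical values chosen for illustration, and the function names are not part of any SDK.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = inference time divided by the duration of the audio transcribed."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Hypothetical figures for illustration only (not measurements of this model).
FIXED_INPUT_SECONDS = 30.0  # assumed length of the fixed input window
INFERENCE_SECONDS = 3.0     # assumed constant cost of one fixed-size inference


def rtf_for_clip(audio_seconds: float) -> float:
    """RTF for a clip that fits in one fixed-size input window.

    Shorter clips are padded to the full window, so the inference cost
    stays the same while the audio duration in the denominator shrinks.
    """
    if audio_seconds > FIXED_INPUT_SECONDS:
        raise ValueError("clip is longer than the assumed fixed input window")
    return real_time_factor(INFERENCE_SECONDS, audio_seconds)


if __name__ == "__main__":
    for seconds in (30.0, 25.0, 20.0):
        print(f"{seconds:.0f} s clip -> RTF {rtf_for_clip(seconds):.2f}")
```

Under these assumed numbers, a clip that fills the window has an RTF of 0.10, while a 20-second clip padded to the same window has an RTF of 0.15, even though the inference itself takes the same time.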

Inference with AidASR SDK

To be released

License
Source Model: MIT
Deployable Model: APLUX-MODEL-FARM-LICENSE