Whisper-medium: ASR

ASR | W8A16 | post-training quantization

Whisper-medium is one of the larger models in OpenAI’s Whisper series, offering enhanced speech modeling capabilities and higher recognition accuracy. Compared to the small variant, Whisper-medium has significantly more parameters and a deeper architecture, allowing it to handle multilingual input, long-form audio, accented speech, and noisy environments with greater reliability. The model performs well across various public speech recognition benchmarks and is ideal for demanding applications such as transcription, subtitle generation, and speech translation. It is typically deployed on high-performance servers or cloud-based systems.

The source model can be found here.
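For orientation, the original (unquantized) model can be run directly with OpenAI's openai-whisper Python package. The snippet below is a minimal sketch of transcribing a local file; the file path is a placeholder, and this is separate from the quantized deployable model described on this page.

```python
# Minimal sketch: run the source Whisper-medium model with the
# openai-whisper package (pip install -U openai-whisper).
import whisper

# Load the medium checkpoint (downloaded on first use).
model = whisper.load_model("medium")

# "audio.wav" is a placeholder path; Whisper accepts common audio formats
# and internally pads or trims the input to fixed 30-second windows.
result = model.transcribe("audio.wav")
print(result["text"])
```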

Performance Reference

Columns: Device, Language, Precision, Audio Duration, RTF, File Size

Supported Languages

Chinese
English
Japanese
Korean
French
Thai
Note: The RTF values in the Performance Reference section are measured at the listed audio input length for each language. Because the model uses fixed (non-dynamic) input dimensions, the RTF may increase slightly when the input audio is shorter than the reference length.
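To make the note concrete, RTF (real-time factor) is processing time divided by audio duration. The sketch below uses illustrative placeholder timings, not measured results, to show why a fixed-length input window raises RTF for short clips.

```python
# RTF (real-time factor) = processing time / audio duration.
# The timings below are illustrative placeholders, not benchmark numbers.

def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    return processing_seconds / audio_seconds

# Whisper pads or trims audio to fixed 30-second windows, so the compute
# cost per window is roughly constant. A shorter clip still pays for a
# full window, which is why RTF rises when the audio is shorter than the
# reference length.
print(real_time_factor(processing_seconds=1.2, audio_seconds=30.0))  # 0.04
print(real_time_factor(processing_seconds=1.2, audio_seconds=10.0))  # 0.12
```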

Inference with AidASR SDK

To be released

License
Source Model: MIT
Deployable Model: APLUX-MODEL-FARM-LICENSE