Whisper-medium: ASR

ASR | W8A16 | post-training quantization

Whisper-medium is one of the larger models in OpenAI’s Whisper series, offering enhanced speech modeling capabilities and higher recognition accuracy. Compared to the small variant, Whisper-medium has significantly more parameters and a deeper architecture, allowing it to handle multilingual input, long-form audio, accented speech, and noisy environments with greater reliability. The model performs well across various public speech recognition benchmarks and is ideal for demanding applications such as transcription, subtitle generation, and speech translation. It is typically deployed on high-performance servers or cloud-based systems.

The source model can be found here.
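For orientation, the original (unquantized) model can be run directly with OpenAI's openai-whisper Python package. The snippet below is a minimal sketch of transcribing a local file; the file path is a placeholder, and this is separate from the quantized deployable model described on this page.

```python
# Minimal sketch: run the source Whisper-medium model with the
# openai-whisper package (pip install -U openai-whisper).
import whisper

# Load the medium checkpoint (downloaded on first use).
model = whisper.load_model("medium")

# "audio.wav" is a placeholder path; Whisper accepts common audio formats
# and internally pads or trims the input to fixed 30-second windows.
result = model.transcribe("audio.wav")
print(result["text"])
```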

Performance Reference

Columns: Device, Language, Precision, Audio Duration, RTF, File Size

Supported Languages

Chinese
English
Japanese
Korean
French
Thai
Note: The RTF values in the Performance Reference section are measured at the listed audio input length for each language. Because the model uses fixed (non-dynamic) input dimensions, the RTF may increase slightly when the input audio is shorter than the reference length.
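To make the note concrete, RTF (real-time factor) is processing time divided by audio duration. The sketch below uses illustrative placeholder timings, not measured results, to show why a fixed-length input window raises RTF for short clips.

```python
# RTF (real-time factor) = processing time / audio duration.
# The timings below are illustrative placeholders, not benchmark numbers.

def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    return processing_seconds / audio_seconds

# Whisper pads or trims audio to fixed 30-second windows, so the compute
# cost per window is roughly constant. A shorter clip still pays for a
# full window, which is why RTF rises when the audio is shorter than the
# reference length.
print(real_time_factor(processing_seconds=1.2, audio_seconds=30.0))  # 0.04
print(real_time_factor(processing_seconds=1.2, audio_seconds=10.0))  # 0.12
```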

Inference with AidASR SDK

To be released

License
Source Model: MIT
Deployable Model: APLUX-MODEL-FARM-LICENSE