
Whisper-medium is one of the larger models in OpenAI’s Whisper series, offering enhanced speech modeling capabilities and higher recognition accuracy. Compared to the small variant, Whisper-medium has significantly more parameters and a deeper architecture, allowing it to handle multilingual input, long-form audio, accented speech, and noisy environments with greater reliability. The model performs well across various public speech recognition benchmarks and is ideal for demanding applications such as transcription, subtitle generation, and speech translation. It is typically deployed on high-performance servers or cloud-based systems.
The source model can be found here.
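As an illustration of a typical transcription and translation workflow, here is a minimal sketch using the open-source `openai-whisper` Python package. This is an assumption for illustration only; the runtime and API used to deploy this model may differ, and `audio.wav` is a hypothetical input file.

```python
# Minimal sketch using the open-source `openai-whisper` package
# (pip install openai-whisper). Illustrative only; the deployment
# runtime for this model may expose a different API.
import whisper

# Load the medium checkpoint (downloaded on first use).
model = whisper.load_model("medium")

# Transcribe an audio file; Whisper detects the language automatically.
result = model.transcribe("audio.wav")
print(result["language"])  # detected language code, e.g. "en"
print(result["text"])      # transcribed text

# For speech translation, ask the model to translate into English.
translated = model.transcribe("audio.wav", task="translate")
print(translated["text"])
```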
| Supported Languages |
| --- |
| Chinese |
| English |
| Japanese |
| Korean |
| French |
| Thai |
Note: In the performance reference section, the RTF (real-time factor) values for each language are reported for the given audio input length. Because the model uses fixed input dimensions (non-dynamic input), the RTF may increase slightly when the input audio is shorter than the reference length.
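To make the note concrete: RTF is processing time divided by audio duration, and with fixed input dimensions the audio is padded to a fixed window (Whisper uses 30-second log-mel segments), so processing time stays roughly constant regardless of the clip's true length. A brief sketch, where the per-window inference time is a hypothetical value chosen for illustration:

```python
# RTF = processing_time / audio_duration.
# With fixed (non-dynamic) input dimensions, audio is padded to a
# fixed window, so processing time is roughly constant per window.

PROCESS_TIME_S = 3.0  # hypothetical per-window inference time, in seconds

def rtf(audio_len_s: float) -> float:
    """Real-time factor for a clip padded to a fixed input window."""
    return PROCESS_TIME_S / audio_len_s

print(f"30 s clip: RTF = {rtf(30.0):.2f}")  # 0.10
print(f"10 s clip: RTF = {rtf(10.0):.2f}")  # 0.30 -- higher for shorter audio
```

This is why a clip shorter than the reference length shows a slightly higher RTF: the numerator (processing time) barely changes while the denominator (audio duration) shrinks.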
To be released