Whisper-distill-large-v3: ASR

Precision: W8A16 (post-training quantization)

Whisper-distill-large-v3 is a version of OpenAI's Whisper-large-v3 compressed via knowledge distillation. It significantly reduces parameter count and computational cost while maintaining high recognition accuracy, yielding faster inference and lower latency. The model is well suited to resource-constrained environments that still require strong recognition performance, such as mobile devices, edge computing, and real-time transcription services. It supports multilingual speech recognition and a range of speech processing tasks, balancing efficiency and effectiveness.

The source model can be found here.

Performance Reference

(Per-device benchmark table: Device | Language | Precision | Audio Duration | RTF | File Size)

Supported Languages: English

Note: The RTF values in the Performance Reference above are measured at the listed audio input length. Because the model uses fixed input dimensions (non-dynamic input), processing time stays roughly constant regardless of clip length, so the RTF may increase slightly when the audio is shorter than the reference length.
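For reference, RTF (real-time factor) is processing time divided by audio duration; values below 1.0 mean faster-than-real-time transcription. Below is a minimal sketch of how such a figure can be measured, where `transcribe` is a hypothetical stand-in for any ASR inference call:

```python
import time

def measure_rtf(transcribe, audio, audio_duration_s: float) -> float:
    """Measure real-time factor: processing time / audio duration."""
    start = time.perf_counter()
    transcribe(audio)  # run inference once
    processing_time_s = time.perf_counter() - start
    return processing_time_s / audio_duration_s

# Example: 2 s of processing for a 20 s clip gives RTF = 0.1.
```

With a fixed-dimension model, `processing_time_s` barely changes with clip length, which is why the ratio grows as the denominator (audio duration) shrinks.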

Inference with AidASR SDK

To be released
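Until the SDK is available, the source model can be tried with the Hugging Face transformers pipeline. This is a sketch under the assumption that the source checkpoint is distil-whisper/distil-large-v3; it is not the AidASR SDK API:

```python
# Sketch: running the source model with Hugging Face transformers until the
# AidASR SDK is released. Assumes the source checkpoint is
# distil-whisper/distil-large-v3 (an assumption, not confirmed by this page).
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-large-v3",
)

# Whisper-family models expect 16 kHz audio; the pipeline resamples as needed.
result = asr("sample.wav")
print(result["text"])
```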

License
Source Model: MIT
Deployable Model: APLUX-MODEL-FARM-LICENSE