Whisper-distill-large-v3: ASR

Precision: W8A16 (post-training quantization)

Whisper-distill-large-v3 is a version of OpenAI's Whisper-large-v3 compressed via knowledge distillation. It significantly reduces parameter count and computational cost while maintaining high recognition accuracy, yielding faster inference and lower latency. The model is well suited to resource-constrained environments that still require strong recognition performance, such as mobile devices, edge computing, and real-time transcription services. It supports multilingual speech recognition and a range of speech processing tasks, balancing efficiency and effectiveness.

The source model can be found here.

Performance Reference

(Per-device benchmark table: Device | Language | Precision | Audio Duration | RTF | File Size)

Supported Languages: English

Note: The RTF values in the Performance Reference above are measured at the listed audio input length. Because the model uses fixed input dimensions (non-dynamic input), processing time stays roughly constant regardless of clip length, so the RTF may increase slightly when the audio is shorter than the reference length.
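For reference, RTF (real-time factor) is processing time divided by audio duration; values below 1.0 mean faster-than-real-time transcription. Below is a minimal sketch of how such a figure can be measured, where `transcribe` is a hypothetical stand-in for any ASR inference call:

```python
import time

def measure_rtf(transcribe, audio, audio_duration_s: float) -> float:
    """Measure real-time factor: processing time / audio duration."""
    start = time.perf_counter()
    transcribe(audio)  # run inference once
    processing_time_s = time.perf_counter() - start
    return processing_time_s / audio_duration_s

# Example: 2 s of processing for a 20 s clip gives RTF = 0.1.
```

With a fixed-dimension model, `processing_time_s` barely changes with clip length, which is why the ratio grows as the denominator (audio duration) shrinks.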

Inference with AidASR SDK

To be released
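Until the SDK is available, the source model can be tried with the Hugging Face transformers pipeline. This is a sketch under the assumption that the source checkpoint is distil-whisper/distil-large-v3; it is not the AidASR SDK API:

```python
# Sketch: running the source model with Hugging Face transformers until the
# AidASR SDK is released. Assumes the source checkpoint is
# distil-whisper/distil-large-v3 (an assumption, not confirmed by this page).
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-large-v3",
)

# Whisper-family models expect 16 kHz audio; the pipeline resamples as needed.
result = asr("sample.wav")
print(result["text"])
```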

License
Source Model: MIT
Deployable Model: APLUX-MODEL-FARM-LICENSE