Qwen2.5-0.5B-Instruct

Text Generation

W4A16

Qwen2.5 is the latest series of Qwen large language models. Qwen2.5 releases a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:

Significantly more knowledge and has greatly improved capabilities in coding and mathematics, thanks to our specialized expert models in these domains.
Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g, tables), and generating structured outputs especially JSON. More resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots.
Long-context Support up to 128K tokens and can generate up to 8K tokens.
Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

Performance Reference

Device

Backend

Precision

TTFT

Prefill

Decode

Context Size

File Size

Model Resource Acquisition

Model Farm provides optimized model resources and test code, which can be obtained through the following two methods:

Obtain via Model Farm page: Click Models & Test Code in the Performance Reference section on the right to obtain model resources and code packages.
Obtain via command line (Recommand): Users with APLUX development boards can obtain model resources and code packages through the built-in MMS tool.

# Search Models
mms list [model name]

# Get Models
mms get -m [model name] -p [precision] -c [soc] -b [backend] -d [file path]

For MMS usage, please refer to: MMS Usage & Access to Preview Models

Model Details

Type: Causal Language Models
Training Stage: Pretraining & Post-training
Architecture: transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias and tied word embeddings
Number of Parameters: 0.49B
Number of Paramaters (Non-Embedding): 0.36B
Number of Layers: 24
Number of Attention Heads (GQA): 14 for Q and 2 for KV
Context Length: Full 32,768 tokens and generation 8192 tokens

For more details, please refer to our blog, GitHub, and Documentation.

Source Model Evaluation

Note: This table showed source model instead of quantized model evaluation. Source Model Evaluation refer to Qwen2.5-0.5B-Instruct Evaluation Result

Datasets	Qwen2-0.5B-Instruct	Qwen2.5-0.5B-Instruct	Qwen2-1.5B-Instruct	Qwen2.5-1.5B-Instruct
MMLU-Pro	14.4	15.0	22.9	32.4
MMLU-redux	12.9	24.1	41.2	50.7
GPQA	23.7	29.8	21.2	29.8
MATH	13.9	34.4	25.3	55.2
GSM8K	40.1	49.6	61.6	73.2
HumanEval	31.1	35.4	42.1	61.6
MBPP	39.7	49.6	44.2	63.2
MultiPL-E	20.8	28.5	38.5	50.4
LiveCodeBench 2305-2409	1.6	5.1	4.5	14.8
LiveBench 0831	7.4	12.6	12.4	18.8
IFeval strict-prompt	14.6	27.9	29.0	42.5

Model Inference

Users can run large language models on Qualcomm chips using either of the following methods:

Run large models with APLUX AidGen: Please refer to the APLUX AidGen Developer Documentation
Run large models with Qualcomm Genie: Please refer to the Qualcomm Genie Documentation

License

Source Model:APACHE-2.0

Deployable Model:APLUX-MODEL-FARM-LICENSE