
Phi-3.5-mini is a lightweight, state-of-the-art open model built upon datasets used for Phi-3 - synthetic data and filtered publicly available websites - with a focus on very high-quality, reasoning dense data. The model belongs to the Phi-3 model family and supports 128K token context length. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures.
Primary Use Cases
The model is intended for commercial and research use in multiple languages. The model provides uses for general purpose AI systems and applications which require:
- Memory/compute constrained environments
- Latency bound scenarios
- Strong reasoning (especially code, math and logic)
Model is designed to accelerate research on language and multimodal models, for use as a building block for generative AI powered features.
Use Case Considerations
Models are not specifically designed or evaluated for all downstream purposes. Developers should consider common limitations of language models as they select use cases, and evaluate and mitigate for accuracy, safety, and fariness before using within a specific downstream use case, particularly for high risk scenarios. Developers should be aware of and adhere to applicable laws or regulations (including privacy, trade compliance laws, etc.) that are relevant to their use case.
Nothing contained in this Model Card should be interpreted as or deemed a restriction or modification to the license the model is released under.
Release Notes
This is an update over the June 2024 instruction-tuned Phi-3 Mini release based on valuable user feedback. The model used additional post-training data leading to substantial gains on multilingual, multi-turn conversation quality, and reasoning capability. We believe most use cases will benefit from this release, but encourage users to test in their particular AI applications.
Note: This table showed source model instead of quantized model evaluation. Source Model Evaluation refer to Phi-3.5-mini-instruct Evaluation Result
Benchmark | Phi-3.5 Mini-Ins | Phi-3.0-Mini-128k-Instruct (June2024) | Mistral-7B-Instruct-v0.3 | Mistral-Nemo-12B-Ins-2407 | Llama-3.1-8B-Ins | Gemma-2-9B-Ins | Gemini 1.5 Flash | GPT-4o-mini-2024-07-18 (Chat) |
---|---|---|---|---|---|---|---|---|
Multilingual MMLU | 55.4 | 51.08 | 47.4 | 58.9 | 56.2 | 63.8 | 77.2 | 72.9 |
Multilingual MMLU-Pro | 30.9 | 30.21 | 15.0 | 34.0 | 21.4 | 43.0 | 57.9 | 53.2 |
MGSM | 47.9 | 41.56 | 31.8 | 63.3 | 56.7 | 75.1 | 75.8 | 81.7 |
MEGA MLQA | 61.7 | 55.5 | 43.9 | 61.2 | 45.2 | 54.4 | 61.6 | 70.0 |
MEGA TyDi QA | 62.2 | 55.9 | 54.0 | 63.7 | 54.5 | 65.6 | 63.6 | 81.8 |
MEGA UDPOS | 46.5 | 48.1 | 57.2 | 58.2 | 54.1 | 56.6 | 62.4 | 66.0 |
MEGA XCOPA | 63.1 | 62.4 | 58.8 | 10.8 | 21.1 | 31.2 | 95.0 | 90.3 |
MEGA XStoryCloze | 73.5 | 73.6 | 75.5 | 92.3 | 71.0 | 87.0 | 20.7 | 96.6 |
Average | 55.2 | 52.3 | 47.9 | 55.3 | 47.5 | 59.6 | 64.3 | 76.6 |
Users can run large language models on Qualcomm chips using either of the following methods:
Run large models with APLUX AidGen: Please refer to the APLUX AidGen Developer Documentation
Run large models with Qualcomm Genie: Please refer to the Qualcomm Genie Documentation