gemma-3-1b-it
Text Generation
W4A16

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal: they accept text and image input and generate text output, and open weights are available for both pre-trained and instruction-tuned variants. Gemma 3 offers a context window of up to 128K tokens (32K for the 1B size), multilingual support for over 140 languages, and more sizes than previous versions. The models are well suited to a range of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in resource-limited environments such as laptops, desktops, or your own cloud infrastructure, which broadens access to state-of-the-art AI models and helps foster innovation for everyone.

Performance Reference

Device | Backend | Precision | TTFT | Prefill | Decode | Context Size | File Size

Model Resource Acquisition

Model Farm provides optimized model resources and test code, which can be obtained through the following two methods:

  • Obtain via Model Farm page: Click Models & Test Code in the Performance Reference section on the right to obtain model resources and code packages.

  • Obtain via command line (Recommended): Users with APLUX development boards can obtain model resources and code packages through the built-in MMS tool.

# Search Models
mms list [model name]

# Get Models
mms get -m [model name] -p [precision] -c [soc] -b [backend] -d [file path]

For MMS usage, please refer to: MMS Usage & Access to Preview Models
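
As a minimal sketch of how the `mms get` template above might be filled in from a script, the snippet below only assembles the command string and does not run it. The helper name `build_mms_get` is hypothetical, and every argument value is a placeholder: `SOC_NAME` and `BACKEND_NAME` are not real identifiers, and the spelling MMS expects for the precision is an assumption. Use `mms list` and the MMS documentation for the actual values.

```python
import shlex


def build_mms_get(model, precision, soc, backend, dest):
    """Assemble an `mms get` command line using the flags from the template above."""
    args = [
        "mms", "get",
        "-m", model,      # model name
        "-p", precision,  # precision
        "-c", soc,        # SoC
        "-b", backend,    # backend
        "-d", dest,       # destination file path
    ]
    # shlex.join quotes any argument that contains shell-special characters.
    return shlex.join(args)


# Placeholder values for illustration only (assumptions, not verified MMS inputs).
cmd = build_mms_get("gemma-3-1b-it", "W4A16", "SOC_NAME", "BACKEND_NAME", "./models")
print(cmd)
```

Building the argument list first and joining it with `shlex.join` keeps paths with spaces safe if the string is later pasted into a shell.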

Model Details

Inputs and outputs

  • Input:

    • Text string, such as a question, a prompt, or a document to be summarized
    • Images, normalized to 896 x 896 resolution and encoded to 256 tokens each
    • Total input context of 128K tokens for the 4B, 12B, and 27B sizes, and 32K tokens for the 1B size
  • Output:

    • Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document
    • Total output context of 8192 tokens
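
The limits above translate into a simple input budget. The sketch below is illustrative arithmetic only: it assumes "K" means 1024 tokens, and the function name `text_token_budget` is hypothetical. It does not model how a particular runtime counts prompt or output tokens against the context.

```python
IMAGE_TOKENS = 256        # each image is encoded to 256 tokens
MAX_OUTPUT_TOKENS = 8192  # total output context stated above

# Input context per model size, assuming 1K = 1024 tokens.
CONTEXT_TOKENS = {
    "1B": 32 * 1024,
    "4B": 128 * 1024,
    "12B": 128 * 1024,
    "27B": 128 * 1024,
}


def text_token_budget(size, num_images=0):
    """Return the input tokens left for text after reserving 256 per image."""
    remaining = CONTEXT_TOKENS[size] - num_images * IMAGE_TOKENS
    if remaining < 0:
        raise ValueError("images alone exceed the input context")
    return remaining


print(text_token_budget("1B"))      # 32768
print(text_token_budget("4B", 10))  # 131072 - 2560 = 128512
```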
Source Model Evaluation

Note: This table shows evaluation results for the source model, not the quantized model. For the source of these figures, refer to the gemma-3-1b-it Evaluation Result.

Benchmark          Metric    Gemma 3 PT 1B  Gemma 3 PT 4B  Gemma 3 PT 12B  Gemma 3 PT 27B
HellaSwag          10-shot   62.3           77.2           84.2            85.6
BoolQ              0-shot    63.2           72.3           78.8            82.4
PIQA               0-shot    73.8           79.6           81.8            83.3
SocialIQA          0-shot    48.9           51.9           53.4            54.9
TriviaQA           5-shot    39.8           65.8           78.2            85.5
Natural Questions  5-shot    9.48           20.0           31.4            36.1
ARC-c              25-shot   38.4           56.2           68.9            70.6
ARC-e              0-shot    73.0           82.4           88.3            89.0
WinoGrande         5-shot    58.2           64.7           74.3            78.8
BIG-Bench Hard     few-shot  28.4           50.9           72.6            77.7
DROP               1-shot    42.4           60.1           72.2            77.2
Model Inference

Users can run large language models on Qualcomm chips using either of the following methods:

License
Source Model: Gemma-LICENSE
Deployable Model: Gemma-LICENSE