Llama 3.2 is a collection of pretrained and instruction-tuned multilingual generative large language models (LLMs) in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned, text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks, and they outperform many of the available open-source and closed chat models on common industry benchmarks.
- **Fine-tuning** (Docs): Llama 3.2 3B Instruct can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.
- **On-demand deployment** (Docs): On-demand deployments give you dedicated GPUs for Llama 3.2 3B Instruct using Fireworks' reliable, high-performance system with no rate limits.
Llama 3.2 3B Instruct is an instruction-tuned, multilingual large language model developed by Meta. It belongs to the Llama 3.2 family of models optimized for assistant-style dialogue, summarization, and retrieval use cases. The 3B variant includes approximately 3.21 billion parameters.
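A minimal sketch of calling the model through an OpenAI-compatible chat completions request. The endpoint URL and model identifier below are assumptions for illustration; confirm the exact values in the Fireworks documentation:

```python
import json

# Assumed endpoint and model identifier -- verify against the Fireworks docs.
FIREWORKS_URL = "https://api.fireworks.ai/inference/v1/chat/completions"
MODEL_ID = "accounts/fireworks/models/llama-v3p2-3b-instruct"

def build_chat_request(user_message: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": MODEL_ID,
        "max_tokens": max_tokens,
        "messages": [
            {"role": "system", "content": "You are a helpful multilingual assistant."},
            {"role": "user", "content": user_message},
        ],
    }

payload = build_chat_request("Summarize the Llama 3.2 release in one sentence.")
print(json.dumps(payload, indent=2))
```

The payload would then be POSTed to the endpoint with an `Authorization: Bearer <API key>` header.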
This model is optimized for:
- Multilingual, assistant-style dialogue
- Agentic retrieval
- Summarization
The maximum context length is 131,072 tokens (128K).
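A quick sketch for checking whether a prompt fits the context window. The characters-per-token ratio is a rough heuristic of our own, not part of the model; use a real tokenizer for exact counts:

```python
MAX_CONTEXT_TOKENS = 131_072  # Llama 3.2 3B Instruct context window

# Assumption: ~4 characters per token for English text (rough heuristic only).
CHARS_PER_TOKEN = 4

def fits_in_context(prompt: str, max_new_tokens: int = 0) -> bool:
    """Estimate whether a prompt plus planned output fits the window."""
    est_prompt_tokens = len(prompt) // CHARS_PER_TOKEN + 1
    return est_prompt_tokens + max_new_tokens <= MAX_CONTEXT_TOKENS

print(fits_in_context("Hello, Llama!", max_new_tokens=256))
```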
Quantized variants are available in 4-bit and 8-bit formats.
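Back-of-the-envelope arithmetic shows why quantization matters at this scale: weight memory scales linearly with bits per parameter. This counts weights only, not the KV cache or activations:

```python
PARAMS = 3.21e9  # parameter count of the 3B variant

def weight_memory_gb(bits_per_param: float) -> float:
    """Approximate memory for model weights alone (excludes KV cache)."""
    return PARAMS * bits_per_param / 8 / 1e9

for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(bits):.2f} GB")
```

At 4 bits the weights drop to roughly a quarter of their fp16 footprint, about 1.6 GB versus 6.4 GB.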
Known risks include the areas covered in Meta's safety red-teaming, such as CBRNE threats, child safety, and cyberattacks. Meta recommends pairing the model with system-level safeguards like Llama Guard or Prompt Guard.
Streaming and function calling are not supported for this model.
The model has 3.21 billion parameters.
Fireworks supports fine-tuning this model with LoRA.
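A sketch of preparing a fine-tuning dataset. The chat-style JSONL shape below (one conversation per line with a `messages` array) is an assumption; confirm the exact schema in the Fireworks fine-tuning docs before uploading:

```python
import json

# Assumed chat-style JSONL schema -- verify against the Fireworks docs.
examples = [
    {"messages": [
        {"role": "user", "content": "What is LoRA?"},
        {"role": "assistant",
         "content": "LoRA is a parameter-efficient fine-tuning method."},
    ]},
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```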
Use of the model is governed by the Llama 3.2 Community License, which permits commercial use under specific terms set by Meta.