The Llama 4 collection comprises natively multimodal AI models that enable text and multimodal experiences. These models use a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.
Fine-tuning: Llama 4 Maverick Instruct (Basic) can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.
On-demand deployment: On-demand deployments give you dedicated GPUs for Llama 4 Maverick Instruct (Basic) on Fireworks' reliable, high-performance infrastructure with no rate limits.
Llama 4 Maverick Instruct is the instruction-tuned variant of Llama 4 Maverick, a natively multimodal mixture-of-experts model created by Meta with 17 billion active parameters (128 experts, roughly 400 billion total parameters).
Llama 4 Maverick Instruct is designed for instruction-following chat and for text and image understanding tasks.
Llama 4 Maverick Instruct has a maximum context length of 1,048,576 tokens.
Yes. Meta releases Maverick in BF16 and FP8 checkpoints and provides code for on-the-fly int4 (≈4-bit) quantization, enabling single-GPU deployment.
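Meta's quantization code ships with the checkpoints; as a rough illustration of what int4 weight quantization means (not Meta's implementation), here is a generic symmetric round-to-nearest sketch for one weight group:

```python
def quantize_int4(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric round-to-nearest int4 quantization of one weight group.

    int4 covers integers -8..7; the scale maps the largest-magnitude
    weight onto that range. Generic illustration only.
    """
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from int4 codes and the scale."""
    return [v * scale for v in q]

w = [0.12, -0.5, 0.31, -0.07]
q, s = quantize_int4(w)
w_hat = dequantize(q, s)
# Each reconstructed weight is within half a quantization step of the original.
assert all(abs(a - b) <= s / 2 + 1e-9 for a, b in zip(w, w_hat))
```

Real deployments quantize per group or per channel and keep sensitive layers in higher precision, but the storage saving (4 bits per weight instead of 16) is what makes a 400B-parameter model fit on a single GPU node.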
Llama 4 Maverick Instruct has 17 billion active parameters per token (each token is routed to one of the 128 experts) within an MoE totaling roughly 400 billion parameters.
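The active/total split can be sanity-checked with back-of-the-envelope arithmetic, and top-1 routing can be sketched with a toy gate (illustrative only; the real router is a learned layer inside the network):

```python
import random

NUM_EXPERTS = 128
TOTAL_PARAMS = 400e9    # all experts plus shared layers
ACTIVE_PARAMS = 17e9    # parameters actually used per token

# Only ~4.2% of the weights run for any given token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

def top1_route(logits: list[float]) -> int:
    """Toy top-1 router: pick the expert with the largest gate logit."""
    return max(range(len(logits)), key=lambda i: logits[i])

random.seed(0)
gate_logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
expert = top1_route(gate_logits)
print(f"token routed to expert {expert}; active share ≈ {active_fraction:.1%}")
```

This is why MoE inference cost tracks the 17B active parameters rather than the full 400B, while memory cost still tracks the total.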
Yes. Fine-tuning is available for Llama 4 Maverick Instruct on Fireworks via LoRA.
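As background on what LoRA does: the frozen base weight W is augmented by a low-rank update B·A, and only A and B are trained. A minimal dependency-free sketch (generic illustration, not Fireworks' implementation):

```python
import random

def matvec(M: list[list[float]], x: list[float]) -> list[float]:
    """Plain matrix-vector product."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha: float = 1.0) -> list[float]:
    """y = W x + alpha * B (A x): frozen base plus trained low-rank update."""
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    return [b + alpha * l for b, l in zip(base, low_rank)]

d, r = 4, 2  # toy model dim 4, LoRA rank 2
random.seed(0)
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]
A = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(r)]  # r x d
B = [[0.0] * r for _ in range(d)]  # d x r, zero-init so training starts at W

x = [1.0, 2.0, 3.0, 4.0]
# With B zero-initialised, the adapted output equals the base output.
assert lora_forward(W, A, B, x) == matvec(W, x)
```

Because only the low-rank A and B matrices are updated, the adapter is a small fraction of the base model's size at realistic dimensions, which is what makes training and serving personalized variants cheap.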
The model is released under the Llama 4 Community License Agreement, a custom commercial license from Meta.