The models in the Llama 4 collection are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.
- Fine-tuning: Llama 4 Scout Instruct (Basic) can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.
- On-demand deployment: On-demand deployments give you dedicated GPUs for Llama 4 Scout Instruct (Basic) using Fireworks' reliable, high-performance system with no rate limits.
Llama 4 Scout Instruct (Basic) is a multimodal mixture-of-experts model developed by Meta, optimized for both text and image inputs. It is part of the Llama 4 model family, which includes Scout (17Bx16E) and Maverick (17Bx128E) variants.
This model excels in applications requiring image-text understanding, long context processing, and multilingual capabilities across 12 supported languages.
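Since the model accepts mixed image-and-text input, a single user message can carry both. Below is a minimal sketch of such a request payload, assuming Fireworks' OpenAI-compatible chat completions format; the model ID and field names are assumptions to verify against the Fireworks API reference.

```python
# Assumed model ID -- confirm against the Fireworks model catalog.
MODEL_ID = "accounts/fireworks/models/llama4-scout-instruct-basic"

def build_image_chat_payload(prompt: str, image_url: str) -> dict:
    """Combine a text prompt and an image reference in one user message."""
    return {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 512,
    }

payload = build_image_chat_payload(
    "Describe this chart in two sentences.",
    "https://example.com/chart.png",
)
```

The same payload shape works for text-only requests by passing a plain string as `content` instead of the list of typed parts.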
The model supports a context length of 1,048,576 tokens, i.e. approximately 1 million tokens.
The model has been tested on long-context tasks (e.g., full-book translation benchmarks) and is capable of maintaining high performance with up to 1M token inputs.
Yes. The model is available in BF16 format and supports on-the-fly 4-bit quantization, allowing single-GPU (H100) deployment without major performance loss.
Documented limitations include:
Yes, function calling is supported.
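A function-calling request declares the available tools, then routes any tool call the model emits to local code. The sketch below uses the OpenAI-compatible tools schema; the `get_weather` tool, its parameters, and the response shape are illustrative assumptions, not part of this page.

```python
import json

# Hypothetical tool declaration in the OpenAI-compatible function-calling format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # assumed example tool, not a real API
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

def dispatch_tool_call(tool_call: dict) -> str:
    """Route a model-emitted tool call to a local Python function."""
    handlers = {"get_weather": lambda city: f"Sunny in {city}"}  # stub handler
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])  # arguments arrive as a JSON string
    return handlers[name](**args)

# Shape of a tool call as it typically appears in choices[0].message.tool_calls:
example_call = {"function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'}}
result = dispatch_tool_call(example_call)
```

The handler's return value would then be sent back to the model in a follow-up message with `role: "tool"` so it can compose the final answer.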
Full fine-tuning is supported; serverless LoRA fine-tuning is not.
Fireworks uses per-token billing (input + output). Pricing for this model is $0.15 per 1M tokens (input) and $0.60 per 1M tokens (output) on serverless deployment.
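To make the per-token rates concrete, here is a small worked cost calculation at the serverless prices quoted above ($0.15 per 1M input tokens, $0.60 per 1M output tokens); the example token counts are arbitrary.

```python
# Serverless rates from the pricing above, converted to dollars per token.
INPUT_RATE = 0.15 / 1_000_000   # $ per input token
OUTPUT_RATE = 0.60 / 1_000_000  # $ per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request: input and output tokens billed separately."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. a 200k-token prompt with a 2k-token completion:
cost = request_cost(200_000, 2_000)  # 0.03 + 0.0012 = $0.0312
```

Note that long-context requests are dominated by input cost: here the 200k-token prompt accounts for over 95% of the total.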
The model is governed by the Llama 4 Community License, which allows commercial and research use. Full license: GitHub - Llama 4 License.