

Llama 4 Scout Instruct (Basic)

fireworks/llama4-scout-instruct-basic

    The Llama 4 collection comprises natively multimodal AI models that enable text and multimodal experiences. These models use a mixture-of-experts architecture to deliver industry-leading performance in text and image understanding.
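A minimal sketch of a chat request to this model through Fireworks' OpenAI-compatible endpoint. The model path follows Fireworks' `accounts/fireworks/models/...` convention; verify the exact identifier in the Model Library before use.

```python
import json

# Minimal chat-completion request body for the OpenAI-compatible API.
# Model path is an assumption based on the identifier shown on this page.
payload = {
    "model": "accounts/fireworks/models/llama4-scout-instruct-basic",
    "messages": [
        {"role": "user", "content": "Summarize the Llama 4 architecture in one sentence."}
    ],
    "max_tokens": 256,
}

# POST this JSON to https://api.fireworks.ai/inference/v1/chat/completions
# with an "Authorization: Bearer <API_KEY>" header.
print(json.dumps(payload, indent=2))
```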

    Llama 4 Scout Instruct (Basic) API Features

    Fine-tuning

    Docs

    Llama 4 Scout Instruct (Basic) can be customized with your data to improve responses. Fireworks uses LoRA to train and deploy your personalized model efficiently.
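A hedged sketch of a chat-format JSONL training file for fine-tuning. Fireworks accepts conversation-style records; the field names here mirror the common OpenAI-style `messages` schema, so check the fine-tuning docs for the exact format your account expects.

```python
import json

# One example training record in chat JSONL format (schema is an
# assumption -- confirm against the Fireworks fine-tuning docs).
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a concise support agent."},
            {"role": "user", "content": "How do I reset my password?"},
            {"role": "assistant", "content": "Open Settings > Security and choose 'Reset password'."},
        ]
    }
]

# Write one JSON object per line, the standard JSONL layout.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```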

    On-demand Deployment

    Docs

    On-demand deployments give you dedicated GPUs for Llama 4 Scout Instruct (Basic) using Fireworks' reliable, high-performance system with no rate limits.

    Llama 4 Scout Instruct (Basic) FAQs

    What is Llama 4 Scout Instruct (Basic) and who developed it?

    Llama 4 Scout Instruct (Basic) is a multimodal mixture-of-experts model developed by Meta, optimized for both text and image inputs. It is part of the Llama 4 model family, which includes Scout (17Bx16E) and Maverick (17Bx128E) variants.

    What applications and use cases does Llama 4 Scout Instruct (Basic) excel at?

    This model excels in:

    • Multilingual instruction-following
    • Visual reasoning and image understanding
    • Document QA and chart interpretation
    • Coding (LiveCodeBench)
    • Agentic decision-making and conversational AI

    It is particularly strong in applications requiring image-text understanding, long context processing, and multilingual capabilities across 12 supported languages.
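For the image-text use cases above, a multimodal message mixes text and image parts in the OpenAI-style `content` array that Fireworks serves; the image URL below is a placeholder, and the model path is an assumption based on this page's identifier.

```python
import json

# Sketch of a text + image chat message for the OpenAI-compatible API.
payload = {
    "model": "accounts/fireworks/models/llama4-scout-instruct-basic",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this chart show?"},
                # Placeholder URL -- replace with a reachable image.
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
}

# The model was tested with up to 5 images per request (see the FAQ on
# failure modes), so keep image counts at or below that.
print(json.dumps(payload)[:80])
```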

    What is the maximum context length for Llama 4 Scout Instruct (Basic)?

    The model supports a context length of 1,048,576 tokens, or approximately 1 million tokens.

    What is the usable context window for Llama 4 Scout Instruct (Basic)?

    The model has been tested on long-context tasks (e.g., full-book translation benchmarks) and is capable of maintaining high performance with up to 1M token inputs.
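Before sending very long inputs, a rough pre-flight check against the 1,048,576-token window can catch oversized prompts early. The ~4 characters-per-token heuristic below is purely illustrative; use a real tokenizer for accurate counts.

```python
# Crude context-window guard. CHARS_PER_TOKEN is a rough English-text
# average and an assumption, not a property of the model's tokenizer.
CONTEXT_LIMIT = 1_048_576
CHARS_PER_TOKEN = 4

def fits_in_context(prompt: str, max_new_tokens: int = 1024) -> bool:
    estimated_prompt_tokens = len(prompt) // CHARS_PER_TOKEN
    return estimated_prompt_tokens + max_new_tokens <= CONTEXT_LIMIT

print(fits_in_context("hello world"))  # True
```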

    Does Llama 4 Scout Instruct (Basic) support quantized formats (4-bit/8-bit)?

    Yes. The model is available in BF16 format and supports on-the-fly 4-bit quantization, allowing single-GPU (H100) deployment without major performance loss.
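The single-GPU claim follows from back-of-the-envelope weight sizes: BF16 uses 2 bytes per parameter and 4-bit quantization uses 0.5 bytes. These figures cover raw weights only; KV cache and activations add overhead.

```python
# Raw-weight memory footprint for the 108.6B-parameter model.
params = 108.6e9

bf16_gb = params * 2 / 1e9    # 2 bytes per parameter
int4_gb = params * 0.5 / 1e9  # 0.5 bytes per parameter

print(f"BF16:  {bf16_gb:.1f} GB")   # 217.2 GB -- needs multiple GPUs
print(f"4-bit: {int4_gb:.1f} GB")   # 54.3 GB -- fits in one 80 GB H100
```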

    What are known failure modes of Llama 4 Scout Instruct (Basic)?

    Documented limitations include:

    • Slight degradation in refusal behavior on edge cases
    • Some limitations on cyber-attack and safety prompt handling
    • Model tested with up to 5 images—performance beyond this is not guaranteed
    • Output may still include occasional templated or repetitive language if not prompted carefully

    Does Llama 4 Scout Instruct (Basic) support function-calling schemas?

    Yes, function calling is supported.
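A sketch of a tools definition in the OpenAI-style function-calling schema that chat-completions endpoints accept; `get_weather` is a hypothetical example function, and the model path is an assumption based on this page's identifier.

```python
import json

# Hypothetical tool definition in the OpenAI-style "tools" schema.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical example function
            "description": "Get current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

payload = {
    "model": "accounts/fireworks/models/llama4-scout-instruct-basic",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
}
print(json.dumps(payload)[:60])
```

When the model decides a tool is needed, the response carries a `tool_calls` entry with the function name and JSON arguments for your code to execute.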

    How many parameters does Llama 4 Scout Instruct (Basic) have?
    • Activated Parameters: 17B
    • Total Parameters: 108.6B (MoE architecture)

    Is fine-tuning supported for Llama 4 Scout Instruct (Basic)?

    Yes. Fine-tuning is supported, but serverless LoRA deployment is not.

    How are tokens counted (prompt vs completion)?

    Fireworks uses per-token billing (input + output). Pricing for this model is $0.15 per 1M tokens (input) and $0.60 per 1M tokens (output) on serverless deployment.
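At the serverless rates quoted above ($0.15 per 1M input tokens, $0.60 per 1M output tokens), a per-request cost estimate is simple arithmetic:

```python
# Cost estimate at the quoted serverless rates.
def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * 0.15 + output_tokens / 1e6 * 0.60

# e.g. a request with 10k prompt tokens and 2k completion tokens:
print(f"${estimate_cost(10_000, 2_000):.4f}")  # $0.0027
```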

    What rate limits apply on the shared endpoint?

    The model is available both on serverless deployments, where rate limits depend on system load, and on on-demand deployments, which have no rate limits.

    What license governs commercial use of Llama 4 Scout Instruct (Basic)?

    The model is governed by the Llama 4 Community License, which allows both commercial and research use. The full license text is available on the Llama GitHub repository.

    Metadata

    State
    Ready
    Created on
    4/5/2025
    Kind
    Base model
    Provider
    Meta
    Hugging Face
    Llama-4-Scout-17B-16E-Instruct

    Specification

    Calibrated
    Yes
    Mixture-of-Experts
    Yes
    Parameters
    108.6B

    Supported Functionality

    Fine-tuning
    Supported
    Serverless
    Not supported
    Serverless LoRA
    Not supported
    Context Length
    1,048,576 tokens
    Function Calling
    Supported
    Embeddings
    Not supported
    Rerankers
    Not supported
    Support image input
    Supported