Serverless 2.0 is live: control reliability & speed without reserved capacity. Get Started.

Model Library
/Fireworks AI/Step-3.7-Flash-NVFP4
Fireworks Logo Mark

Step-3.7-Flash-NVFP4

Ready
accounts/fireworks/models/step-3p7-flash-nvfp4

    Step 3.7 Flash is a 198B-parameter sparse Mixture-of-Experts (MoE) vision-language model that combines a 196B-parameter language backbone with a 1.8B-parameter vision encoder for native image understanding. Engineered for high-frequency production workloads, it activates approximately 11B parameters per token and delivers a throughput of up to 400 tokens per second. Step 3.7 Flash supports a 256k context window and offers three selectable reasoning levels (low, medium, and high) so developers can easily balance speed, cost, and cognitive depth.

    Step-3.7-Flash-NVFP4 API Features

    On-demand Deployment

    Docs

    On-demand deployments allow you to use Step-3.7-Flash-NVFP4 on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

    Metadata

    State
    Ready
    Created on
    5/31/2026
    Kind
    Base model
    Provider
    Fireworks AI

    Specification

    Calibrated
    No
    Mixture-of-Experts
    Yes
    Parameters
    198B

    Supported Functionality

    Fine-tuning
    Not supported
    Serverless
    Not supported
    Context Length
    262k tokens
    Function Calling
    Supported
    Embeddings
    Not supported
    Rerankers
    Not supported
    Support image input
    Supported