GLM 5.2 is live! Opus-level intelligence at open-source rates. Pay per token on serverless. Try it today.

Model Library
/StepFun/Step-3.7-Flash-NVFP4
StepFun

Step-3.7-Flash-NVFP4

Ready
model path:accounts/fireworks/models/step-3p7-flash-nvfp4

Step 3.7 Flash is a 198B-parameter sparse Mixture-of-Experts (MoE) vision-language model that combines a 196B-parameter language backbone with a 1.8B-parameter vision encoder for native image understanding. Engineered for high-frequency production workloads, it activates approximately 11B parameters per token and delivers a throughput of up to 400 tokens per second. Step 3.7 Flash supports a 256k context window and offers three selectable reasoning levels (low, medium, and high) so developers can easily balance speed, cost, and cognitive depth.

Step-3.7-Flash-NVFP4 API Features

On-demand Deployment

Docs

On-demand deployments allow you to use Step-3.7-Flash-NVFP4 on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

Metadata

State
Ready
Created on
5/31/2026
Kind
Base model
Provider
StepFun

Specification

Calibrated
No
Mixture-of-Experts
Yes
Parameters
198B

Supported Functionality

Fine-tuning
Not supported
Serverless
Not supported
Context Length
262k tokens
Function Calling
Supported
Embeddings
Not supported
Rerankers
Not supported
Support image input
Supported