Step-3.7-Flash-NVFP4 API & Playground

Step 3.7 Flash is a 198B-parameter sparse Mixture-of-Experts (MoE) vision-language model that combines a 196B-parameter language backbone with a 1.8B-parameter vision encoder for native image understanding. Engineered for high-frequency production workloads, it activates approximately 11B parameters per token and delivers a throughput of up to 400 tokens per second. Step 3.7 Flash supports a 256k context window and offers three selectable reasoning levels (low, medium, and high) so developers can easily balance speed, cost, and cognitive depth.

Step-3.7-Flash-NVFP4 API Features

On-demand Deployment

Docs

On-demand deployments allow you to use Step-3.7-Flash-NVFP4 on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

Metadata

State

Ready

Created on

5/31/2026

Kind

Base model

Provider

N/A

Hugging Face

stepfun-ai/Step-3.7-Flash-NVFP4

Specification

Calibrated

Mixture-of-Experts

Yes

Parameters

198B

Supported Functionality

Fine-tuning

Not supported

Serverless

Not supported

Context Length

262k tokens

Function Calling

Supported

Embeddings

Not supported

Rerankers

Not supported

Support image input

Supported