DeepSeek-V4-Flash API & Playground

DeepSeek-V4-Flash is a streamlined open-source Mixture-of-Experts model optimized for fast, cost-efficient inference while preserving strong reasoning and coding performance at 1M token context scale. It leverages the same hybrid attention innovations as Pro but is tuned for lower latency and higher throughput in real-time applications. It delivers near-Pro reasoning quality under sufficient compute budget, making it ideal for interactive agents and high-volume production workloads.

DeepSeek-V4-Flash API Features

Fine-tuning Docs	DeepSeek-V4-Flash can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model
Serverless Docs	DeepSeek-V4-Flash is available via Fireworks' serverless API, where you pay per token. There are several ways to call the Fireworks API, including Fireworks' Python client, the REST API, or OpenAI's Python client.
On-demand Deployment Docs	On-demand deployments allow you to use DeepSeek-V4-Flash on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

Available Serverless

Run queries immediately, pay only for usage

$0.14 / $0.028 / $0.28

Per 1M Tokens (input/cached input/output)

Metadata

State

Ready

Created on

4/24/2026

Kind

Base model

Provider

Deepseek

Hugging Face

deepseek-ai/DeepSeek-V4-Flash

Specification

Calibrated

Mixture-of-Experts

Yes

Parameters

284B

Supported Functionality

Fine-tuning

Supported

Serverless

Supported

Context Length

1040k tokens

Function Calling

Supported

Embeddings

Not supported

Rerankers

Not supported

Support image input

Not supported

DeepSeek-V4-Flash

DeepSeek-V4-Flash API Features

Fine-tuning

Serverless

On-demand Deployment

Available Serverless

Metadata

Specification

Supported Functionality