Qwen 3.7 Plus is now available on Serverless, exclusively on Fireworks. Try it today.

Model Library
/Deepseek/DeepSeek V3
model path:accounts/fireworks/models/deepseek-v3

A a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token from Deepseek. Note that fine-tuning for this model is only available through contacting fireworks at https://fireworks.ai/company/contact-us.

DeepSeek V3 API Features

Fine-tuning

Docs

DeepSeek V3 can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model

On-demand Deployment

Docs

On-demand deployments allow you to use DeepSeek V3 on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

DeepSeek V3 FAQs

What is DeepSeek V3 and who developed it?

DeepSeek V3 is a Mixture-of-Experts (MoE) large language model developed by DeepSeek AI. It has 671B total parameters, with 37B activated per token during inference. The model uses Multi-head Latent Attention (MLA) and a Multi-Token Prediction (MTP) objective to improve inference speed and training efficiency.

What applications and use cases does DeepSeek V3 excel at?

DeepSeek V3 is well-suited for:

  • Complex reasoning and chain-of-thought tasks
  • Code generation and structured output
  • Math and logic benchmarks (e.g., MATH, GSM8K, HumanEval)
  • Multilingual understanding (e.g., C-Eval, CMMLU)
  • Vision tasks when used with Fireworks’ Document Inlining, which allows uploading images and PDFs by appending URLs with #transform=inline
What is the maximum context length for DeepSeek V3?

DeepSeek V3 supports a context length of 131,072 tokens.

What is the usable context window for DeepSeek V3?

The model maintains high accuracy across the 128K token context window, validated through Needle-in-a-Haystack (NIAH) benchmarks.

Does DeepSeek V3 support quantized formats?

Yes. DeepSeek V3 supports INT4, INT8, and FP8 formats. Fireworks also provides Quantization-Aware Training (QAT) to maintain high accuracy in quantized deployments.

What are known failure modes of DeepSeek V3?

Known issues include:

  • Degraded accuracy in multi-turn function calling
  • Performance sensitivity when converting LoRA weights to FP8 during inference

Evaluation limitations are discussed in our function-calling and fine-tuning blog posts.

Does DeepSeek V3 support streaming responses and function-calling schemas?

Yes. DeepSeek V3 supports:

  • Streaming responses
  • Function calling (tool use) in JSON format, available on Fireworks’ Serverless tier. Multi-turn function calling is still an area of improvement.
How many parameters does DeepSeek V3 have?
  • Total: 671B parameters
  • Activated per token: 37B parameters
Is fine-tuning supported for DeepSeek V3?

Yes. Fireworks supports Quantization-Aware Fine-Tuning (QAT) using LoRA and QLoRA for DeepSeek V3. Fine-tuned models can be deployed directly via Fireworks infrastructure.

What license governs commercial use of DeepSeek V3?
  • Code license: MIT
  • Model license: DeepSeek Model Agreement
  • Commercial use is permitted under these terms.

Metadata

State
Ready
Created on
12/30/2024
Kind
Base model
Provider
Deepseek

Specification

Calibrated
Yes
Mixture-of-Experts
Yes
Parameters
671B

Supported Functionality

Fine-tuning
Supported
Serverless
Not supported
Context Length
131k tokens
Function Calling
Supported
Embeddings
Not supported
Rerankers
Not supported
Support image input
Not supported