A strong Mixture-of-Experts (MoE) language model from DeepSeek with 671B total parameters, of which 37B are activated for each token. Note that fine-tuning for this model is only available by contacting Fireworks at https://fireworks.ai/company/contact-us.
| Feature | Description |
| --- | --- |
| Fine-tuning (Docs) | DeepSeek V3 can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model. |
| On-demand Deployment (Docs) | On-demand deployments give you dedicated GPUs for DeepSeek V3 using Fireworks' reliable, high-performance system with no rate limits. |
DeepSeek V3 is a Mixture-of-Experts (MoE) large language model developed by DeepSeek AI. It has 671B total parameters, with 37B activated per token during inference. The model uses Multi-head Latent Attention (MLA) and a Multi-Token Prediction (MTP) objective to improve inference speed and training efficiency.
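To illustrate why only 37B of the 671B parameters are used per token, here is a minimal, illustrative sketch of top-k expert routing in an MoE layer. The layer sizes, expert count, and router are made up for demonstration and do not reflect DeepSeek V3's actual architecture.

```python
import numpy as np

# Toy illustration of MoE top-k routing: each token is sent to only a few
# experts, so only a fraction of the total parameters runs per token.
# Sizes and k are arbitrary; DeepSeek V3's real router and expert layout differ.
rng = np.random.default_rng(0)

d_model, n_experts, top_k = 16, 8, 2
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]  # toy expert FFNs
router = rng.normal(size=(d_model, n_experts))                             # gating weights

def moe_forward(x):
    logits = x @ router                        # router score for each expert
    chosen = np.argsort(logits)[-top_k:]       # indices of the top-k experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                   # normalized gate weights
    # Only the chosen experts run; all other experts' parameters stay idle.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

token = rng.normal(size=d_model)
out = moe_forward(token)
print(out.shape)  # (16,)
print(f"active expert-parameter fraction ~ {top_k / n_experts:.2f}")
```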
DeepSeek V3 is well-suited for:
DeepSeek V3 supports a context length of 131,072 tokens.
The model maintains high accuracy across the full 128K-token context window, as validated through Needle-in-a-Haystack (NIAH) benchmarks.
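Below is a minimal sketch of querying DeepSeek V3 through Fireworks' OpenAI-compatible chat completions endpoint. The base URL and the model identifier `accounts/fireworks/models/deepseek-v3` are assumptions; confirm the exact model id in the Fireworks model library before use.

```python
from openai import OpenAI

# Sketch: call DeepSeek V3 via Fireworks' OpenAI-compatible API.
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="YOUR_FIREWORKS_API_KEY",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3",
    messages=[
        {"role": "user", "content": "Summarize this long document: ..."},
    ],
    max_tokens=512,  # completion budget; prompt + completion must fit in the 131,072-token window
)
print(response.choices[0].message.content)
```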
Yes. DeepSeek V3 supports INT4, INT8, and FP8 formats. Fireworks also provides Quantization-Aware Training (QAT) to maintain high accuracy in quantized deployments.
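As a rough illustration of why Quantization-Aware Training helps, the sketch below performs a symmetric INT8 quantize/dequantize round trip on a toy weight tensor and reports the rounding error that QAT is designed to compensate for. It is purely illustrative and is not Fireworks' actual INT4/FP8 kernels.

```python
import numpy as np

# Toy symmetric INT8 quantization round trip on a weight tensor.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(4096,)).astype(np.float32)

scale = np.abs(w).max() / 127.0                       # per-tensor scale
w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_dq = w_q.astype(np.float32) * scale                 # dequantized weights

err = np.abs(w - w_dq)                                # quantization error QAT trains around
print(f"max abs error: {err.max():.6f}, mean abs error: {err.mean():.6f}")
```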
Known issues include:
Evaluation limitations are discussed in our function-calling and fine-tuning blog posts.
Yes. DeepSeek V3 supports:
Yes. Fireworks supports Quantization-Aware Fine-Tuning (QAT) using LoRA and QLoRA for DeepSeek V3. Fine-tuned models can be deployed directly via Fireworks infrastructure.
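For reference, here is a sketch of preparing a chat-style JSONL training file for LoRA fine-tuning. The `messages` schema shown is an assumption based on common chat fine-tuning formats, and the file name is illustrative; confirm the exact field names and the dataset upload and job-creation steps in the Fireworks fine-tuning docs before running a job.

```python
import json

# Sketch: write a chat-format JSONL training file for LoRA fine-tuning.
examples = [
    {
        "messages": [
            {"role": "user", "content": "What is the capital of France?"},
            {"role": "assistant", "content": "The capital of France is Paris."},
        ]
    },
    # ... more training examples ...
]

with open("train.jsonl", "w") as f:
    for row in examples:
        f.write(json.dumps(row) + "\n")
```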