
DeepSeek V3
fireworks/deepseek-v3

    A strong Mixture-of-Experts (MoE) language model from DeepSeek with 671B total parameters, of which 37B are activated for each token. Note that fine-tuning for this model is only available by contacting Fireworks at https://fireworks.ai/company/contact-us.

    DeepSeek V3 API Features

    Fine-tuning

    Docs

    DeepSeek V3 can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.
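
    As a rough sketch, training data for chat-style fine-tuning is usually prepared as a JSONL file of `messages` records. The snippet below only illustrates that shape; the exact schema and upload flow Fireworks expects are covered in the fine-tuning docs linked above.

    ```python
    import json

    # Hypothetical training examples in a chat-style "messages" format.
    # The exact schema Fireworks expects is described in the fine-tuning docs;
    # this is only an illustrative sketch.
    examples = [
        {
            "messages": [
                {"role": "system", "content": "You are a concise support assistant."},
                {"role": "user", "content": "How do I reset my password?"},
                {"role": "assistant", "content": "Open Settings > Security and choose 'Reset password'."},
            ]
        },
    ]

    # Write one JSON object per line (JSONL), the usual format for fine-tuning datasets.
    with open("train.jsonl", "w", encoding="utf-8") as f:
        for example in examples:
            f.write(json.dumps(example, ensure_ascii=False) + "\n")
    ```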

    On-demand Deployment

    Docs

    On-demand deployments give you dedicated GPUs for DeepSeek V3 using Fireworks' reliable, high-performance system with no rate limits.
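
    Once a deployment is ready, it can be queried through Fireworks’ OpenAI-compatible chat completions API. The sketch below assumes the `openai` Python client and a FIREWORKS_API_KEY environment variable, and uses `accounts/fireworks/models/deepseek-v3` as a placeholder for the model identifier shown on your deployment page.

    ```python
    # Minimal sketch of querying DeepSeek V3 through Fireworks' OpenAI-compatible API.
    # Assumes the `openai` Python package and a FIREWORKS_API_KEY environment variable;
    # the model string below is a placeholder for your deployment's identifier.
    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.fireworks.ai/inference/v1",
        api_key=os.environ["FIREWORKS_API_KEY"],
    )

    response = client.chat.completions.create(
        model="accounts/fireworks/models/deepseek-v3",
        messages=[{"role": "user", "content": "Summarize the benefits of MoE models in two sentences."}],
        max_tokens=256,
    )
    print(response.choices[0].message.content)
    ```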

    DeepSeek V3 FAQs

    What is DeepSeek V3 and who developed it?

    DeepSeek V3 is a Mixture-of-Experts (MoE) large language model developed by DeepSeek AI. It has 671B total parameters, with 37B activated per token during inference. The model uses Multi-head Latent Attention (MLA) and a Multi-Token Prediction (MTP) objective to improve inference speed and training efficiency.

    What applications and use cases does DeepSeek V3 excel at?

    DeepSeek V3 is well-suited for:

    • Complex reasoning and chain-of-thought tasks
    • Code generation and structured output
    • Math and logic benchmarks (e.g., MATH, GSM8K, HumanEval)
    • Multilingual understanding (e.g., C-Eval, CMMLU)
    • Vision tasks when used with Fireworks’ Document Inlining, which lets you pass images and PDFs by appending #transform=inline to their URLs (see the sketch after this list)
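
    For the Document Inlining case in the last bullet, a request might look like the following sketch: a publicly reachable PDF or image URL with #transform=inline appended is passed as an image_url content part in the OpenAI-compatible chat format. The field names and payload layout here are assumptions; see the Document Inlining docs for the authoritative schema.

    ```python
    # Hedged sketch of Document Inlining: appending "#transform=inline" to a document URL
    # lets Fireworks parse the file and feed its contents to DeepSeek V3.
    # Endpoint and payload follow the OpenAI-compatible chat format; field names are assumptions.
    import os
    import requests

    payload = {
        "model": "accounts/fireworks/models/deepseek-v3",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Summarize the key findings in this report."},
                    {
                        "type": "image_url",
                        "image_url": {"url": "https://example.com/report.pdf#transform=inline"},
                    },
                ],
            }
        ],
    }

    resp = requests.post(
        "https://api.fireworks.ai/inference/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}"},
        json=payload,
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])
    ```
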
    What is the maximum context length for DeepSeek V3?

    DeepSeek V3 supports a context length of 131,072 tokens.

    What is the usable context window for DeepSeek V3?

    The model maintains high accuracy across the 128K token context window, validated through Needle-in-a-Haystack (NIAH) benchmarks.

    Does DeepSeek V3 support quantized formats?

    Yes. DeepSeek V3 supports INT4, INT8, and FP8 formats. Fireworks also provides Quantization-Aware Training (QAT) to maintain high accuracy in quantized deployments.

    What are known failure modes of DeepSeek V3?

    Known issues include:

    • Degraded accuracy in multi-turn function calling
    • Performance sensitivity when converting LoRA weights to FP8 during inference

    Evaluation limitations are discussed in our function-calling and fine-tuning blog posts.

    Does DeepSeek V3 support streaming responses and function-calling schemas?

    Yes. DeepSeek V3 supports:

    • Streaming responses
    • Function calling (tool use) in JSON format, available on Fireworks’ Serverless tier. Multi-turn function calling is still an area of improvement.
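
    A minimal sketch combining both features, assuming the `openai` Python client and a FIREWORKS_API_KEY environment variable; the weather tool definition is hypothetical.

    ```python
    # Sketch of streaming plus function calling (tool use) with DeepSeek V3.
    # Assumes the `openai` client and a FIREWORKS_API_KEY env var; the weather tool is hypothetical.
    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.fireworks.ai/inference/v1",
        api_key=os.environ["FIREWORKS_API_KEY"],
    )

    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ]

    stream = client.chat.completions.create(
        model="accounts/fireworks/models/deepseek-v3",
        messages=[{"role": "user", "content": "What's the weather in Lisbon right now?"}],
        tools=tools,
        stream=True,
    )

    # Print text deltas as they arrive; tool calls, if any, show up on delta.tool_calls.
    for chunk in stream:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="", flush=True)
        if delta.tool_calls:
            print(delta.tool_calls)
    ```
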
    How many parameters does DeepSeek V3 have?
    • Total: 671B parameters
    • Activated per token: 37B parameters

    Is fine-tuning supported for DeepSeek V3?

    Yes. Fireworks supports Quantization-Aware Fine-Tuning (QAT) using LoRA and QLoRA for DeepSeek V3. Fine-tuned models can be deployed directly via Fireworks infrastructure.
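
    After deployment, a fine-tuned model is queried the same way as the base model, just under its own identifier. The account and model names in the sketch below are placeholders.

    ```python
    # Querying a deployed fine-tuned DeepSeek V3 model; the identifier below is a placeholder
    # for the "accounts/<your-account>/models/<your-model>" string shown in your dashboard.
    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.fireworks.ai/inference/v1",
        api_key=os.environ["FIREWORKS_API_KEY"],
    )

    response = client.chat.completions.create(
        model="accounts/your-account/models/deepseek-v3-custom",  # placeholder identifier
        messages=[{"role": "user", "content": "Draft a friendly reminder email about tomorrow's demo."}],
    )
    print(response.choices[0].message.content)
    ```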

    What license governs commercial use of DeepSeek V3?
    • Code license: MIT
    • Model license: DeepSeek Model Agreement
    • Commercial use is permitted under these terms.

    Metadata

    State: Ready
    Created on: 12/30/2024
    Kind: Base model
    Provider: DeepSeek
    Hugging Face: DeepSeek-V3

    Specification

    Calibrated: Yes
    Mixture-of-Experts: Yes
    Parameters: 671B

    Supported Functionality

    Fine-tuning: Supported
    Serverless: Not supported
    Serverless LoRA: Not supported
    Context Length: 131.1k tokens
    Function Calling: Supported
    Embeddings: Not supported
    Rerankers: Not supported
    Image input: Not supported