
Qwen3 32B
fireworks/qwen3-32b

    The latest state-of-the-art model in the Qwen3 series, with 32 billion parameters.

    Qwen3 32B API Features

    Fine-tuning

    Docs

    Qwen3 32B can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.

    On-demand Deployment

    Docs

    On-demand deployments give you dedicated GPUs for Qwen3 32B using Fireworks' reliable, high-performance system with no rate limits.

    Qwen3 32B FAQs

    What is Qwen3 32B and who developed it?

    Qwen3 32B is a 32.8 billion parameter base language model developed by Qwen (Alibaba Group). It is part of the third-generation Qwen series, which introduces a dual-mode architecture (thinking vs. non-thinking) for improved performance in reasoning, coding, and dialogue tasks.
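
    As a concrete illustration of the two modes, here is a minimal sketch using Hugging Face transformers (v4.51.0 or later, per the compatibility note below). The enable_thinking switch is part of the Qwen3 chat template:

        from transformers import AutoTokenizer

        tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")
        messages = [{"role": "user", "content": "How many primes are below 100?"}]

        # enable_thinking=True -> the model emits <think>...</think> reasoning
        # before its answer; False -> it answers directly (non-thinking mode).
        prompt = tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True,
            enable_thinking=True,
        )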

    What applications and use cases does Qwen3 32B excel at?

    The model is optimized for:

    • Conversational AI
    • Code assistance
    • Agentic systems
    • Enterprise RAG
    • Search and multimedia reasoning

    It supports both general-purpose dialogue and complex logical tasks.

    What is the maximum context length for Qwen3 32B?

    The model supports a native context length of 32,768 tokens, extendable to 131,072 tokens using YaRN (RoPE scaling).
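
    For self-hosted deployments, YaRN is typically enabled by adding a rope_scaling entry to the model config. A minimal sketch with Hugging Face transformers, using the values from the Qwen3 model card:

        from transformers import AutoConfig, AutoModelForCausalLM

        config = AutoConfig.from_pretrained("Qwen/Qwen3-32B")
        # Scaling factor 4.0 = 131072 / 32768 (target / native context).
        config.rope_scaling = {
            "rope_type": "yarn",
            "factor": 4.0,
            "original_max_position_embeddings": 32768,
        }
        model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-32B", config=config)

    Note that static YaRN applies the same scaling to all inputs, which is the source of the short-prompt degradation listed under failure modes below.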

    What is the usable context window for Qwen3 32B?

    Fireworks supports the full 131,072-token (131.1K) context window on on-demand deployments.

    What is the default temperature of Qwen3 32B on Fireworks AI?

    Thinking mode uses temperature=0.6, top_p=0.95, and top_k=20.

    Non-thinking mode uses temperature=0.7, top_p=0.8, and top_k=20.

    Greedy decoding is discouraged to avoid repetition and degraded performance.
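
    A minimal sketch of setting the thinking-mode parameters against Fireworks' OpenAI-compatible endpoint. The model identifier below is an assumption based on this page's slug, and top_k goes through extra_body because the OpenAI client does not expose it directly:

        import os
        from openai import OpenAI

        client = OpenAI(
            base_url="https://api.fireworks.ai/inference/v1",
            api_key=os.environ["FIREWORKS_API_KEY"],
        )

        resp = client.chat.completions.create(
            model="accounts/fireworks/models/qwen3-32b",  # assumed model id
            messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
            temperature=0.6,    # thinking-mode defaults
            top_p=0.95,
            max_tokens=32768,   # recommended output budget (see next question)
            extra_body={"top_k": 20},
        )
        print(resp.choices[0].message.content)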

    What is the maximum output length Fireworks allows for Qwen3 32B?

    The recommended output length is up to 32,768 tokens, with a maximum of 38,912 tokens for complex benchmarks (e.g., code and math reasoning).

    What are known failure modes of Qwen3 32B?

    • Performance degradation on short prompts when YaRN is enabled
    • Framework compatibility issues with transformers < v4.51.0
    • No support for image inputs, embeddings, or rerankers
    • Tool calling must be explicitly configured (e.g., via Qwen-Agent)

    Does Qwen3 32B support streaming responses and function-calling schemas?

    • Streaming: Not supported
    • Function calling: Supported (see the sketch below)
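
    A minimal function-calling sketch against the same OpenAI-compatible endpoint; the get_weather tool is hypothetical and the model id is assumed as above:

        import os
        from openai import OpenAI

        client = OpenAI(
            base_url="https://api.fireworks.ai/inference/v1",
            api_key=os.environ["FIREWORKS_API_KEY"],
        )

        tools = [{
            "type": "function",
            "function": {  # hypothetical tool, for illustration only
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }]

        resp = client.chat.completions.create(
            model="accounts/fireworks/models/qwen3-32b",  # assumed model id
            messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
            tools=tools,
        )
        print(resp.choices[0].message.tool_calls)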

    How many parameters does Qwen3 32B have?

    • Total parameters: 32.8B
    • Non-embedding parameters: 31.2B
    • Architecture: 64 layers, GQA with 64 query heads and 8 KV heads
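
    To make the GQA numbers concrete, here is a rough KV-cache estimate; the head dimension of 128 is an assumption (it is not stated on this page), as is fp16 storage (2 bytes per value):

        # Rough KV-cache footprint per token for Qwen3 32B's GQA layout.
        layers, kv_heads, head_dim, bytes_per_val = 64, 8, 128, 2  # head_dim assumed

        # K and V each store kv_heads * head_dim values per layer.
        kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_val
        print(f"{kv_bytes_per_token / 1024:.0f} KiB per token")          # 256 KiB
        print(f"{kv_bytes_per_token * 131072 / 2**30:.0f} GiB at 131K")  # 32 GiB

    Sharing 8 KV heads across 64 query heads keeps this cache 8x smaller than full multi-head attention would require.
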
    Is fine-tuning supported for Qwen3 32B?

    Yes. Fireworks supports LoRA-based fine-tuning on dedicated GPU deployments.
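
    Fine-tuning jobs consume a chat-format JSONL dataset. Below is a minimal sketch of building one; the field names follow the standard chat format, and the exact upload and job-creation steps are covered in the Fireworks fine-tuning docs:

        import json

        # Each line of the dataset is one training example in chat format.
        examples = [
            {
                "messages": [
                    {"role": "user", "content": "Summarize our refund policy."},
                    {"role": "assistant", "content": "Refunds are issued within 14 days of purchase..."},
                ]
            },
        ]

        with open("train.jsonl", "w") as f:
            for ex in examples:
                f.write(json.dumps(ex) + "\n")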

    What rate limits apply on the shared endpoint?

    • Serverless: Not supported
    • On-demand: Available with no rate limits

    What license governs commercial use of Qwen3 32B?

    Qwen3 32B is released under the Apache 2.0 license, which permits commercial use, modification, and redistribution.

    Metadata

    State: Ready
    Created on: 4/28/2025
    Kind: Base model
    Provider: Qwen
    Hugging Face: Qwen3-32B

    Specification

    Calibrated: Yes
    Mixture-of-Experts: No
    Parameters: 32.8B

    Supported Functionality

    Fine-tuning: Supported
    Serverless: Not supported
    Serverless LoRA: Not supported
    Context length: 131.1K tokens
    Function calling: Supported
    Embeddings: Not supported
    Rerankers: Not supported
    Image input: Not supported