

Qwen3 Embedding 8B

Ready
fireworks/qwen3-embedding-8b

    The Qwen3 Embedding 8B model is the latest embedding model in the Qwen family, designed specifically for text embedding tasks. Built upon the dense foundation models of the Qwen3 series, it inherits their exceptional multilingual capabilities, long-text understanding, and reasoning skills. The model delivers significant advances across multiple text embedding tasks, including text retrieval, code retrieval, text classification, and text clustering.

    Qwen3 Embedding 8B API Features

    Serverless

    Docs

    Immediately run the model on pre-configured GPUs with pay-per-token billing.
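    A minimal sketch of calling the serverless endpoint. The endpoint URL and model identifier below are assumptions based on Fireworks' OpenAI-compatible API and the model ID shown on this page; verify both against the Fireworks docs before use.

```python
import json

# Assumed endpoint for Fireworks' OpenAI-compatible embeddings API.
API_URL = "https://api.fireworks.ai/inference/v1/embeddings"

def build_embedding_request(texts, model="fireworks/qwen3-embedding-8b"):
    """Build the JSON payload for an OpenAI-style /embeddings request."""
    return {"model": model, "input": texts}

payload = build_embedding_request(["What is vector search?"])
body = json.dumps(payload)

# To actually send the request (requires a Fireworks API key; not executed here):
# import urllib.request
# req = urllib.request.Request(
#     API_URL,
#     data=body.encode(),
#     headers={"Authorization": "Bearer <FIREWORKS_API_KEY>",
#              "Content-Type": "application/json"},
# )
# resp = urllib.request.urlopen(req)
```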

    On-demand Deployment

    Docs

    On-demand deployments give you dedicated GPUs for Qwen3 Embedding 8B using Fireworks' reliable, high-performance system with no rate limits.


    Qwen3 Embedding 8B FAQs

    What is Qwen3 Embedding 8B and who developed it?

    Qwen3 Embedding 8B is an open-weight text embedding model developed by Qwen (a sub-brand of Alibaba Group) and released under the Apache 2.0 license. It is part of the Qwen3 Embedding series and is optimized for multilingual embedding, retrieval, and reranking tasks.

    What applications and use cases does Qwen3 Embedding 8B excel at?

    The model is designed for:

    • Text and code retrieval
    • Text classification and clustering
    • Bitext mining
    • Multilingual and cross-lingual similarity tasks
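    For the retrieval and similarity tasks above, embeddings are typically compared by cosine similarity. A minimal pure-Python sketch (the vectors here are toy values standing in for real model output):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dim vectors standing in for real high-dimensional embeddings.
query_vec = [0.1, 0.3, 0.5, 0.2]
doc_vec = [0.2, 0.1, 0.4, 0.3]
score = cosine_similarity(query_vec, doc_vec)
```

    In a retrieval pipeline, each document embedding is scored against the query embedding this way and the top-scoring documents are returned.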

    What is the maximum context length for Qwen3 Embedding 8B?

    The model supports a context length of 32,000 tokens.

    What is the usable context window for Qwen3 Embedding 8B?

    The full 32K-token context window is usable, as supported by the model architecture.
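    Documents longer than the context window must be split before embedding. A simple chunking sketch, approximating token counts by whitespace-separated words (a real pipeline should budget with the model's actual tokenizer):

```python
def chunk_text(text, max_tokens=32000, overlap=200):
    """Split text into overlapping chunks that fit the context window.

    Word count is used as a rough proxy for token count here; this is an
    approximation, not the model's tokenization.
    """
    words = text.split()
    if not words:
        return []
    step = max_tokens - overlap
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), step)]
```

    The overlap between adjacent chunks helps preserve context that would otherwise be cut at a chunk boundary.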

    Does Qwen3 Embedding 8B support quantized formats (4-bit/8-bit)?

    Yes, the model supports 4-bit and 8-bit formats.

    What embedding dimensions does Qwen3 Embedding 8B support?

    The model supports embedding dimensions from 32 to 4096, configurable by the user.
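    A configurable output dimension suggests shorter vectors can be used to trade quality for storage and speed. The sketch below truncates a vector and L2-renormalizes it; whether simple truncation matches the model's native reduced-dimension output is an assumption to verify against Qwen's documentation.

```python
import math

def truncate_embedding(vec, dim):
    """Truncate an embedding to `dim` dimensions and L2-renormalize.

    Renormalizing keeps the truncated vector on the unit sphere so cosine
    similarity scores remain comparable.
    """
    if not 32 <= dim <= 4096:
        raise ValueError("dim must be between 32 and 4096")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]
```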

    What are known failure modes of Qwen3 Embedding 8B?

    Qwen notes:

    • Retrieval performance drops by 1–5% without instruction prompts
    • Instruction format should be tailored and ideally written in English for best results
    • Earlier transformers versions (<4.51.0) may cause compatibility errors
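    Since retrieval quality drops without instruction prompts, queries should carry a task instruction. The template below follows the pattern shown in Qwen's published usage examples for the Qwen3 Embedding series; verify the exact format against the model card.

```python
def format_query(task_description, query):
    """Prepend a task instruction to a query, per Qwen's usage examples.

    Per Qwen's guidance, the instruction should be tailored to the task
    and ideally written in English. Documents are embedded without an
    instruction prefix.
    """
    return f"Instruct: {task_description}\nQuery: {query}"

prompt = format_query(
    "Given a web search query, retrieve relevant passages that answer the query",
    "What is the capital of France?",
)
```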

    Does Qwen3 Embedding 8B support streaming responses and function-calling schemas?

    No, streaming and function calling are not supported.

    How many parameters does Qwen3 Embedding 8B have?

    The model has 8 billion parameters.

    Is fine-tuning supported for Qwen3 Embedding 8B?

    No. Fine-tuning is not supported on Fireworks for this model.

    What rate limits apply on the shared endpoint?

    The model is available via serverless deployment with pay-per-token billing, as well as on-demand deployment with no rate limits using dedicated GPUs.

    What license governs commercial use of Qwen3 Embedding 8B?

    The model is licensed under the Apache 2.0 license, which permits commercial use.

    Metadata

    State
    Ready
    Created on
    8/20/2025
    Kind
    Kind 10
    Provider
    Qwen
    Hugging Face
    Qwen3-Embedding-8B

    Specification

    Calibrated
    No
    Mixture-of-Experts
    No
    Parameters
    8.2B

    Supported Functionality

    Fine-tuning
    Not supported
    Serverless
    Supported
    Serverless LoRA
    Not supported
    Context Length
    32k tokens
    Function Calling
    Not supported
    Embeddings
    Supported
    Rerankers
    Not supported
    Support image input
    Not supported