

Qwen3 Embedding 8B

Ready
accounts/fireworks/models/qwen3-embedding-8b

    The Qwen3 Embedding 8B model is the latest model in the Qwen family designed specifically for text embedding tasks. Building on the dense foundation models of the Qwen3 series, it inherits their strong multilingual capabilities, long-text understanding, and reasoning skills, and delivers significant advances across text embedding tasks including text retrieval, code retrieval, text classification, and text clustering.

    Qwen3 Embedding 8B API Features

    Serverless

    Docs

    Qwen3 Embedding 8B is available via Fireworks' serverless API, where you pay per token. There are several ways to call the Fireworks API, including Fireworks' Python client, the REST API, or OpenAI's Python client.
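    A minimal serverless call might look like the following sketch, which uses only the Python standard library. The endpoint URL and payload shape follow Fireworks' OpenAI-compatible embeddings API, and a `FIREWORKS_API_KEY` value is assumed to be available; treat the details as illustrative rather than authoritative.

```python
import json
import urllib.request

# Model id as listed on this page; endpoint path assumes Fireworks'
# OpenAI-compatible embeddings API.
API_URL = "https://api.fireworks.ai/inference/v1/embeddings"
MODEL_ID = "accounts/fireworks/models/qwen3-embedding-8b"

def build_request(texts):
    """Build the JSON body for an embeddings call."""
    return {"model": MODEL_ID, "input": texts}

def embed(texts, api_key):
    """POST the texts and return one embedding vector per input string."""
    body = json.dumps(build_request(texts)).encode("utf-8")
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return [item["embedding"] for item in data["data"]]
```

    The same endpoint can also be reached through the OpenAI Python client by pointing its `base_url` at Fireworks' API.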

    On-demand Deployment

    Docs

    On-demand deployments let you run Qwen3 Embedding 8B on dedicated GPUs using Fireworks' high-performance serving stack, with high reliability and no rate limits.

    Available Serverless

    Run queries immediately, pay only for usage

    $0.00
    Per 1M Tokens

    Qwen3 Embedding 8B FAQs

    What is Qwen3 Embedding 8B and who developed it?

    Qwen3 Embedding 8B is an open-source text embedding model developed by Qwen (a sub-brand of Alibaba Group). It is part of the Qwen3 Embedding series and is optimized for multilingual embedding, retrieval, and reranking tasks.

    What applications and use cases does Qwen3 Embedding 8B excel at?

    The model is designed for:

    • Text and code retrieval
    • Text classification and clustering
    • Bitext mining
    • Multilingual and cross-lingual similarity tasks
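    Retrieval and similarity tasks like those above come down to comparing embedding vectors. The sketch below ranks documents by cosine similarity against a query vector; the vectors are made up for illustration, while real ones would come from the model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank(query_vec, doc_vecs):
    """Return document indices sorted by descending similarity to the query."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
```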

    What is the maximum context length for Qwen3 Embedding 8B?

    The model supports a context length of 32,000 tokens.

    What is the usable context window for Qwen3 Embedding 8B?

    The full 32K token context length is usable as supported by the model architecture.

    Does Qwen3 Embedding 8B support quantized formats (4-bit/8-bit)?

    Yes, the model supports 4-bit and 8-bit formats.

    What embedding dimensions does Qwen3 Embedding 8B support?

    The model supports embedding dimensions from 32 to 4096, configurable by the user.
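    Smaller dimensions are typically obtained by truncating the full-size vector and re-normalizing. The truncate-then-normalize recipe below is an assumption based on common Matryoshka-style embedding usage, not Fireworks documentation, so verify it against the official guidance before relying on it.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components and L2-normalize the result.

    The 32-4096 range mirrors the supported dimensions stated above.
    """
    if not 32 <= dim <= 4096:
        raise ValueError("Qwen3 Embedding 8B supports dimensions 32-4096")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]
```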

    What are known failure modes of Qwen3 Embedding 8B?

    Qwen notes:

    • Retrieval performance drops by 1–5% without instruction prompts
    • Instruction format should be tailored and ideally written in English for best results
    • Earlier transformers versions (<4.51.0) may cause compatibility errors
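    To illustrate the instruction-prompt point, Qwen's published usage examples prepend an English task instruction to each query using an `Instruct: ... / Query: ...` template, while documents are embedded as-is. A minimal sketch of that formatting (the template is taken from Qwen's examples; the task description is a hypothetical placeholder):

```python
def format_query(task_description, query):
    """Prepend an English task instruction to a query, per Qwen's template.

    Only queries get the instruction prefix; documents are embedded as-is.
    """
    return f"Instruct: {task_description}\nQuery: {query}"
```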

    Does Qwen3 Embedding 8B support streaming responses and function-calling schemas?

    No, streaming and function calling are not supported.

    How many parameters does Qwen3 Embedding 8B have?

    The model has 8 billion parameters.

    Is fine-tuning supported for Qwen3 Embedding 8B?

    No. Fine-tuning is not supported on Fireworks for this model.

    What rate limits apply on the shared endpoint?

    The model is available via serverless deployment with pay-per-token billing, as well as on-demand deployment with no rate limits using dedicated GPUs.

    What license governs commercial use of Qwen3 Embedding 8B?

    The model is licensed under the Apache 2.0 license, which permits commercial use subject to the license's attribution and notice requirements.

    Metadata

    State
    Ready
    Created on
    8/20/2025
    Kind
    Embedding model
    Provider
    Qwen

    Specification

    Calibrated
    No
    Mixture-of-Experts
    No
    Parameters
    8.18B

    Supported Functionality

    Fine-tuning
    Not supported
    Serverless
    Supported
    Context Length
    40.9k tokens
    Function Calling
    Not supported
    Embeddings
    Supported
    Rerankers
    Not supported
    Support image input
    Not supported