Qwen3 Embedding 8B

Ready
Model path: accounts/fireworks/models/qwen3-embedding-8b

The Qwen3 Embedding 8B model is the latest model in the Qwen family designed specifically for text embedding tasks. Built on the dense foundation models of the Qwen3 series, it inherits their multilingual capabilities, long-text understanding, and reasoning skills. It delivers significant advances across text embedding tasks, including text retrieval, code retrieval, text classification, and text clustering.

Qwen3 Embedding 8B API Features

Serverless

Qwen3 Embedding 8B is available via Fireworks' serverless API, where you pay per token. There are several ways to call the Fireworks API, including Fireworks' Python client, the REST API, or OpenAI's Python client.
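
As a minimal sketch of the REST route, the snippet below builds and sends an embeddings request using only the Python standard library. The endpoint URL, the OpenAI-style payload fields (`model`, `input`), and the response shape are assumptions that this page does not state, so check them against the Fireworks docs before relying on them. The model path is the one listed on this page.

```python
import json
import urllib.request

# Assumed endpoint (OpenAI-compatible embeddings route); not stated on this page.
EMBEDDINGS_URL = "https://api.fireworks.ai/inference/v1/embeddings"
# Model path as listed on this page.
MODEL = "accounts/fireworks/models/qwen3-embedding-8b"


def build_embedding_request(texts, api_key, url=EMBEDDINGS_URL, model=MODEL):
    """Build (but do not send) a POST request for a batch of texts."""
    payload = {"model": model, "input": list(texts)}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def embed(texts, api_key):
    """Send the request and return one vector per input text.

    Assumes an OpenAI-style response: {"data": [{"embedding": [...]}, ...]}.
    """
    request = build_embedding_request(texts, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    return [item["embedding"] for item in body["data"]]
```

Under these assumptions, `embed(["What is the capital of China?"], api_key)` would return a list containing one vector. Keeping request construction separate from sending makes the payload easy to inspect without network access.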

On-demand Deployment

On-demand deployments let you run Qwen3 Embedding 8B on dedicated GPUs using Fireworks' high-performance serving stack, with high reliability and no rate limits.

Available Serverless

Run queries immediately, pay only for usage

$0.00
Per 1M Tokens

Qwen3 Embedding 8B FAQs

What is Qwen3 Embedding 8B and who developed it?

Qwen3 Embedding 8B is a text embedding model developed by Qwen (a sub-brand of Alibaba Group). It is part of the Qwen3 Embedding series, which also includes reranking models, and is optimized for multilingual embedding and retrieval tasks.

What applications and use cases does Qwen3 Embedding 8B excel at?

The model is designed for:

  • Text and code retrieval
  • Text classification and clustering
  • Bitext mining
  • Multilingual and cross-lingual similarity tasks
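
Retrieval and similarity tasks like these usually come down to comparing embedding vectors. The sketch below ranks documents against a query by cosine similarity. The toy 3-dimensional vectors and the helper names are illustrative assumptions, not output from the model or part of any Fireworks API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors standing in for real embeddings.
query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
ranking = rank_documents(query, docs)
print([i for i, _ in ranking])  # [1, 2, 0]
```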

What is the maximum context length for Qwen3 Embedding 8B?

The model supports a context length of 32,000 tokens.

What is the usable context window for Qwen3 Embedding 8B?

The full 32K-token context length is usable; the model architecture supports it.

Does Qwen3 Embedding 8B support quantized formats (4-bit/8-bit)?

Yes, the model supports 4-bit and 8-bit formats.

What embedding dimensions does Qwen3 Embedding 8B support?

The model supports embedding dimensions from 32 to 4096, configurable by the user.
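
One common way to get a smaller vector is to keep the first `dim` components of the full embedding and re-normalize. The sketch below does this on the client side. It assumes the model's embeddings tolerate this kind of truncation (Matryoshka-style training), which this page does not state. Whether the API accepts a server-side dimension parameter is also not stated here. The helper name and the toy input are hypothetical.

```python
import math

MIN_DIM, MAX_DIM = 32, 4096  # range stated on this page


def truncate_embedding(vector, dim):
    """Keep the first `dim` components and rescale to unit L2 norm."""
    if not MIN_DIM <= dim <= MAX_DIM:
        raise ValueError(f"dim must be between {MIN_DIM} and {MAX_DIM}")
    if dim > len(vector):
        raise ValueError("dim exceeds the length of the supplied vector")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [x / norm for x in head]


full = [0.5] * 64  # toy stand-in for a real embedding
small = truncate_embedding(full, 32)
print(len(small))  # 32
```

Smaller vectors reduce storage and search cost, usually at some loss in retrieval quality, so the chosen dimension is worth validating on your own data.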

What are known failure modes of Qwen3 Embedding 8B?

Qwen notes:

  • Retrieval performance drops by 1–5% without instruction prompts
  • Instruction format should be tailored and ideally written in English for best results
  • Earlier transformers versions (<4.51.0) may cause compatibility errors
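
To illustrate the instruction-prompt point, the sketch below prefixes each query with a one-sentence English task description and leaves documents unprefixed. The `Instruct: ... Query:` template and the example task text are assumptions based on Qwen's published usage examples, not on this page; verify them against the model card before use.

```python
def format_query(task_description, query):
    """Prefix a query with a one-sentence task instruction (assumed template)."""
    return f"Instruct: {task_description}\nQuery:{query}"


# Example task description, written in English as recommended above.
TASK = "Given a web search query, retrieve relevant passages that answer the query"

queries = [format_query(TASK, "What is the capital of China?")]
documents = ["The capital of China is Beijing."]  # documents stay unprefixed

print(queries[0])
```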

Does Qwen3 Embedding 8B support streaming responses and function-calling schemas?

No, streaming and function calling are not supported.

How many parameters does Qwen3 Embedding 8B have?

The model has 8 billion parameters.

Is fine-tuning supported for Qwen3 Embedding 8B?

No. Fine-tuning is not supported on Fireworks for this model.

What rate limits apply on the shared endpoint?

Serverless deployment is billed per token; specific rate limits for the shared endpoint are not listed here. On-demand deployment on dedicated GPUs has no rate limits.

What license governs commercial use of Qwen3 Embedding 8B?

The model is licensed under the Apache 2.0 license, which permits commercial use subject to the license's attribution and notice requirements.

Metadata

State
Ready
Created on
8/20/2025
Kind
Embedding model
Provider
Qwen

Specification

Calibrated
No
Mixture-of-Experts
No
Parameters
8.18B

Supported Functionality

Fine-tuning
Not supported
Serverless
Supported
Context Length
40.9k tokens
Function Calling
Not supported
Embeddings
Supported
Rerankers
Not supported
Support image input
Not supported