Model Availability
Fireworks hosts several purpose-built embeddings models, which are optimized specifically for tasks like semantic search and document similarity comparison. We host the SOTA Qwen3 Embedding family of models:
- fireworks/qwen3-embedding-8b (available on serverless)
- fireworks/qwen3-embedding-4b
- fireworks/qwen3-embedding-0p6b
Use any LLM as an embeddings model
You can retrieve embeddings from any LLM in our model library. Here are some examples of LLMs that work with the embeddings API:
- fireworks/glm-4p5
- fireworks/gpt-oss-20b
- fireworks/kimi-k2-instruct-0905
- fireworks/deepseek-r1-0528
Bring your own model
You can also retrieve embeddings from any model you bring yourself via custom model upload.
BERT-based models (legacy)
These BERT-based models are available on serverless only:
- nomic-ai/nomic-embed-text-v1.5
- nomic-ai/nomic-embed-text-v1
- WhereIsAI/UAE-Large-V1
- thenlper/gte-large
- thenlper/gte-base
- BAAI/bge-base-en-v1.5
- BAAI/bge-small-en-v1.5
- mixedbread-ai/mxbai-embed-large-v1
- sentence-transformers/all-MiniLM-L6-v2
- sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
Generating embeddings
Embeddings models take text as input and output a vector of floating-point numbers to use for tasks like similarity comparisons and search. Our embedding service is OpenAI compatible. Refer to OpenAI's embeddings guide and OpenAI's embeddings documentation for more information on using these models.
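Below is a minimal sketch of generating embeddings with the OpenAI Python client pointed at the OpenAI-compatible endpoint; the base URL and the FIREWORKS_API_KEY environment variable name are illustrative assumptions.

```python
# Minimal sketch: generate embeddings via the OpenAI-compatible /v1/embeddings endpoint.
# The base URL and FIREWORKS_API_KEY environment variable are assumptions for illustration.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

response = client.embeddings.create(
    model="fireworks/qwen3-embedding-8b",
    input="The quick brown fox jumped over the lazy dog",
)

embedding = response.data[0].embedding
print(len(embedding))  # dimensionality of the returned vector
```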
You can control the size of the returned vectors by adding a dimensions parameter to the request, for example, dimensions: 128.
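For example, a request asking for 128-dimensional vectors might look like the sketch below (reusing the client from the example above, and assuming the chosen model supports reduced output dimensions):

```python
# Ask the model for smaller vectors via the dimensions parameter.
response = client.embeddings.create(
    model="fireworks/qwen3-embedding-8b",
    input="The quick brown fox jumped over the lazy dog",
    dimensions=128,
)
print(len(response.data[0].embedding))  # 128
```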
The API usage for embedding models is identical for BERT-based and LLM-based embeddings. Simply use the /v1/embeddings endpoint with your chosen model.
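For instance, switching to one of the BERT-based serverless models is only a matter of changing the model name (again reusing the client from the sketch above):

```python
# Same endpoint and call shape; only the model name changes.
response = client.embeddings.create(
    model="nomic-ai/nomic-embed-text-v1.5",
    input="The quick brown fox jumped over the lazy dog",
)
print(len(response.data[0].embedding))
```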
Reranking documents
Reranking models are used to rerank a list of documents based on a query. We only support reranking with the Qwen3 Reranker family of models:
- fireworks/qwen3-reranker-8b (available on serverless)
- fireworks/qwen3-reranker-4b
- fireworks/qwen3-reranker-0p6b
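The sketch below sends a reranking request with the requests library. The /v1/rerank path and the query/documents/top_n request fields are assumptions modeled on common rerank APIs; consult the API reference for the exact schema.

```python
# Sketch of a reranking request. The /v1/rerank path and request fields are
# assumptions modeled on common rerank APIs; check the API reference for
# the exact schema.
import os
import requests

resp = requests.post(
    "https://api.fireworks.ai/inference/v1/rerank",
    headers={"Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}"},
    json={
        "model": "fireworks/qwen3-reranker-8b",
        "query": "What is the capital of France?",
        "documents": [
            "Paris is the capital and largest city of France.",
            "The Eiffel Tower is a landmark in Paris.",
            "Berlin is the capital of Germany.",
        ],
        "top_n": 2,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # ranked results with relevance scores
```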
Deploying embeddings and reranking models
While Qwen3 Embedding 8b and Qwen3 Reranker 8b are available on serverless, you also have the option to deploy them via on-demand deployments. We recommend passing --load-targets default=0.4 to ensure proper autoscaling responsiveness for these deployments.
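As a rough sketch, an on-demand deployment with that load target might be created as shown below; the exact firectl invocation and model identifier are assumptions, so see the on-demand deployments guide for the authoritative command.

```bash
# Sketch only: the exact firectl syntax and model identifier may differ.
firectl create deployment accounts/fireworks/models/qwen3-embedding-8b \
  --load-targets default=0.4
```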