Fireworks hosts many embedding models.

Purpose-built embedding models

These models are optimized specifically for tasks like semantic search and document similarity comparison.
Model name | Model size
nomic-ai/nomic-embed-text-v1.5 (recommended) | 137M
nomic-ai/nomic-embed-text-v1 | 137M
WhereIsAI/UAE-Large-V1 | 335M
thenlper/gte-large | 335M
thenlper/gte-base | 109M
BAAI/bge-base-en-v1.5 | 109M
BAAI/bge-small-en-v1.5 | 33M
mixedbread-ai/mxbai-embed-large-v1 | 335M
sentence-transformers/all-MiniLM-L6-v2 | 23M
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | 118M

LLM-based embedding models

Fireworks also supports retrieving embeddings from LLM-based models. These powerful inference models also produce embeddings useful for the same tasks as purpose-built embedding models. Generally, any LLM architecture that is compatible with Fireworks can be used for embeddings through the /v1/embeddings endpoint. This includes all the architectures supported for uploading custom models, such as Llama, Qwen, DeepSeek, Mistral, Mixtral, and many others. Here are some examples of LLM-based embedding models that work with the embeddings API (a usage sketch follows the list):
Model name
fireworks/llama4-scout-instruct-basic
fireworks/glm-4p5
fireworks/gpt-oss-20b
fireworks/kimi-k2-instruct
fireworks/qwen3-30b-a3b
fireworks/deepseek-r1
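
Here is a minimal sketch of requesting embeddings from one of the LLM-based models above. The model path is taken from the list and is illustrative; the call shape is the same OpenAI-compatible embeddings request shown in the next section.
Python (OpenAI 1.x)
import openai

# Point the OpenAI client at the Fireworks inference endpoint.
client = openai.OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="<FIREWORKS_API_KEY>",
)

# Any Fireworks-compatible LLM can serve embeddings through /v1/embeddings.
# The model below is one of the examples listed above.
response = client.embeddings.create(
    model="fireworks/qwen3-30b-a3b",
    input="Spiderman was a particularly entertaining movie with...",
)
print(len(response.data[0].embedding))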

Embedding documents

The embedding model takes text as input and outputs a vector (a list of floating-point numbers) to use for tasks like similarity comparison and search. Our embedding service is OpenAI-compatible. Refer to OpenAI’s embeddings guide and OpenAI’s embeddings documentation for more information on using these models.
Python (OpenAI 1.x)
import openai

# Point the OpenAI client at the Fireworks inference endpoint.
client = openai.OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="<FIREWORKS_API_KEY>",
)

# Embed a document; nomic models expect the search_document: prefix.
response = client.embeddings.create(
    model="nomic-ai/nomic-embed-text-v1.5",
    input="search_document: Spiderman was a particularly entertaining movie with...",
)

print(response)
This code embeds the text search_document: Spiderman was a particularly entertaining movie with... and returns the following:
Response
CreateEmbeddingResponse(data=[Embedding(embedding=[0.006380197126418352, 0.011841800063848495,...], index=0, object='embedding')], model='nomic-ai/nomic-embed-text-v1.5', object='list', usage=Usage(prompt_tokens=12, total_tokens=12))
The API usage for LLM-based embedding models is identical to purpose-built embedding models: simply use the /v1/embeddings endpoint with your chosen model.
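
In either case, you can read the vector directly off the response object. A short sketch, using the attribute names from the OpenAI 1.x response types:
Python (OpenAI 1.x)
# Continuing the example above: extract the vector and usage information.
vector = response.data[0].embedding  # list of floats
print(len(vector))                   # the embedding's dimensionality
print(response.usage.total_tokens)   # tokens consumed by the request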

Embedding queries and documents

In the previous example, you might have noticed the search_document: prefix. Nomic models have been fine-tuned to take prefixes: for user queries, add the search_query: prefix, and for documents, add the search_document: prefix. Here’s a quick example:
  • Let’s say we previously used the embedding model to embed many movie reviews that we stored in a vector database. All the documents should have been prefixed with search_document:
  • We now want to build a movie recommender that takes in a user query and outputs recommendations based on this data. The code below demonstrates how to embed the user query (a retrieval sketch follows the code).
Python (OpenAI 1.x)
import openai

# Point the OpenAI client at the Fireworks inference endpoint.
client = openai.OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="<FIREWORKS_API_KEY>",
)

# Embed the user query; nomic models expect the search_query: prefix here.
query = "I love superhero movies, any recommendations?"
query_emb = client.embeddings.create(
    model="nomic-ai/nomic-embed-text-v1.5",
    input=f"search_query: {query}",
)
print(query_emb)
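
Putting the two prefixes together, here is a minimal, self-contained sketch of the retrieval step. The in-memory document list and cosine-similarity scoring below stand in for the vector database and are illustrative only:
Python (OpenAI 1.x)
import numpy as np
import openai

client = openai.OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="<FIREWORKS_API_KEY>",
)

# A tiny stand-in for the vector database: movie reviews embedded with
# the search_document: prefix. The input parameter accepts a list of
# strings, so all documents are embedded in a single request.
documents = [
    "Spiderman was a particularly entertaining movie with...",
    "A quiet family drama set on a farm in the 1980s.",
    "A superhero ensemble that delivers nonstop action.",
]
doc_resp = client.embeddings.create(
    model="nomic-ai/nomic-embed-text-v1.5",
    input=[f"search_document: {d}" for d in documents],
)
doc_vecs = np.array([item.embedding for item in doc_resp.data])

# Embed the user query with the search_query: prefix.
query = "I love superhero movies, any recommendations?"
query_resp = client.embeddings.create(
    model="nomic-ai/nomic-embed-text-v1.5",
    input=f"search_query: {query}",
)
query_vec = np.array(query_resp.data[0].embedding)

# Rank documents by cosine similarity to the query (higher is more similar).
scores = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
for doc, score in sorted(zip(documents, scores), key=lambda p: -p[1]):
    print(f"{score:.3f}  {doc}")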
To view this example end-to-end and see how to use a MongoDB vector store and a Fireworks-hosted generation model for RAG, see our full guide. For more information on the prefixes nomic models support, please check out this guide from Nomic.

Variable dimensions

The model also supports variable embedding dimensions. To use this, pass the dimensions parameter to the embeddings.create() request:
Python (OpenAI 1.x)
import openai

client = openai.OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="<FIREWORKS_API_KEY>",
)

# Request a 128-dimensional embedding via the dimensions parameter.
response = client.embeddings.create(
    model="nomic-ai/nomic-embed-text-v1.5",
    input="search_document: I like Christmas movies, can you make any recommendations?",
    dimensions=128,
)
print(len(response.data[0].embedding))
You will see that the returned embedding has 128 dimensions.
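
As a quick check, you can request the same input at several dimension sizes and compare the returned lengths. A small sketch reusing the client from above; the specific sizes are illustrative:
Python (OpenAI 1.x)
# Request the same document at several dimension sizes and
# confirm the length of each returned embedding.
for dims in (128, 256, 768):
    response = client.embeddings.create(
        model="nomic-ai/nomic-embed-text-v1.5",
        input="search_document: I like Christmas movies, can you make any recommendations?",
        dimensions=dims,
    )
    print(dims, len(response.data[0].embedding))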