voyage-4-lite API & Playground

voyage-4-lite is a lightweight, general-purpose embedding model optimized for low latency and cost. Enabled by Matryoshka learning and quantization-aware training, voyage-4-lite supports embeddings in 2048, 1024, 512, and 256 dimensions, with multiple quantization options.

Voyage 4 Lite API Features

On-demand Deployment

On-demand deployments let you run Voyage 4 Lite on dedicated GPUs using Fireworks' high-performance serving stack, with high reliability and no rate limits.

FAQs

How do the Voyage 4 family models differ?

All Voyage 4 models share the same embedding space and support the same output dimensions: 2048, 1024, 512, and 256. They also support the same quantization options.
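As an illustration of how the smaller output dimensions relate to the larger ones, the sketch below shortens a vector on the client side. It assumes the usual Matryoshka convention that the leading components of a full-size embedding form a usable lower-dimensional embedding once rescaled to unit length; this page does not confirm that client-side truncation is the intended workflow, and the API may instead expose its own output-dimension setting. The function name `truncate_and_normalize` and the toy vector are hypothetical.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale the result to unit length."""
    if dim > len(embedding):
        raise ValueError("dim exceeds embedding length")
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be rescaled; return it unchanged.
        return list(head)
    return [x / norm for x in head]

# Toy 8-dimensional vector standing in for a real 2048-dimensional embedding.
full = [0.5, -0.3, 0.2, 0.1, 0.05, -0.02, 0.01, 0.0]
short = truncate_and_normalize(full, 4)
print(len(short))
```

Rescaling matters because cosine and dot-product scores are only comparable across vectors when they share the same norm.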

The main difference is model size. Larger models provide higher retrieval accuracy, while smaller models are optimized for lower latency and greater cost efficiency. Because the models share a compatible embedding space, teams can mix and match them across their retrieval pipeline to optimize for accuracy, latency, and cost.

How do Voyage AI's embedding models perform compared to other providers?

General-purpose retrieval. The bar chart below compares the average retrieval quality of the Voyage 4 series with that of Gemini Embedding 001, Cohere Embed v4, and OpenAI v3 Large. Overall, voyage-4-large is the top-performing model, surpassing voyage-4, voyage-4-lite, Gemini Embedding 001, Cohere Embed v4, and OpenAI v3 Large by an average of 1.87%, 4.80%, 3.87%, 8.20%, and 14.05%, respectively.

What is asymmetric retrieval, and when should I use it?

Asymmetric retrieval means using different embedding models for queries and documents while keeping the embeddings compatible in the same vector space.

For example, you can embed documents with voyage-4-large to maximize index accuracy, then embed user queries with voyage-4-lite to reduce query-time latency and cost. Since Voyage 4 models share an embedding space, these embeddings can be used together in the same retrieval workflow.

This approach is useful when you want to preserve strong search accuracy while improving real-time latency or reducing operating costs.
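The workflow described above can be sketched with plain cosine-similarity ranking. The vectors here are made-up three-dimensional stand-ins: imagine `doc_index` was built offline with voyage-4-large and `query_vec` was produced at request time with voyage-4-lite. The helper names `cosine` and `rank` are hypothetical, and no real embedding API is called.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query_vec, doc_index):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical document vectors (assumed to come from the larger model).
doc_index = {
    "doc-a": [0.9, 0.1, 0.0],
    "doc-b": [0.1, 0.9, 0.1],
    "doc-c": [0.0, 0.2, 0.9],
}
# Hypothetical query vector (assumed to come from the smaller model).
query_vec = [0.8, 0.2, 0.1]

results = rank(query_vec, doc_index)
print(results[0][0])  # doc-a
```

The ranking code does not need to know which model produced which vector; that is what a shared embedding space makes possible. Documents are typically embedded once while queries are embedded on every request, so the query side is where a lighter model saves the most.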

What quantized output formats are supported?

Voyage 4 models support five output data types: float, int8, uint8, binary, and ubinary.

The lower-precision formats (int8, uint8, binary, and ubinary) can significantly reduce storage and retrieval costs while preserving retrieval accuracy, thanks to quantization-aware training.
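To make the storage savings concrete, the sketch below computes bytes per vector for each data type and packs a toy vector into bits. It rests on assumptions this page does not state: that float means 32 bits per dimension, that the binary types use one bit per dimension packed eight to a byte, and that a simple sign threshold stands in for the model's actual quantization scheme. The helper names are hypothetical.

```python
def bytes_per_vector(dim, dtype):
    """Storage per embedding, assuming the bit widths listed below."""
    bits = {"float": 32, "int8": 8, "uint8": 8, "binary": 1, "ubinary": 1}[dtype]
    return dim * bits // 8

def pack_sign_bits(embedding):
    """Pack one bit per dimension (1 if the value is positive), most significant bit first."""
    if len(embedding) % 8 != 0:
        raise ValueError("length must be a multiple of 8")
    out = bytearray()
    for i in range(0, len(embedding), 8):
        byte = 0
        for x in embedding[i:i + 8]:
            byte = (byte << 1) | (1 if x > 0 else 0)
        out.append(byte)
    return bytes(out)

sizes = {t: bytes_per_vector(1024, t) for t in ("float", "int8", "binary")}
print(sizes)  # {'float': 4096, 'int8': 1024, 'binary': 128}

packed = pack_sign_bits([0.3, -0.1, 0.2, 0.0, -0.5, 0.7, 0.1, -0.2])
print(list(packed))  # [166]
```

Under these assumptions, a 1024-dimensional vector shrinks from 4096 bytes as float to 1024 bytes as int8 and 128 bytes as binary, a 4x and 32x reduction respectively.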

Metadata

State: Ready

Created on: 6/15/2026

Kind: Embedding model

Provider: Voyage AI by MongoDB

Hugging Face: voyageai/voyage-4-lite

Specification

Calibrated

Mixture-of-Experts

Parameters: 751M

Supported Functionality

Fine-tuning: Not supported

Serverless: Not supported

Context Length: 40.9k tokens

Function Calling: Not supported

Embeddings: Supported

Rerankers: Not supported

Support image input: Not supported