
Build

Build with the fastest and most efficient platform to serve open models

The most reliable AI cloud for enterprises: secure, compliant, and built to scale

Fast Experimentation

Go from prototype to production in minutes

Get started with SDKs, iterate on models, and scale reliably with an inference engine tuned for throughput

Experiment in seconds with our SDK

Access the latest models via Python, JS, or REST. A single API call gets you from prompt to output, with no infra setup.
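As an illustration, here is a minimal sketch of that single call over REST using Python's requests library. The endpoint URL, model name, API_KEY environment variable, and response shape are assumptions for the example, not the platform's documented API.

```python
# Minimal sketch: one chat-completion call over REST.
# Endpoint URL, model ID, and response shape are hypothetical placeholders.
import os
import requests

API_KEY = os.environ["API_KEY"]  # assumed to hold your platform API key

resp = requests.post(
    "https://api.example.com/v1/chat/completions",  # hypothetical endpoint
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "open-model-name",  # hypothetical model identifier
        "messages": [
            {"role": "user", "content": "Summarize KV caching in one line."}
        ],
    },
    timeout=30,
)
resp.raise_for_status()
# Assumes an OpenAI-compatible response body.
print(resp.json()["choices"][0]["message"]["content"])
```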

Iterate models and build agents faster

Run hundreds of LoRA variants in parallel for rapid tuning. Build agents with built-in memory, tool use, and multi-modal pipelines.
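To make the parallel-variant idea concrete, here is a rough sketch that fans one prompt out to several LoRA fine-tunes of the same base model. The endpoint, adapter IDs, and response shape are hypothetical placeholders, not the platform's documented naming scheme.

```python
# Sketch: querying several LoRA fine-tune variants of one base model in parallel.
# Endpoint, adapter IDs, and response shape are assumptions for illustration.
import os
from concurrent.futures import ThreadPoolExecutor

import requests

API_KEY = os.environ["API_KEY"]
URL = "https://api.example.com/v1/chat/completions"  # hypothetical endpoint

def ask(model_id: str, prompt: str) -> str:
    resp = requests.post(
        URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": model_id, "messages": [{"role": "user", "content": prompt}]},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Hypothetical adapter IDs: same base model, different LoRA checkpoints.
variants = [f"accounts/me/models/base-lora-v{i}" for i in range(1, 6)]
with ThreadPoolExecutor(max_workers=len(variants)) as pool:
    outputs = list(pool.map(lambda m: ask(m, "Classify: 'refund request'"), variants))

for model_id, out in zip(variants, outputs):
    print(model_id, "->", out)
```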

Run inference at peak efficiency

A disaggregated stack with quantization, KV caching, and efficient GPU memory use delivers low latency and high throughput for long, multi-turn sessions and multimedia workloads.

Build Features

Multiple modalities, same optimized performance

From LLMs to multimedia, build up to 15× faster than closed providers at a predictable cost, optimized for performance and scale


Disaggregated inference engine

Optimized at every layer, from quantization and caching to GPU memory layout, so you get peak performance


Audio transcription

Get 4× cheaper, 10× faster Whisper transcription with translation and full audio insights out of the box
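For illustration, a minimal sketch of uploading an audio file for transcription. The endpoint path, multipart field names, and model ID follow a common OpenAI-compatible shape, but they are assumptions here, not the platform's documented API.

```python
# Sketch: sending an audio file to a Whisper-style transcription endpoint.
# Endpoint path, field names, and model ID are assumptions for illustration.
import os
import requests

API_KEY = os.environ["API_KEY"]

with open("meeting.wav", "rb") as audio:
    resp = requests.post(
        "https://api.example.com/v1/audio/transcriptions",  # hypothetical endpoint
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": audio},                 # multipart upload of the audio
        data={"model": "whisper-large-v3"},    # assumed model identifier
        timeout=120,
    )
resp.raise_for_status()
print(resp.json()["text"])  # assumed response field
```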


Images and documents

Extract and classify images or docs quickly, then feed structured outputs into LLMs
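A rough sketch of that two-step flow: extract structured fields from a document image with a vision model, then hand the result to an LLM. All endpoint URLs, model IDs, and the message format are assumptions for illustration, not the platform's documented API.

```python
# Sketch: classify a document image with a vision model, then feed the
# structured result to an LLM. Endpoints and model names are hypothetical.
import base64
import os
import requests

API_KEY = os.environ["API_KEY"]
URL = "https://api.example.com/v1/chat/completions"  # hypothetical endpoint
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

with open("invoice.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

# Step 1: a multimodal model extracts fields as JSON (assumed message format).
extract = requests.post(URL, headers=HEADERS, timeout=60, json={
    "model": "vision-model-name",  # assumed multimodal model ID
    "messages": [{"role": "user", "content": [
        {"type": "text", "text": "Extract vendor, date, and total as JSON."},
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
    ]}],
}).json()["choices"][0]["message"]["content"]

# Step 2: an LLM consumes the structured output.
answer = requests.post(URL, headers=HEADERS, timeout=60, json={
    "model": "open-model-name",  # assumed text model ID
    "messages": [{"role": "user", "content": f"Flag anomalies in: {extract}"}],
}).json()["choices"][0]["message"]["content"]
print(answer)
```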

Start building today

Instantly run popular and specialized models.