Fireworks RFT now available! Fine-tune open models that outperform frontier models. Try today

FIREWORKS AI CLOUD

Build. Tune. Scale.

Open-source AI models at blazing speed, optimized for your use case, scaled globally with the Fireworks Inference Cloud

Careers

Working Side-by-Side

Collaboration happens naturally here. Big ideas grow best when we solve problems together.

Real Connections, Every Day

From daily standups to after-hours hangs, genuine teamwork drives everything we do.

Team Offsites That Energize

We bring people together beyond the screen — to connect, recharge, and build stronger teams.

Inside Our Creative Space

Our offices are designed for focus, flow, and spontaneous moments of inspiration.

Open roles

A career at Fireworks AI offers the opportunity to work closely with some of the best minds within the scientific community and beyond. We’re looking for people from all backgrounds who want to make a real, positive impact on the world.

Blazing fast inference for hundreds of models

Instantly run popular and specialized models, including Llama 3, Mixtral, and Stable Diffusion, optimized for latency, throughput, and context length. FireAttention, our custom CUDA kernel, serves models four times faster than vLLM without compromising quality.

What’s new in our blog

Upwork Achieves Up to 2X Faster Proposal Assistance

2X

Faster Proposal Assistance

0.75s

Sub-second first-token latency
