Serverless 2.0 is live: control reliability & speed without reserved capacity. Get Started.

Fireworks AI × Microsoft Foundry

Frontier open and custom models, inside Azure

Fireworks AI is now a first-party inference provider in Microsoft Foundry.
Billed through your existing Azure account.

TRUSTED BY LEADING AI TEAMS

stackblitz

Fireworks AI on Microsoft Foundry gives us the inference throughput and latency we need to power Bolt at production scale and all within the Azure ecosystem.

Dominick Elm
Dominick Elm | Founding Engineer at StackBlitz (Bolt)
Motif

“Using Fireworks AI on Foundry, we can run repeatable, high-volume evaluations through a single Azure endpoint, which helps our team move faster from deployment to informed model decisions with more confidence.”

Hanbin Jung | Partnership Lead at Motif
Ui Path

By running Fireworks AI on Azure Foundry, UiPath powers both Autopilot and Delegate with open models that are significantly faster and more cost-efficient for Computer Use, all while matching the quality of Claude's Sonnet 4.6. It's a step-change in how we deliver AI at scale to our customers.

Neagovici-Negoescu
Mircea Neagovici-Negoescu | SVP, Head of AI at UiPath
stackblitz

Fireworks AI on Microsoft Foundry gives us the inference throughput and latency we need to power Bolt at production scale and all within the Azure ecosystem.

Dominick Elm
Dominick Elm | Founding Engineer at StackBlitz (Bolt)
Motif

“Using Fireworks AI on Foundry, we can run repeatable, high-volume evaluations through a single Azure endpoint, which helps our team move faster from deployment to informed model decisions with more confidence.”

Hanbin Jung | Partnership Lead at Motif
Ui Path

By running Fireworks AI on Azure Foundry, UiPath powers both Autopilot and Delegate with open models that are significantly faster and more cost-efficient for Computer Use, all while matching the quality of Claude's Sonnet 4.6. It's a step-change in how we deliver AI at scale to our customers.

Neagovici-Negoescu
Mircea Neagovici-Negoescu | SVP, Head of AI at UiPath
stackblitz

Fireworks AI on Microsoft Foundry gives us the inference throughput and latency we need to power Bolt at production scale and all within the Azure ecosystem.

Dominick Elm
Dominick Elm | Founding Engineer at StackBlitz (Bolt)

Launch Announcement: Microsoft Build - 06.02.26

Microsoft CVP Yina Arenas recently sat down with Fireworks CEO Lin Qiao to discuss Fireworks on Foundry.

Microsoft Azure
Why Fireworks on Foundry

Frontier open models, inside Foundry

Azure-native procurement, compliance, security, and dedicated capacity, now combined with Fireworks inference. All inside the environment you already operate.

Native

First-party in Microsoft Foundry. Identity through Entra ID, billing through Azure, governance through your tenant.

Open

20+ frontier open models: DeepSeek, Kimi, GLM, gpt-oss, MiniMax. All accessible through one API.

Enterprise-Ready

PTU-backed dedicated capacity, OpenAI-compatible API, and the Fireworks inference engine. Production-grade out of the box.

01

Azure billing & security

Usage runs through your existing Azure account and counts toward MACC. PTU is also ACD-eligible. Security policies, IAM, and audit inherited from your tenant.

02

Industry-leading inference speed

The Fireworks inference engine and FireOptimizer compilation are tuned per model architecture for high throughput and low latency in production.

03

Deploy at any scale

Up to 1M TPM on serverless PayGo, with dedicated PTU capacity for production workloads. All inside your Azure subscription.

04

Custom models inside Azure

Import and deploy your own model weights on Foundry using the Fireworks inference runtime.

Two deployment options

PayGo serverless or PTU dedicated capacity.

Multi-tenant serverless for prototyping and bursty workloads, or single-tenant dedicated capacity for production. Both billed through Azure.

PayGo

Serverless · multi-tenant

Per-token pricing with no commitment. The fastest way to get started. Perfect for startups, prototyping, and bursty workloads.

PRICING: Per token · MACC-eligible

REGIONS: 6 US Data Zone regions

THROUGHPUT: Up to 1M TPM

SETUP:
Self-serve · Azure portal
MOST POPULAR

PTU

Single-tenant · dedicated capacity

Dedicated GPU capacity reserved for your workload. Consistent throughput, stable endpoints, broader region availability. Best for production applications with predictable traffic.

PRICING: Per PTU-hr · ACD + MACC

REGIONS: Global (subject to capacity)

CAPACITY: Dedicated · min 2 replicas

SETUP:
Self-serve · Azure portal
The model catalog

20+ frontier open models now available in Foundry

DeepSeek, Kimi, GLM, MiniMax, gpt-oss, etc. are each served via the OpenAI-compatible Chat Completions and Responses APIs. New models added on a rolling basis.

Microsoft Foundry
Inference engine

High-performance inference, built for open models

Fireworks AI provides a production-grade inference engine built specifically for modern open-weight models, now available natively within Microsoft Foundry.

  • High-throughput inference with low-latency responses
  • Cost-efficient token generation at scale
  • OpenAI-compatible API, no SDK migration required
  • FireOptimizer compilation tuned per model architecture
  • Production-grade performance without managing infrastructure
Custom Models

Import and deploy your own model weights on Foundry

Run your proprietary or fine-tuned open-weight models within the Foundry ecosystem using the Fireworks inference runtime.

Trust

Built on Azure billing and security

Azure billing & identity

Usage is billed through your Azure account and is MACC-eligible. PTU is also ACD-eligible. Identity via Entra ID, governance from your tenant.

First-party Microsoft subprocessor

Fireworks runs as a first-party provider inside Microsoft Foundry under Microsoft's terms, not a third-party vendor your procurement needs to onboard.

FAQ

Common questions

For full setup steps and operational detail, see the Microsoft Learn enable guide.

How do I enable Fireworks in my Azure subscription?

First, make sure you're on the New Foundry experience (ai.azure.com/nextgen). Then in the Azure Portal go to Subscriptions → Preview features → search Fireworks.Enable.Deploy → Register. Propagates in up to 30 minutes. Full setup guide on Microsoft Learn →

How do I request a higher TPM quota?

Default PayGo capacity is up to 1M TPM per request. Higher quotas are available. Request at aka.ms/fireworks-quota. Quota is granted across all Fireworks models on your subscription, not per-model.

Is Fireworks on Foundry covered by EU Data Boundary?

Not today. This is a current capacity limitation. PayGo is available in US Data Zone regions; PTU is global subject to capacity. EU coverage is on the roadmap. Reach out to [email protected] if EU residency is a hard requirement and we'll work through it with you.

Can I use Azure credits with Fireworks?

Azure credits can be applied to PTU deployments. PayGo usage is billed at standard token pricing and counts toward MACC.

Which APIs are supported?

OpenAI-compatible Chat Completions and Responses APIs. If your code already uses the OpenAI SDK, you only need to change the base URL and key.

What regions are available?

PayGo: 6 US Data Zone regions (East US, East US 2, Central US, North Central US, West US, West US 3). PTU is available globally subject to capacity. FedRAMP and PCI are not currently certified.

How quickly do new models reach Foundry?

New models go through a Microsoft onboarding pipeline, typically 5–10 days after we release them on the Fireworks platform. For day-of-release access, use fireworks.ai directly.

Get started

Start deploying frontier open models on Azure.

Access Fireworks AI directly from Microsoft Foundry. Usage is billed through your Azure account and is MACC-eligible.


PTU provisioning or BYOM questions? Contact [email protected]