TRUSTED BY LEADING AI TEAMS
Fireworks AI on Microsoft Foundry gives us the inference throughput and latency we need to power Bolt at production scale and all within the Azure ecosystem.


“Using Fireworks AI on Foundry, we can run repeatable, high-volume evaluations through a single Azure endpoint, which helps our team move faster from deployment to informed model decisions with more confidence.”

By running Fireworks AI on Azure Foundry, UiPath powers both Autopilot and Delegate with open models that are significantly faster and more cost-efficient for Computer Use, all while matching the quality of Claude's Sonnet 4.6. It's a step-change in how we deliver AI at scale to our customers.

Fireworks AI on Microsoft Foundry gives us the inference throughput and latency we need to power Bolt at production scale and all within the Azure ecosystem.


“Using Fireworks AI on Foundry, we can run repeatable, high-volume evaluations through a single Azure endpoint, which helps our team move faster from deployment to informed model decisions with more confidence.”

By running Fireworks AI on Azure Foundry, UiPath powers both Autopilot and Delegate with open models that are significantly faster and more cost-efficient for Computer Use, all while matching the quality of Claude's Sonnet 4.6. It's a step-change in how we deliver AI at scale to our customers.

Fireworks AI on Microsoft Foundry gives us the inference throughput and latency we need to power Bolt at production scale and all within the Azure ecosystem.



Fireworks AI provides a production-grade inference engine built specifically for modern open-weight models, now available natively within Microsoft Foundry.
First, make sure you're on the New Foundry experience (ai.azure.com/nextgen). Then in the Azure Portal go to Subscriptions → Preview features → search Fireworks.Enable.Deploy → Register. Propagates in up to 30 minutes. Full setup guide on Microsoft Learn →
Default PayGo capacity is up to 1M TPM per request. Higher quotas are available. Request at aka.ms/fireworks-quota. Quota is granted across all Fireworks models on your subscription, not per-model.
Not today. This is a current capacity limitation. PayGo is available in US Data Zone regions; PTU is global subject to capacity. EU coverage is on the roadmap. Reach out to [email protected] if EU residency is a hard requirement and we'll work through it with you.
Azure credits can be applied to PTU deployments. PayGo usage is billed at standard token pricing and counts toward MACC.
OpenAI-compatible Chat Completions and Responses APIs. If your code already uses the OpenAI SDK, you only need to change the base URL and key.
PayGo: 6 US Data Zone regions (East US, East US 2, Central US, North Central US, West US, West US 3). PTU is available globally subject to capacity. FedRAMP and PCI are not currently certified.
New models go through a Microsoft onboarding pipeline, typically 5–10 days after we release them on the Fireworks platform. For day-of-release access, use fireworks.ai directly.