Rate Limits & Quotas

Check your current limits

View your account’s current quotas and limits:

firectl list quotas

This shows your rate limits, GPU quotas, spend limits, and usage across serverless and on-demand deployments.

Spending tiers

Your account tier determines the maximum budget you can set:

Tier	Criteria	Max Monthly Budget
Tier 1	Valid payment method	$50
Tier 2	Spend or add $50 in credits	$500
Tier 3	Spend or add $500 in credits	$5,000
Tier 4	Spend or add $5,000 in credits	$50,000
Unlimited	Contact us	Unlimited

Add prepaid credits to unlock a higher tier. For example, adding $100 moves you from Tier 1 to Tier 2, because it meets the $50 threshold for Tier 2 but not the $500 threshold for Tier 3. Your new tier activates within minutes.
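The tier table can be expressed as a small lookup, which is handy when checking whether a planned budget fits your tier. This is only a sketch: the names `Tier`, `TIERS`, and `tier_for` are hypothetical, the thresholds are treated as inclusive (an assumption; the table only says "Spend or add"), and a valid payment method is assumed to be on file. The Unlimited tier is left out because it is granted on request rather than by a threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tier:
    name: str
    min_total_usd: int           # spend or credits needed to qualify
    max_monthly_budget_usd: int  # highest budget this tier allows


# Values copied from the spending tiers table; ordered lowest to highest.
TIERS: List[Tier] = [
    Tier("Tier 1", 0, 50),
    Tier("Tier 2", 50, 500),
    Tier("Tier 3", 500, 5_000),
    Tier("Tier 4", 5_000, 50_000),
]


def tier_for(total_usd: float) -> Tier:
    """Return the highest tier whose threshold is met (assumed inclusive)."""
    if total_usd < 0:
        raise ValueError("total_usd must be non-negative")
    eligible = [t for t in TIERS if total_usd >= t.min_total_usd]
    return eligible[-1]


def budget_allowed(total_usd: float, desired_budget_usd: float) -> bool:
    """Check a desired monthly budget against the tier's maximum."""
    return desired_budget_usd <= tier_for(total_usd).max_monthly_budget_usd
```

With these assumptions, `tier_for(100)` resolves to Tier 2, matching the example above, and `budget_allowed(100, 200)` indicates that a $200 budget fits within that tier.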

Manage your quotas

Budget control

Control your monthly spending with flexible budget limits. Set a limit that fits your needs and adjust it anytime.

View and adjust your spend limit

Check your current spend limit:

firectl list quotas

Set a custom monthly budget:

firectl update quota monthly-spend-usd --value <AMOUNT>

For example, to set a $200 monthly budget:

firectl update quota monthly-spend-usd --value 200

When you reach your budget

When you reach your spending limit, all API requests pause automatically across serverless inference, deployments, and fine-tuning. To resume, add credits to increase your tier and set a higher budget.

On-demand deployment quotas

On-demand deployments have GPU quotas instead of rate limits:

Resource	Default Quota
Nvidia A100	8 GPUs
Nvidia H100	8 GPUs
Nvidia H200	8 GPUs
GPU hours/month	2,000
LoRAs (on-demand and serverless)	100

Need more GPUs? Contact us to request a quota increase.
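To get a feel for how the GPU-hour quota interacts with the per-type GPU quota, the arithmetic can be sketched as follows. It assumes GPU hours are metered as number of GPUs multiplied by wall-clock hours; how the platform actually meters usage, and whether the monthly quota is shared across GPU types, is not stated here. The function name is hypothetical.

```python
GPU_HOURS_PER_MONTH = 2_000  # default monthly GPU-hour quota from the table
MAX_GPUS_PER_TYPE = 8        # default per-type GPU quota from the table


def hours_until_quota_exhausted(
    num_gpus: int, gpu_hours_quota: float = GPU_HOURS_PER_MONTH
) -> float:
    """Wall-clock hours that `num_gpus` can run before using up the quota.

    Assumes usage = GPUs x wall-clock hours (an assumption, not a
    documented metering rule).
    """
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return gpu_hours_quota / num_gpus


# Running the full default allocation of 8 GPUs continuously.
full_allocation_hours = hours_until_quota_exhausted(MAX_GPUS_PER_TYPE)
```

Under that assumption, 8 GPUs running continuously would use the default 2,000 GPU hours in 250 wall-clock hours, or a little over ten days.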

Serverless rate limits

Default limits

All accounts with a payment method get these limits:

Limit	Value
Requests per minute (RPM)	6,000
Audio minutes per minute, Whisper-v3-large	200
Audio minutes per minute, Whisper-v3-turbo	400
Concurrent connections, streaming speech	10
LoRAs (on-demand and serverless)	100

Add a payment method to unlock rate limits of up to 6,000 RPM. Without a payment method, you’re limited to 10 RPM. Your rate limits increase automatically once the payment method is added.

During periods of high load, the RPM limit may be lower.
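One way to stay under an RPM limit is to throttle on the client side. The sketch below is a rolling 60-second window limiter; it is not a description of how the service enforces limits, which is not specified here. `SlidingWindowLimiter` and its methods are hypothetical names, and the clock is injectable so the behaviour can be checked without sleeping.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `rpm` recorded requests in any rolling 60-second window."""

    def __init__(self, rpm: int, clock: Callable[[], float] = time.monotonic) -> None:
        if rpm <= 0:
            raise ValueError("rpm must be positive")
        self.rpm = rpm
        self.clock = clock
        self._sent: Deque[float] = deque()  # timestamps of recent requests

    def wait_time(self) -> float:
        """Seconds to wait before the next request may be sent (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have left the 60-second window.
        while self._sent and now - self._sent[0] >= 60.0:
            self._sent.popleft()
        if len(self._sent) < self.rpm:
            return 0.0
        # The oldest request in the window must age out first.
        return 60.0 - (now - self._sent[0])

    def record(self) -> None:
        """Note that a request was just sent."""
        self._sent.append(self.clock())
```

A caller would sleep for `wait_time()` seconds, send the request, then call `record()`. Because the effective limit can drop under load, a client-side throttle like this reduces rejected requests but cannot rule them out.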

How rate limiting works

Dynamic rate limits allow high RPM in a fair manner while preventing spiky traffic from affecting other users:

Gradual scaling: Your minimum limits increase as you sustain consistent high usage
Typical scale-up: Traffic can typically double within an hour without issues
Burst handling: Short traffic spikes are accommodated during autoscaling
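The "double within an hour" guideline can be turned into a rough ramp-up plan. The sketch below treats doubling per hour as a planning heuristic only; the source says "typically", so it is not a guarantee, and `ramp_schedule` is a hypothetical helper name.

```python
from typing import List


def ramp_schedule(current_rpm: int, target_rpm: int) -> List[int]:
    """Hourly RPM targets, doubling each hour until `target_rpm` is reached.

    The first entry is the current rate; each later entry is one hour on.
    """
    if current_rpm <= 0 or target_rpm <= 0:
        raise ValueError("rates must be positive")
    schedule = [current_rpm]
    while schedule[-1] < target_rpm:
        # Double the previous hour's rate, but never overshoot the target.
        schedule.append(min(schedule[-1] * 2, target_rpm))
    return schedule


# Ramping from a sustained 500 RPM towards the 6,000 RPM default limit.
plan = ramp_schedule(500, 6000)
hours_needed = len(plan) - 1
```

Under this heuristic, going from 500 RPM to 6,000 RPM would be spread over four hourly steps rather than attempted in one jump.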

Monitoring your limits:

Check response headers to see your current limits and remaining capacity:
x-ratelimit-limit-requests: Your current minimum limit
x-ratelimit-remaining-requests: Remaining capacity
x-ratelimit-over-limit: Set to yes when your request was processed but you’re near capacity
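A minimal sketch of reading these headers from a response is shown below. The header names come from the list above; the assumption that the two numeric headers carry plain integers is mine, and `RateLimitStatus` and `parse_rate_limit_headers` are hypothetical names. The function takes any mapping of header names to values, so it works with whichever HTTP client you use.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    limit: Optional[int]      # x-ratelimit-limit-requests, if present
    remaining: Optional[int]  # x-ratelimit-remaining-requests, if present
    over_limit: bool          # True when x-ratelimit-over-limit is "yes"


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {k.lower(): v for k, v in headers.items()}

    def as_int(name: str) -> Optional[int]:
        value = lowered.get(name)
        if value is None:
            return None
        try:
            return int(value)
        except ValueError:
            # Assumed integer format did not hold; report as unknown.
            return None

    return RateLimitStatus(
        limit=as_int("x-ratelimit-limit-requests"),
        remaining=as_int("x-ratelimit-remaining-requests"),
        over_limit=lowered.get("x-ratelimit-over-limit", "").strip().lower() == "yes",
    )
```

A client could log a warning or slow down whenever `over_limit` is true or `remaining` approaches zero, rather than waiting for requests to be rejected.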

For production workloads requiring consistent performance and higher limits, use on-demand deployments. They provide dedicated GPUs with SLA guarantees and no rate limits.

Account recovery

If your account is suspended due to a failed payment:

Go to Billing → Invoices
Pay any outstanding invoices
Your account reactivates automatically within an hour

Still suspended after resolving payment issues? Contact support via Discord or email.
