Check your current limits

View your account’s current quotas and limits:
firectl list quotas
This shows your rate limits, GPU quotas, spend limits, and usage across serverless and on-demand deployments.

Spending tiers

Your account tier determines the maximum budget you can set:
| Tier | Criteria | Max Monthly Budget |
|---|---|---|
| Tier 1 | Valid payment method | $50 |
| Tier 2 | Spend or add $50 in credits | $500 |
| Tier 3 | Spend or add $500 in credits | $5,000 |
| Tier 4 | Spend or add $5,000 in credits | $50,000 |
| Unlimited | Contact us | Unlimited |
Add prepaid credits to unlock a higher tier. For example, adding $100 moves you from Tier 1 to Tier 2. Your new tier activates within minutes.
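The tier rules can be sketched as a small lookup. This is an illustrative helper, not part of firectl or any official SDK: the names `Tier`, `TIERS`, and `tier_for` are hypothetical, the thresholds are copied from the table, and it assumes the thresholds are inclusive and that a valid payment method is already on file. The Unlimited tier is left out because it is granted by contacting the team rather than by spend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    threshold_usd: float           # amount spent or added in credits to qualify
    max_monthly_budget_usd: float  # highest monthly budget this tier allows


# Values copied from the spending tiers table. Assumption: thresholds are inclusive.
TIERS = [
    Tier("Tier 1", 0, 50),
    Tier("Tier 2", 50, 500),
    Tier("Tier 3", 500, 5_000),
    Tier("Tier 4", 5_000, 50_000),
]


def tier_for(total_spent_or_added_usd: float) -> Tier:
    """Return the highest tier whose threshold has been met."""
    if total_spent_or_added_usd < 0:
        raise ValueError("amount must be non-negative")
    eligible = [t for t in TIERS if total_spent_or_added_usd >= t.threshold_usd]
    return eligible[-1]  # TIERS is sorted by ascending threshold


if __name__ == "__main__":
    tier = tier_for(100)
    print(tier.name, tier.max_monthly_budget_usd)
```

With $100 in credits the lookup returns Tier 2, matching the example above, so the largest monthly budget you could set would be $500.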

Manage your quotas

Control your monthly spending with flexible budget limits. Set a limit that fits your needs and adjust it anytime.

View and adjust your spend limit

Check your current spend limit:
firectl list quotas
Set a custom monthly budget:
firectl update quota monthly-spend-usd --value <AMOUNT>
For example, to set a $200 monthly budget:
firectl update quota monthly-spend-usd --value 200

When you reach your budget

When you reach your spending limit, all API requests pause automatically across serverless inference, deployments, and fine-tuning. To resume, add credits to increase your tier and set a higher budget.
On-demand deployments have GPU quotas instead of rate limits:
| GPU Type | Default Quota |
|---|---|
| Nvidia A100 | 8 GPUs |
| Nvidia H100 | 8 GPUs |
| Nvidia H200 | 8 GPUs |
| GPU hours/month | 2,000 |
| LoRAs (on-demand and serverless) | 100 |
Need more GPUs? Contact us to request a quota increase.
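As a rough planning aid, the monthly GPU-hour quota can be turned into a runtime budget. The sketch below assumes GPU hours are counted as the number of GPUs multiplied by wall-clock hours, which this page does not state explicitly; `max_hours_per_gpu` is a hypothetical helper name.

```python
def max_hours_per_gpu(gpu_count: int, gpu_hours_quota: float = 2_000) -> float:
    """Wall-clock hours that gpu_count GPUs can run before using up the quota.

    Assumption: usage is metered as gpu_count * wall-clock hours.
    """
    if gpu_count <= 0:
        raise ValueError("gpu_count must be positive")
    return gpu_hours_quota / gpu_count


if __name__ == "__main__":
    hours = max_hours_per_gpu(8)
    print(f"{hours:.0f} hours, about {hours / 24:.1f} days of continuous use")
```

Under that assumption, running all 8 GPUs of one type around the clock would use the default 2,000 GPU hours in 250 hours, or roughly 10.4 days.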

Default limits

All accounts with a payment method get these limits:
| Limit | Value |
|---|---|
| Requests per minute (RPM) | 6,000 |
| Audio minutes per minute, Whisper-v3-large | 200 |
| Audio minutes per minute, Whisper-v3-turbo | 400 |
| Concurrent connections, streaming speech | 10 |
| LoRAs (on-demand and serverless) | 100 |
Add a payment method to unlock rate limits of up to 6,000 RPM. Without a payment method, you’re limited to 10 RPM. Your rate limits increase automatically once the payment method is added.
During periods of high load, the RPM limit may be lower.
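If you want to stay under an RPM cap on the client side, one option is a sliding-window limiter. The sketch below is a generic technique, not how the service enforces limits; `SlidingWindowLimiter` and its parameters are hypothetical names, and the cap you pass in (10 or 6,000, for example) is your own choice based on the values above.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_requests calls in any window_seconds interval."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if max_requests <= 0:
            raise ValueError("max_requests must be positive")
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock                 # injectable so tests can fake time
        self._sent: Deque[float] = deque()  # timestamps of recent requests

    def try_acquire(self) -> bool:
        """Return True and record the request if it fits in the window."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            self._sent.append(now)
            return True
        return False


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=10)  # cap without a payment method
    allowed = sum(limiter.try_acquire() for _ in range(15))
    print(f"{allowed} of 15 requests allowed in the first minute")
```

Call `try_acquire()` before each request and wait or queue the request when it returns `False`. Because the server-side limit can drop under high load, treat this as a first line of defense rather than a guarantee.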

How rate limiting works

Dynamic rate limits allow high RPM in a fair manner while preventing spiky traffic from affecting other users:
  • Gradual scaling: Your minimum limits increase as you sustain consistent high usage
  • Typical scale-up: Traffic can typically double within an hour without issues
  • Burst handling: Short traffic spikes are accommodated during autoscaling
Monitoring your limits:
  • Check response headers to see your current limits and remaining capacity:
      • x-ratelimit-limit-requests: Your current minimum limit
      • x-ratelimit-remaining-requests: Remaining capacity
      • x-ratelimit-over-limit: yes: Your request was processed but you’re near capacity
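The headers listed above can be read from any HTTP response. Here is a minimal sketch that works on a plain mapping of header names to values, so it is independent of the HTTP client you use. `RateLimitStatus` and `parse_rate_limit_headers` are hypothetical names, and the code assumes the two numeric headers carry plain decimal integers, which this page does not specify.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitStatus:
    limit: Optional[int]      # x-ratelimit-limit-requests
    remaining: Optional[int]  # x-ratelimit-remaining-requests
    over_limit: bool          # True when x-ratelimit-over-limit is "yes"


def _to_int(value: Optional[str]) -> Optional[int]:
    """Parse a header value as an int, returning None if missing or malformed."""
    try:
        return int(value)  # int(None) raises TypeError, handled below
    except (TypeError, ValueError):
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitStatus(
        limit=_to_int(lowered.get("x-ratelimit-limit-requests")),
        remaining=_to_int(lowered.get("x-ratelimit-remaining-requests")),
        over_limit=lowered.get("x-ratelimit-over-limit", "").strip().lower() == "yes",
    )


if __name__ == "__main__":
    # Example values are made up for illustration.
    status = parse_rate_limit_headers({
        "X-RateLimit-Limit-Requests": "600",
        "X-RateLimit-Remaining-Requests": "12",
        "X-RateLimit-Over-Limit": "yes",
    })
    print(status)
```

A client could slow down when `over_limit` is true or when `remaining` gets close to zero, instead of waiting for requests to be rejected.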
For production workloads requiring consistent performance and higher limits, use on-demand deployments. They provide dedicated GPUs with no rate limits and SLA guarantees.
If your account is suspended due to failed payment:
  1. Go to Billing → Invoices
  2. Pay any outstanding invoices
  3. Your account reactivates automatically within an hour
Still suspended after resolving payment issues? Contact support via Discord or email [email protected].