Check your current limits
View your account’s current quotas and limits:

Spending tiers
Your account tier determines the maximum budget you can set:

| Tier | Criteria | Max Monthly Budget |
|---|---|---|
| Tier 1 | Valid payment method | $50 |
| Tier 2 | Spend or add $50 in credits | $500 |
| Tier 3 | Spend or add $500 in credits | $5,000 |
| Tier 4 | Spend or add $5,000 in credits | $50,000 |
| Unlimited | Contact us | Unlimited |
Add prepaid credits to unlock a higher tier. For example, adding $100 moves you from Tier 1 to Tier 2. Your new tier activates within minutes.
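The tier table can be read as a simple threshold lookup. The Python sketch below is illustrative only: `TIERS` and `tier_for` are hypothetical names, not part of any official API. It assumes the "Spend or add" criteria apply to a single cumulative dollar amount and that a valid payment method is already on file. The Unlimited tier is left out because it is granted on request.

```python
# Thresholds copied from the tier table: (minimum spent or added in USD,
# tier name, maximum monthly budget in USD). Hypothetical helper data.
TIERS = [
    (5_000, "Tier 4", 50_000),
    (500, "Tier 3", 5_000),
    (50, "Tier 2", 500),
    (0, "Tier 1", 50),
]


def tier_for(total_usd: float) -> tuple[str, int]:
    """Return (tier name, max monthly budget) for a cumulative spend or credit total."""
    if total_usd < 0:
        raise ValueError("total_usd must be non-negative")
    # TIERS is ordered from highest threshold to lowest, so the first match wins.
    for minimum, name, max_budget in TIERS:
        if total_usd >= minimum:
            return name, max_budget
    raise AssertionError("unreachable: the last threshold is 0")


print(tier_for(100))  # ('Tier 2', 500)
```

This matches the example in the text: $100 in credits clears the $50 threshold for Tier 2 but not the $500 threshold for Tier 3.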
Manage your quotas
Budget control
Control your monthly spending with flexible budget limits. Set a limit that fits your needs and adjust it anytime.
Set a custom monthly budget:
For example, to set a $200 monthly budget:
View and adjust your spend limit
Check your current spend limit:
When you reach your budget
When you reach your spending limit, all API requests pause automatically across serverless inference, deployments, and fine-tuning. To resume, add credits to increase your tier and set a higher budget.
On-demand deployment quotas
On-demand deployments have GPU quotas instead of rate limits:
| Resource | Default Quota |
|---|---|
| Nvidia A100 | 8 GPUs |
| Nvidia H100 | 8 GPUs |
| Nvidia H200 | 8 GPUs |
| GPU hours/month | 2,000 |
| LoRAs (on-demand and serverless) | 100 |
Need more GPUs? Contact us to request a quota increase.
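As a rough planning aid, the monthly GPU-hour quota can be converted into a runtime estimate. This sketch assumes usage is counted as the number of GPUs multiplied by the hours they run, which the table does not state explicitly. `max_runtime_hours` is a hypothetical helper, not an official tool.

```python
GPU_HOURS_PER_MONTH = 2_000  # default monthly quota from the table


def max_runtime_hours(num_gpus: int, gpu_hours_quota: float = GPU_HOURS_PER_MONTH) -> float:
    """Hours a deployment of `num_gpus` can run before exhausting the quota.

    Assumes consumption = num_gpus * hours (an assumption, not documented behavior).
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return gpu_hours_quota / num_gpus


print(max_runtime_hours(8))  # 250.0
print(max_runtime_hours(1))  # 2000.0
```

Under that assumption, eight GPUs running continuously would use the default quota in 250 hours, a little over 10 days.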
Serverless rate limits
Default limits
All accounts with a payment method get these limits:

| Limit | Value |
|---|---|
| Requests per minute (RPM) | 6,000 |
| Audio min per minute, Whisper-v3-large | 200 |
| Audio min per minute, Whisper-v3-turbo | 400 |
| Concurrent connections, streaming speech | 10 |
| LoRAs (on-demand and serverless) | 100 |
Add a payment method to unlock rate limits of up to 6,000 RPM. Without a payment method, you’re limited to 10 RPM. Your rate limits increase automatically once a payment method is added.
During periods of high load, the RPM limit may be lower.
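To get a feel for what these limits mean per request, an RPM limit can be converted into a minimum average spacing between requests. This is a client-side illustration only and says nothing about how the service enforces limits. `min_interval_seconds` is a hypothetical helper name.

```python
def min_interval_seconds(rpm_limit: int) -> float:
    """Minimum average gap between requests that stays within `rpm_limit`."""
    if rpm_limit <= 0:
        raise ValueError("rpm_limit must be positive")
    return 60.0 / rpm_limit


print(min_interval_seconds(10))     # 6.0
print(min_interval_seconds(6_000))  # 0.01
```

At 10 RPM a client can send roughly one request every 6 seconds. At 6,000 RPM the average budget is about 100 requests per second.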
How rate limiting works
Dynamic rate limits support high RPM limits fairly while preventing spiky traffic from affecting other users:
- Gradual scaling: Your minimum limits increase as you sustain consistent high usage
- Typical scale-up: Traffic can typically double within an hour without issues
- Burst handling: Short traffic spikes are accommodated during autoscaling
- Check response headers to see your current limits and remaining capacity:
  - x-ratelimit-limit-requests: Your current minimum limit
  - x-ratelimit-remaining-requests: Remaining capacity
  - x-ratelimit-over-limit: yes: Your request was processed but you’re near capacity
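A client can inspect these headers after each response. The sketch below works on a plain mapping of header names to values, so it does not depend on any particular HTTP library. `RateLimitStatus` and `parse_rate_limit_headers` are hypothetical names, and the code assumes the two numeric headers carry integer values. The sample header values are made up.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    limit: Optional[int]      # from x-ratelimit-limit-requests
    remaining: Optional[int]  # from x-ratelimit-remaining-requests
    over_limit: bool          # True when x-ratelimit-over-limit is "yes"


def _to_int(value: Optional[str]) -> Optional[int]:
    """Parse an integer header value; return None if missing or malformed."""
    try:
        return int(value) if value is not None else None
    except ValueError:
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitStatus(
        limit=_to_int(lowered.get("x-ratelimit-limit-requests")),
        remaining=_to_int(lowered.get("x-ratelimit-remaining-requests")),
        over_limit=lowered.get("x-ratelimit-over-limit", "").strip().lower() == "yes",
    )


# Example with made-up header values:
status = parse_rate_limit_headers({
    "X-RateLimit-Limit-Requests": "6000",
    "X-RateLimit-Remaining-Requests": "5990",
    "X-RateLimit-Over-Limit": "yes",
})
print(status.remaining, status.over_limit)  # 5990 True
```

A caller might slow down when `over_limit` is true or when `remaining` approaches zero, since the request still succeeded but capacity is nearly used up.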
For production workloads requiring consistent performance and higher limits, use on-demand deployments. They provide dedicated GPUs with no rate limits and SLA guarantees.
Account recovery
If your account is suspended due to failed payment:
- Go to Billing → Invoices
- Pay any outstanding invoices
- Your account reactivates automatically within an hour
Still suspended after resolving payment issues? Contact support via Discord or email [email protected].