Overview

Fireworks exposes a metrics endpoint in the Prometheus exposition format, enabling integration with popular observability tools such as Prometheus, the OpenTelemetry (OTel) Collector, the Datadog Agent, and Vector.

Setting Up Metrics Collection

Endpoint

The metrics endpoint is:
https://api.fireworks.ai/v1/accounts/<account-id>/metrics
Replace <account-id> with your account ID. This URL, together with the authorization header described below, can be used directly by services such as Grafana Cloud to ingest Fireworks metrics.

Authentication

Use the Authorization header with your Fireworks API key:
{
  "Authorization": "Bearer YOUR_API_KEY"
}

Scrape Interval

We recommend a 1-minute scrape interval, as metrics are updated every 30 seconds.

Supported Integrations

Fireworks metrics can be collected with the OpenTelemetry Collector and exported to a range of observability platforms, including the following (a minimal Collector configuration is sketched after the list):
  • Prometheus
  • Datadog
  • Grafana
  • New Relic
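
If you run the OpenTelemetry Collector, the sketch below shows one way to wire this up: the Collector's Prometheus receiver scrapes the Fireworks endpoint, and a pipeline forwards the metrics to Datadog as an example destination (the Datadog exporter ships in the Collector's contrib distribution). The API keys and account ID are placeholders; swap in the exporter for your platform.

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'fireworks'
          scrape_interval: 60s
          scheme: https
          metrics_path: '/v1/accounts/<account-id>/metrics'
          authorization:
            type: "Bearer"
            credentials: "YOUR_API_KEY"
          static_configs:
            - targets: ['api.fireworks.ai']

exporters:
  # Example destination; replace with the exporter for your observability platform.
  datadog:
    api:
      key: "YOUR_DATADOG_API_KEY"

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [datadog]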

Prometheus Integration

To integrate with Prometheus directly, point your scrape config at the Fireworks metrics endpoint, as shown below. If you route metrics through the OpenTelemetry Collector instead, configure its Prometheus receiver with an equivalent scrape config and export to services such as Datadog (see the Collector sketch above).
global:
  scrape_interval: 60s
scrape_configs:
  - job_name: 'fireworks'
    scheme: https
    metrics_path: '/v1/accounts/<account-id>/metrics'
    authorization:
      type: "Bearer"
      credentials: "YOUR_API_KEY"
    static_configs:
      - targets: ['api.fireworks.ai']
For more details on Prometheus configuration, refer to the Prometheus documentation.

Rate Limits

To ensure service stability and fair usage:
  • Maximum of 6 requests per minute per account
  • Exceeding this limit results in HTTP 429 (Too Many Requests) responses
  • Use a 1-minute scrape interval to stay within limits

Available Metrics

Common Labels

All metrics include the following common labels:
  • base_model: The base model identifier (e.g., "accounts/fireworks/models/deepseek-v3")
  • deployment: Full deployment path (e.g., "accounts/account-name/deployments/deployment-id")
  • deployment_account: The account name
  • deployment_id: The deployment identifier

Rate Metrics (per second)

These metrics show activity rates calculated using 1-minute windows:

Request Rate

  • request_counter_total:sum_by_deployment: Request rate per deployment

Token Processing Rates

  • tokens_cached_prompt_total:sum_by_deployment: Rate of cached prompt tokens per deployment
  • tokens_prompt_total:sum_by_deployment: Rate of total prompt tokens processed per deployment
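
Dividing these two rates gives a prompt cache hit rate. Below is a minimal sketch as a Prometheus recording rule, assuming cached prompt tokens are counted as a subset of total prompt tokens; the rule name is hypothetical:

groups:
  - name: fireworks-cache
    rules:
      # Fraction of prompt tokens served from the cache, per deployment.
      - record: deployment:prompt_cache_hit_ratio
        expr: |
          tokens_cached_prompt_total:sum_by_deployment
            / tokens_prompt_total:sum_by_deployment

To scope the rule to one deployment, add a matcher on a common label, such as {deployment_id="<deployment-id>"}, to both series.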

Latency Histogram Metrics

These metrics provide latency distribution data with histogram buckets, calculated using 1-minute windows:

Generation Latency

  • latency_generation_per_token_ms_bucket:sum_by_deployment: Per-token generation time distribution
  • latency_generation_queue_ms_bucket:sum_by_deployment: Time spent waiting in generation queue

Request Latency

  • latency_overall_ms_bucket:sum_by_deployment: End-to-end request latency distribution
  • latency_to_first_token_ms_bucket:sum_by_deployment: Time to first token distribution

Prefill Latency

  • latency_prefill_ms_bucket:sum_by_deployment: Prefill processing time distribution
  • latency_prefill_queue_ms_bucket:sum_by_deployment: Time spent waiting in prefill queue
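
Assuming these bucket series follow the standard Prometheus histogram convention (a le label marking each bucket's upper bound), percentiles can be computed with histogram_quantile. Below is a sketch of a recording rule for p95 end-to-end latency; the rule name is hypothetical:

groups:
  - name: fireworks-latency
    rules:
      # 95th-percentile end-to-end request latency (ms), per deployment.
      - record: deployment:latency_overall_ms:p95
        expr: |
          histogram_quantile(
            0.95,
            sum by (le, deployment) (latency_overall_ms_bucket:sum_by_deployment)
          )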

Token Distribution Metrics

These histogram metrics show token count distributions per request, calculated using 1-minute windows:
  • tokens_generated_per_request_bucket:sum_by_deployment: Distribution of generated tokens per request
  • tokens_prompt_per_request_bucket:sum_by_deployment: Distribution of prompt tokens per request
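
The histogram_quantile pattern from the latency section applies here as well. For example, the median number of generated tokens per request, again assuming a standard le bucket label (the rule name is hypothetical):

groups:
  - name: fireworks-tokens
    rules:
      # Median generated tokens per request, per deployment.
      - record: deployment:tokens_generated_per_request:p50
        expr: |
          histogram_quantile(
            0.5,
            sum by (le, deployment) (tokens_generated_per_request_bucket:sum_by_deployment)
          )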

Resource Utilization Metrics

These gauge metrics show average resource usage:
  • generator_kv_blocks_fraction:avg_by_deployment: Average fraction of KV cache blocks in use
  • generator_kv_slots_fraction:avg_by_deployment: Average fraction of KV cache slots in use
  • generator_model_forward_time:avg_by_deployment: Average time spent in model forward pass
  • requests_coordinator_concurrent_count:avg_by_deployment: Average number of concurrent requests
  • prefiller_prompt_cache_ttl:avg_by_deployment: Average prompt cache time-to-live
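
These gauges are natural inputs for capacity alerts. As a minimal sketch, the Prometheus alerting rule below fires when KV cache block usage stays high; the 0.9 threshold and 10-minute window are illustrative, not recommendations:

groups:
  - name: fireworks-capacity
    rules:
      - alert: FireworksKVCacheNearlyFull
        # Fires when more than 90% of KV cache blocks are in use for 10 minutes.
        expr: generator_kv_blocks_fraction:avg_by_deployment > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "KV cache blocks above 90% utilization on {{ $labels.deployment }}"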