Create Batch Inference Job

POST /v1/accounts/{account_id}/batchInferenceJobs

Example request:
curl --request POST \
  --url https://api.fireworks.ai/v1/accounts/{account_id}/batchInferenceJobs \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "displayName": "<string>",
  "model": "<string>",
  "inputDatasetId": "<string>",
  "outputDatasetId": "<string>",
  "inferenceParameters": {
    "maxTokens": 123,
    "temperature": 123,
    "topP": 123,
    "n": 123,
    "extraBody": "<string>",
    "topK": 123
  },
  "precision": "PRECISION_UNSPECIFIED",
  "continuedFromJobName": "<string>"
}'

Example response (200):
{
  "name": "<string>",
  "displayName": "<string>",
  "createTime": "2023-11-07T05:31:56Z",
  "createdBy": "<string>",
  "state": "JOB_STATE_UNSPECIFIED",
  "status": {
    "code": "OK",
    "message": "<string>"
  },
  "model": "<string>",
  "inputDatasetId": "<string>",
  "outputDatasetId": "<string>",
  "inferenceParameters": {
    "maxTokens": 123,
    "temperature": 123,
    "topP": 123,
    "n": 123,
    "extraBody": "<string>",
    "topK": 123
  },
  "updateTime": "2023-11-07T05:31:56Z",
  "precision": "PRECISION_UNSPECIFIED",
  "jobProgress": {
    "percent": 123,
    "epoch": 123,
    "totalInputRequests": 123,
    "totalProcessedRequests": 123,
    "successfullyProcessedRequests": 123,
    "failedRequests": 123,
    "outputRows": 123,
    "inputTokens": 123,
    "outputTokens": 123,
    "cachedInputTokenCount": 123
  },
  "continuedFromJobName": "<string>"
}
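
The same request expressed in Python, as a minimal sketch using the requests library. The account ID, API key variable, model name, dataset IDs, and inference parameter values below are illustrative placeholders, not values taken from this page.

import os
import requests

ACCOUNT_ID = "my-account"                     # placeholder account ID
API_KEY = os.environ["FIREWORKS_API_KEY"]     # auth token for the Bearer header

url = f"https://api.fireworks.ai/v1/accounts/{ACCOUNT_ID}/batchInferenceJobs"

payload = {
    "displayName": "nightly-eval",                    # illustrative name
    "model": "accounts/fireworks/models/my-model",    # placeholder model name
    "inputDatasetId": "my-input-dataset",             # placeholder dataset IDs
    "outputDatasetId": "my-output-dataset",
    "inferenceParameters": {
        "maxTokens": 1024,    # illustrative values, not documented defaults
        "temperature": 0.7,
        "topP": 0.95,
        "n": 1,
        "topK": 40,
    },
    # "precision": "FP8",  # optional; if unspecified, a default is chosen based on the model
}

resp = requests.post(
    url,
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json=payload,
)
resp.raise_for_status()
job = resp.json()
print(job.get("name"), job.get("state"))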

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Path Parameters

account_id
string
required

The account ID.

Query Parameters

batchInferenceJobId
string

ID of the batch inference job.
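
If you want to supply the optional batchInferenceJobId query parameter, it can be passed alongside the JSON body. The sketch below reuses url, API_KEY, and payload from the Python example above; the ID value is a hypothetical placeholder, and how the server uses it (for example, as the ID assigned to the new job) should be confirmed against the API's behavior.

import requests

resp = requests.post(
    url,  # same endpoint as the create sketch above
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    params={"batchInferenceJobId": "my-batch-job-001"},  # hypothetical job ID
    json=payload,
)
resp.raise_for_status()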

Body

application/json

displayName
string

state
enum<string>
default:JOB_STATE_UNSPECIFIED

JobState represents the state an asynchronous job can be in.

Available options:
JOB_STATE_UNSPECIFIED,
JOB_STATE_CREATING,
JOB_STATE_RUNNING,
JOB_STATE_COMPLETED,
JOB_STATE_FAILED,
JOB_STATE_CANCELLED,
JOB_STATE_DELETING,
JOB_STATE_WRITING_RESULTS,
JOB_STATE_VALIDATING,
JOB_STATE_DELETING_CLEANING_UP,
JOB_STATE_PENDING,
JOB_STATE_EXPIRED,
JOB_STATE_RE_QUEUEING,
JOB_STATE_CREATING_INPUT_DATASET

status
object

model
string

The name of the model to use for inference. This is required unless continuedFromJobName is specified.

inputDatasetId
string

The name of the dataset used for inference. This is required unless continuedFromJobName is specified.

outputDatasetId
string

The name of the dataset used for storing the results. This will also contain the error file.

inferenceParameters
object

Parameters controlling the inference process.

precision
enum<string>
default:PRECISION_UNSPECIFIED

The precision with which the model should be served. If PRECISION_UNSPECIFIED, a default will be chosen based on the model.

Available options:
PRECISION_UNSPECIFIED,
FP16,
FP8,
FP8_MM,
FP8_AR,
FP8_MM_KV_ATTN,
FP8_KV,
FP8_MM_V2,
FP8_V2,
FP8_MM_KV_ATTN_V2,
NF4,
FP4,
BF16,
FP4_BLOCKSCALED_MM,
FP4_MX_MOE

jobProgress
object

Progress of the job, including request and token counts.

continuedFromJobName
string

The resource name of the batch inference job that this job continues from. Used for lineage tracking to understand job continuation chains.
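
Per the model and inputDatasetId descriptions above, both fields may be omitted when continuedFromJobName is set. Below is a minimal sketch of creating a continuation job, reusing url and API_KEY from the earlier Python example; the job resource name is a placeholder.

import requests

continuation_payload = {
    "displayName": "nightly-eval-continued",  # illustrative name
    # model and inputDatasetId are omitted; they are only required when
    # continuedFromJobName is not specified (see the field descriptions above).
    "continuedFromJobName": "accounts/my-account/batchInferenceJobs/previous-job-id",  # placeholder
}

resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json=continuation_payload,
)
resp.raise_for_status()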

Response

200 - application/json

A successful response.

name
string

displayName
string

createTime
string<date-time>

The creation time of the batch inference job.

createdBy
string

The email address of the user who initiated this batch inference job.

state
enum<string>
default:JOB_STATE_UNSPECIFIED

JobState represents the state an asynchronous job can be in.

Available options:
JOB_STATE_UNSPECIFIED,
JOB_STATE_CREATING,
JOB_STATE_RUNNING,
JOB_STATE_COMPLETED,
JOB_STATE_FAILED,
JOB_STATE_CANCELLED,
JOB_STATE_DELETING,
JOB_STATE_WRITING_RESULTS,
JOB_STATE_VALIDATING,
JOB_STATE_DELETING_CLEANING_UP,
JOB_STATE_PENDING,
JOB_STATE_EXPIRED,
JOB_STATE_RE_QUEUEING,
JOB_STATE_CREATING_INPUT_DATASET

status
object

model
string

The name of the model to use for inference. This is required unless continuedFromJobName is specified.

inputDatasetId
string

The name of the dataset used for inference. This is required unless continuedFromJobName is specified.

outputDatasetId
string

The name of the dataset used for storing the results. This will also contain the error file.

inferenceParameters
object

Parameters controlling the inference process.

updateTime
string<date-time>

The time the batch inference job was last updated.

precision
enum<string>
default:PRECISION_UNSPECIFIED

The precision with which the model should be served. If PRECISION_UNSPECIFIED, a default will be chosen based on the model.

Available options:
PRECISION_UNSPECIFIED,
FP16,
FP8,
FP8_MM,
FP8_AR,
FP8_MM_KV_ATTN,
FP8_KV,
FP8_MM_V2,
FP8_V2,
FP8_MM_KV_ATTN_V2,
NF4,
FP4,
BF16,
FP4_BLOCKSCALED_MM,
FP4_MX_MOE

jobProgress
object

Progress of the job, including request and token counts.
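
A small sketch of reading these progress fields from a returned job object (for example, the job dict from the Python create sketch above); the field names follow the response schema on this page.

progress = job.get("jobProgress", {})
print(
    f"{progress.get('percent', 0)}% complete: "
    f"{progress.get('successfullyProcessedRequests', 0)} succeeded, "
    f"{progress.get('failedRequests', 0)} failed, "
    f"{progress.get('totalProcessedRequests', 0)} of "
    f"{progress.get('totalInputRequests', 0)} requests processed"
)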

continuedFromJobName
string

The resource name of the batch inference job that this job continues from. Used for lineage tracking to understand job continuation chains.
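
Once created, a job moves through the states listed above until it reaches a terminal state. The sketch below polls the job by fetching it again, reusing API_KEY and the job dict from the earlier Python example. Two assumptions are made and should be verified: that the returned name is a full resource path that can be appended to https://api.fireworks.ai/v1/, and that a GET endpoint for a single job exists at that path (it is not documented on this page). The set of states treated as terminal is inferred from their names.

import time
import requests

# Assumption: job["name"] is a full resource path such as
# "accounts/<account_id>/batchInferenceJobs/<job_id>".
job_name = job["name"]

# Assumption: these states are terminal, inferred from their names.
TERMINAL_STATES = {
    "JOB_STATE_COMPLETED",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_EXPIRED",
}

while True:
    # Assumed GET endpoint mirroring the create path; not documented on this page.
    r = requests.get(
        f"https://api.fireworks.ai/v1/{job_name}",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    r.raise_for_status()
    job = r.json()
    state = job.get("state")
    percent = job.get("jobProgress", {}).get("percent")
    print(f"state={state} progress={percent}%")
    if state in TERMINAL_STATES:
        break
    time.sleep(30)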