Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
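As a minimal sketch of the header format described above, the helper below builds the header as a dictionary that most HTTP clients accept. The function name bearer_auth_header is hypothetical, and the header name Authorization is assumed from standard Bearer authentication rather than stated on this page.

```python
def bearer_auth_header(token: str) -> dict:
    """Return a header mapping of the form `Authorization: Bearer <token>`."""
    token = token.strip()
    if not token:
        raise ValueError("auth token must not be empty")
    # Assumption: the standard `Authorization` header carries the bearer token.
    return {"Authorization": f"Bearer {token}"}


if __name__ == "__main__":
    # Placeholder token for illustration only.
    print(bearer_auth_header("YOUR_TOKEN"))
```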
Path Parameters
The Account Id
Query Parameters
By default, when a deployment is created for a base model that is not currently deployed, the base model is automatically deployed to this deployment. If true, this auto-deploy behavior is disabled.
By default, a deployment will use the speculative decoding settings from the base model. If true, this will disable speculative decoding.
The ID of the deployment. If not specified, a random ID will be generated.
If true, this will not create the deployment, but will return the deployment that would be created.
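The four query parameters above could be assembled into a query string roughly as follows. This is a sketch only: the key names deployment_id, disable_auto_deploy, and validate_only are hypothetical (this page does not list the wire names, which may differ, for example in casing), and encoding booleans as the string "true" is likewise an assumption.

```python
from typing import Optional
from urllib.parse import urlencode


def build_query(
    deployment_id: Optional[str] = None,
    disable_auto_deploy: bool = False,
    disable_speculative_decoding: bool = False,
    validate_only: bool = False,
) -> str:
    """Build a query string, omitting parameters left at their defaults."""
    params = {}
    if deployment_id is not None:
        # If omitted, the service generates a random ID.
        params["deployment_id"] = deployment_id
    if disable_auto_deploy:
        params["disable_auto_deploy"] = "true"
    if disable_speculative_decoding:
        params["disable_speculative_decoding"] = "true"
    if validate_only:
        # Dry run: returns the deployment that would be created.
        params["validate_only"] = "true"
    return urlencode(params)


if __name__ == "__main__":
    print(build_query(deployment_id="my-deployment", validate_only=True))
```

Omitting a flag rather than sending "false" keeps the request aligned with the documented defaults.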
Body
The properties of the deployment being created.
Human-readable display name of the deployment, e.g. "My Deployment". Must be fewer than 64 characters long.
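The length limit can be checked on the client before sending a request. The sketch below uses a hypothetical helper, validate_display_name, and assumes that "characters" means Unicode code points as counted by Python's len; the service may count differently.

```python
MAX_DISPLAY_NAME_LENGTH = 63  # "fewer than 64 characters"


def validate_display_name(name: str) -> str:
    """Return the name unchanged, or raise if it is 64 characters or longer."""
    if len(name) > MAX_DISPLAY_NAME_LENGTH:
        raise ValueError(
            f"display name has {len(name)} characters; "
            f"must be fewer than {MAX_DISPLAY_NAME_LENGTH + 1}"
        )
    return name


if __name__ == "__main__":
    print(validate_display_name("My Deployment"))
```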
Description of the deployment.
The time at which this deployment will automatically be deleted.
The state of the deployment.
STATE_UNSPECIFIED, CREATING, READY, DELETING, FAILED, UPDATING, DELETED
Detailed status information regarding the most recent operation.
The minimum number of replicas. If not specified, the default is 0.
The maximum number of replicas. If not specified, the default is max(min_replica_count, 1). May be set to 0 to downscale the deployment to 0.
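The two replica defaults interact: the maximum falls back to max(min_replica_count, 1) only when it is not given, while an explicit 0 is respected. A small sketch of that rule, using a hypothetical helper name and treating None as "not specified":

```python
from typing import Optional, Tuple


def resolve_replica_bounds(
    min_replica_count: Optional[int] = None,
    max_replica_count: Optional[int] = None,
) -> Tuple[int, int]:
    """Apply the documented defaults and return (min, max) replica counts."""
    # Minimum defaults to 0 when not specified.
    min_count = 0 if min_replica_count is None else min_replica_count
    # Maximum defaults to max(min_replica_count, 1); an explicit 0 is kept as-is.
    max_count = max(min_count, 1) if max_replica_count is None else max_replica_count
    if min_count < 0 or max_count < 0:
        raise ValueError("replica counts must be non-negative")
    return min_count, max_count


if __name__ == "__main__":
    print(resolve_replica_bounds())                      # neither value specified
    print(resolve_replica_bounds(min_replica_count=3))   # only the minimum specified
    print(resolve_replica_bounds(max_replica_count=0))   # explicit downscale to 0
```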
The number of accelerators used per replica. If not specified, the default is the estimated minimum required by the base model.
The type of accelerator to use. If not specified, the default is NVIDIA_A100_80GB.
ACCELERATOR_TYPE_UNSPECIFIED, NVIDIA_A100_80GB, NVIDIA_H100_80GB, AMD_MI300X_192GB, NVIDIA_A10G_24GB, NVIDIA_A100_40GB, NVIDIA_L4_24GB, NVIDIA_H200_141GB, NVIDIA_B200_180GB
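A client can validate the accelerator type against the listed values and apply the documented default before building a request. In this sketch the helper name is hypothetical, and treating ACCELERATOR_TYPE_UNSPECIFIED the same as an omitted value is an assumption.

```python
from typing import Optional

ACCELERATOR_TYPES = frozenset({
    "ACCELERATOR_TYPE_UNSPECIFIED",
    "NVIDIA_A100_80GB",
    "NVIDIA_H100_80GB",
    "AMD_MI300X_192GB",
    "NVIDIA_A10G_24GB",
    "NVIDIA_A100_40GB",
    "NVIDIA_L4_24GB",
    "NVIDIA_H200_141GB",
    "NVIDIA_B200_180GB",
})
DEFAULT_ACCELERATOR_TYPE = "NVIDIA_A100_80GB"


def resolve_accelerator_type(value: Optional[str] = None) -> str:
    """Return a concrete accelerator type, falling back to the documented default."""
    # Assumption: the UNSPECIFIED sentinel behaves like an omitted value.
    if value is None or value == "ACCELERATOR_TYPE_UNSPECIFIED":
        return DEFAULT_ACCELERATOR_TYPE
    if value not in ACCELERATOR_TYPES:
        raise ValueError(f"unknown accelerator type: {value!r}")
    return value


if __name__ == "__main__":
    print(resolve_accelerator_type())
    print(resolve_accelerator_type("NVIDIA_H100_80GB"))
```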
The precision with which the model should be served.
PRECISION_UNSPECIFIED, FP16, FP8, FP8_MM, FP8_AR, FP8_MM_KV_ATTN, FP8_KV, FP8_MM_V2, FP8_V2, FP8_MM_KV_ATTN_V2, NF4, FP4, BF16, FP4_BLOCKSCALED_MM, FP4_MX_MOE
If true, PEFT addons are enabled for this deployment.
The number of candidate tokens to generate per step for speculative decoding. Default is the base model's draft_token_count. Set CreateDeploymentRequest.disable_speculative_decoding to true to disable this behavior.
The draft model name for speculative decoding, e.g. accounts/fireworks/models/my-draft-model. If empty, speculative decoding using a draft model is disabled. Default is the base model's default_draft_model. Set CreateDeploymentRequest.disable_speculative_decoding to true to disable this behavior.
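One possible reading of how these two fields resolve is sketched below: values given in the request win, base-model defaults fill the gaps, and the disable_speculative_decoding query parameter, when true, suppresses that inheritance. The class and function names are hypothetical, and the real service may resolve these settings differently.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BaseModelDefaults:
    """Hypothetical container for the base model's speculative decoding defaults."""
    draft_token_count: Optional[int]
    default_draft_model: str


def resolve_speculation(
    draft_model: str,
    draft_token_count: Optional[int],
    base: BaseModelDefaults,
    disable_speculative_decoding: bool = False,
) -> Tuple[str, Optional[int]]:
    """Return the effective (draft_model, draft_token_count) for a deployment."""
    if disable_speculative_decoding:
        # Assumption: the flag only stops inheritance of base-model defaults,
        # so an empty draft model stays empty (draft-model speculation off).
        return draft_model, draft_token_count
    effective_model = draft_model or base.default_draft_model
    effective_count = (
        draft_token_count if draft_token_count is not None else base.draft_token_count
    )
    return effective_model, effective_count


if __name__ == "__main__":
    base = BaseModelDefaults(
        draft_token_count=4,
        default_draft_model="accounts/fireworks/models/my-draft-model",
    )
    print(resolve_speculation("", None, base))
    print(resolve_speculation("", None, base, disable_speculative_decoding=True))
```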
The length of previous input sequence to be considered for N-gram speculation.
The name of the deployment template to use for this deployment. Only available to enterprise accounts.
The performance profile to use for this deployment.
The desired geographic region where the deployment must be placed. If unspecified, the default is the GLOBAL multi-region.
The geographic region where the deployment is presently located. This region may change over time, but within the placement constraint.
REGION_UNSPECIFIED, US_IOWA_1, US_VIRGINIA_1, US_VIRGINIA_2, US_ILLINOIS_1, AP_TOKYO_1, EU_LONDON_1, US_ARIZONA_1, US_TEXAS_1, US_ILLINOIS_2, EU_FRANKFURT_1, US_TEXAS_2, EU_PARIS_1, EU_HELSINKI_1, US_NEVADA_1, EU_ICELAND_1, EU_ICELAND_2, US_WASHINGTON_1, US_WASHINGTON_2, US_WASHINGTON_3, AP_TOKYO_2, US_CALIFORNIA_1, US_UTAH_1
Whether the deployment size validation is disabled.
If true, MTP is enabled for this deployment.
Response
A successful response.
Human-readable display name of the deployment, e.g. "My Deployment". Must be fewer than 64 characters long.
Description of the deployment.
The creation time of the deployment.
The time at which this deployment will automatically be deleted.
The time at which the resource will be hard deleted.
The time at which the resource will be soft deleted.
The state of the deployment.
STATE_UNSPECIFIED, CREATING, READY, DELETING, FAILED, UPDATING, DELETED
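Because a newly created deployment may report a state other than READY, a caller will often poll the state before sending traffic. The sketch below shows one way to do that with an injected fetch_state callable, which is hypothetical and stands in for whatever call retrieves the deployment. Treating FAILED, DELETING, and DELETED as states that will never reach READY is an assumption, as are the timeout and interval values.

```python
import time
from typing import Callable

# Assumption: a deployment in one of these states will not become READY.
FAILURE_STATES = frozenset({"FAILED", "DELETING", "DELETED"})


def wait_until_ready(
    fetch_state: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_state until it returns READY; raise on failure or timeout."""
    deadline = clock() + timeout_s
    while True:
        state = fetch_state()
        if state == "READY":
            return state
        if state in FAILURE_STATES:
            raise RuntimeError(f"deployment entered state {state}")
        if clock() >= deadline:
            raise TimeoutError(f"deployment still in state {state} after {timeout_s}s")
        # Any other state (e.g. CREATING, UPDATING) is treated as still in progress.
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated sequence of states; no network calls and no real sleeping.
    states = iter(["CREATING", "CREATING", "READY"])
    print(wait_until_ready(lambda: next(states), sleep=lambda _: None))
```

Injecting sleep and clock keeps the helper testable without real delays.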
Detailed status information regarding the most recent operation.
The minimum number of replicas. If not specified, the default is 0.
The maximum number of replicas. If not specified, the default is max(min_replica_count, 1). May be set to 0 to downscale the deployment to 0.
The number of accelerators used per replica. If not specified, the default is the estimated minimum required by the base model.
The type of accelerator to use. If not specified, the default is NVIDIA_A100_80GB.
ACCELERATOR_TYPE_UNSPECIFIED, NVIDIA_A100_80GB, NVIDIA_H100_80GB, AMD_MI300X_192GB, NVIDIA_A10G_24GB, NVIDIA_A100_40GB, NVIDIA_L4_24GB, NVIDIA_H200_141GB, NVIDIA_B200_180GB
The precision with which the model should be served.
PRECISION_UNSPECIFIED, FP16, FP8, FP8_MM, FP8_AR, FP8_MM_KV_ATTN, FP8_KV, FP8_MM_V2, FP8_V2, FP8_MM_KV_ATTN_V2, NF4, FP4, BF16, FP4_BLOCKSCALED_MM, FP4_MX_MOE
If set, this deployment is deployed to a cloud-premise cluster.
If true, PEFT addons are enabled for this deployment.
The number of candidate tokens to generate per step for speculative decoding. Default is the base model's draft_token_count. Set CreateDeploymentRequest.disable_speculative_decoding to true to disable this behavior.
The draft model name for speculative decoding, e.g. accounts/fireworks/models/my-draft-model. If empty, speculative decoding using a draft model is disabled. Default is the base model's default_draft_model. Set CreateDeploymentRequest.disable_speculative_decoding to true to disable this behavior.
The length of previous input sequence to be considered for N-gram speculation.
The name of the deployment template to use for this deployment. Only available to enterprise accounts.
The performance profile to use for this deployment.
The desired geographic region where the deployment must be placed. If unspecified, the default is the GLOBAL multi-region.
The geographic region where the deployment is presently located. This region may change over time, but within the placement constraint.
REGION_UNSPECIFIED, US_IOWA_1, US_VIRGINIA_1, US_VIRGINIA_2, US_ILLINOIS_1, AP_TOKYO_1, EU_LONDON_1, US_ARIZONA_1, US_TEXAS_1, US_ILLINOIS_2, EU_FRANKFURT_1, US_TEXAS_2, EU_PARIS_1, EU_HELSINKI_1, US_NEVADA_1, EU_ICELAND_1, EU_ICELAND_2, US_WASHINGTON_1, US_WASHINGTON_2, US_WASHINGTON_3, AP_TOKYO_2, US_CALIFORNIA_1, US_UTAH_1
The update time for the deployment.
Whether the deployment size validation is disabled.
If true, MTP is enabled for this deployment.