External GCS Bucket Integration

Use external Google Cloud Storage (GCS) buckets for fine-tuning while keeping your data private. Fireworks creates proxy datasets that reference your external buckets, and the data is accessed only during fine-tuning, within a secure, isolated cluster.
Your data never leaves your GCS bucket except during fine-tuning.

Required Permissions

You need to grant access to three service accounts:

Fireworks Control Plane

  • Account: [email protected]
  • Required role: Custom role with storage.buckets.getIamPolicy permission
gcloud storage buckets add-iam-policy-binding <YOUR_BUCKET> \
  --member=serviceAccount:[email protected] \
  --role=projects/<YOUR_PROJECT>/roles/<YOUR_CUSTOM_ROLE>
Fireworks uses this service account to read the IAM policy set on the bucket so that it can verify bucket ownership and access when you create a dataset.
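The custom role mentioned above has to exist before it can be bound. One way to create it is sketched below with `gcloud iam roles create`. The project ID, role ID, and title are hypothetical placeholders, and the script only prints the command unless `DRY_RUN=0` is set, so treat it as a starting point rather than the required procedure.

```shell
#!/bin/sh
# Sketch: create a custom role that holds only storage.buckets.getIamPolicy.
# PROJECT_ID and ROLE_ID are hypothetical placeholders; replace them with your own.
set -eu

PROJECT_ID="${PROJECT_ID:-my-project}"
ROLE_ID="${ROLE_ID:-fireworksBucketPolicyReader}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command only, 0 = execute it

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

create_policy_reader_role() {
  run gcloud iam roles create "$ROLE_ID" \
    --project="$PROJECT_ID" \
    --title="FireworksBucketPolicyReader" \
    --permissions=storage.buckets.getIamPolicy
}

create_policy_reader_role
```

The resulting role is then referenced as `projects/<YOUR_PROJECT>/roles/<YOUR_CUSTOM_ROLE>` in the binding command.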

Inference Service Account

  • Account: [email protected]
  • Required role: Storage Object Viewer or Storage Object Admin
gcloud storage buckets add-iam-policy-binding <YOUR_BUCKET> \
  --member=serviceAccount:[email protected] \
  --role=roles/storage.objectViewer
Fireworks uses this service account to read the files in the bucket.

Your Company’s Fireworks Service Account

  • Account: Your company’s Fireworks account registration email
  • Required role: Storage Object Viewer or Storage Object Admin
gcloud storage buckets add-iam-policy-binding <YOUR_BUCKET> \
  --member=serviceAccount:<YOUR_COMPANY_FW_ACCOUNT_EMAIL> \
  --role=roles/storage.objectViewer
Fireworks uses this binding to validate that your account has access to the bucket that the dataset references. The email associated with your Fireworks account (the account itself, not an individual user; you can retrieve it with firectl get account) must have at least read access in the bucket's IAM policy.
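The three bindings in this section can be applied in one pass, as in the minimal sketch below. Every value is a placeholder: the `example.invalid` addresses stand in for the real service-account emails, the bucket is assumed to be written as a `gs://` URL, and the custom role path is hypothetical. The script prints the commands by default and runs them only when `DRY_RUN=0`.

```shell
#!/bin/sh
# Sketch: apply the three bucket bindings, then print the resulting IAM policy.
# All defaults below are placeholders (assumptions), not real Fireworks values.
set -eu

BUCKET_URL="${BUCKET_URL:-gs://my-bucket}"
CUSTOM_ROLE="${CUSTOM_ROLE:-projects/my-project/roles/myCustomRole}"
CONTROL_PLANE_SA="${CONTROL_PLANE_SA:-control-plane-sa@example.invalid}"
INFERENCE_SA="${INFERENCE_SA:-inference-sa@example.invalid}"
ACCOUNT_SA="${ACCOUNT_SA:-account-sa@example.invalid}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# bind <service-account-email> <role>
bind() {
  run gcloud storage buckets add-iam-policy-binding "$BUCKET_URL" \
    --member="serviceAccount:$1" \
    --role="$2"
}

grant_all() {
  bind "$CONTROL_PLANE_SA" "$CUSTOM_ROLE"
  bind "$INFERENCE_SA" "roles/storage.objectViewer"
  bind "$ACCOUNT_SA" "roles/storage.objectViewer"
  # Show the bucket policy so the new bindings can be checked by eye.
  run gcloud storage buckets get-iam-policy "$BUCKET_URL" --format=json
}

grant_all
```

Reading the policy back at the end is a convenience for checking that all three members appear; it is not a step the dataset creation flow requires.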

Usage Example

1. Create a Proxy Dataset

Create a dataset that references your external GCS bucket:
firectl create dataset {DATASET_NAME} --external-url gs://bucket-name/object-name
Ensure the gs:// URL points directly to the JSONL file. If the file is in a folder, make sure the folder contains only that file.
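A quick local check can catch a malformed URL before the dataset is created. The sketch below accepts only a `gs://` URL that names an object ending in `.jsonl`. The helper name `is_jsonl_gcs_url` and the suffix check are assumptions for illustration, not rules enforced by firectl.

```shell
#!/bin/sh
# Sketch: accept only gs://<bucket>/<path>.jsonl style URLs.
# The .jsonl suffix requirement is an assumption, not a documented firectl rule.

is_jsonl_gcs_url() {
  case "$1" in
    gs:///*) return 1 ;;          # empty bucket name
    gs://*/*.jsonl) return 0 ;;   # bucket plus an object ending in .jsonl
    *) return 1 ;;
  esac
}

for url in "gs://bucket-name/data/train.jsonl" "gs://bucket-name/data/"; do
  if is_jsonl_gcs_url "$url"; then
    echo "ok: $url"
  else
    echo "rejected: $url"
  fi
done
```

This only inspects the string. To confirm that a folder holds a single file, listing it with `gcloud storage ls` is one option.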

2. Start Fine-tuning

Use the proxy dataset to create a fine-tuning job:
firectl create sftj \
  --dataset "accounts/{ACCOUNT}/datasets/{DATASET_NAME}" \
  --base-model "accounts/fireworks/models/{MODEL}" \
  --output-model {TRAINED_MODEL_NAME}
For additional options, run: firectl create sftj -h
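To show how the placeholders in the two steps fit together, here is a minimal sketch that fills them from shell variables. The account, dataset, model, and bucket values are hypothetical, and only the firectl flags shown above are used. Commands are printed by default and executed only when `DRY_RUN=0`.

```shell
#!/bin/sh
# Sketch: create the proxy dataset, then start a fine-tuning job that uses it.
# Every default below is a hypothetical placeholder.
set -eu

ACCOUNT="${ACCOUNT:-my-account}"
DATASET_NAME="${DATASET_NAME:-my-dataset}"
DATASET_URL="${DATASET_URL:-gs://bucket-name/object-name}"
MODEL="${MODEL:-my-base-model}"
TRAINED_MODEL_NAME="${TRAINED_MODEL_NAME:-my-tuned-model}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Full resource name of the dataset, as expected by --dataset.
dataset_resource() {
  printf 'accounts/%s/datasets/%s\n' "$ACCOUNT" "$DATASET_NAME"
}

create_dataset() {
  run firectl create dataset "$DATASET_NAME" --external-url "$DATASET_URL"
}

start_sftj() {
  run firectl create sftj \
    --dataset "$(dataset_resource)" \
    --base-model "accounts/fireworks/models/$MODEL" \
    --output-model "$TRAINED_MODEL_NAME"
}

create_dataset
start_sftj
```

Note that the dataset is created with its short name but referenced by its full `accounts/.../datasets/...` resource name when the job is started.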

Key Benefits

Data Privacy

Your data never leaves your GCS bucket except during fine-tuning

Security

Access is limited to isolated fine-tuning clusters

Simplicity

Reference external data without copying or moving files