Supervised Fine-Tuning (SFT) adapts general-purpose models to domain-specific tasks, significantly improving performance in real-world applications. Fireworks’ fine-tuning service is easy to use and supports continued training from another fine-tuned model. Fine-tuned models can be seamlessly deployed for inference, and multi-LoRA serving allows multiple fine-tuned models to run simultaneously on a single deployment. You can run your supervised fine-tuning job via the CLI, API, or UI.

We’re introducing an upgraded tuning service with improved speed, usability, and reliability! The new service uses different commands and has different model coverage. It is offered for free while in public preview.

Fireworks uses LoRA-based fine-tuning to reduce the computational cost of fine-tuning large models by updating only a small subset of parameters in a low‑rank structure. For models with 70B or more parameters, we use qLoRA (quantized LoRA) to improve training speed.
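To illustrate the idea, here is a toy sketch of a LoRA forward pass (illustrative only, not Fireworks’ training code): the pretrained weight matrix stays frozen, and only two small low-rank factors are trained.

import numpy as np

# Toy LoRA forward pass: only A and B (r * (d_in + d_out) values) are
# trained, instead of the full d_out * d_in entries of W.
d_in, d_out, r = 4096, 4096, 8           # r is the LoRA rank
alpha = 16                               # common scaling hyperparameter

W = np.random.randn(d_out, d_in)         # pretrained weight, kept frozen
A = np.random.randn(r, d_in) * 0.01      # trainable low-rank factor
B = np.zeros((d_out, r))                 # trainable, zero-initialized so training
                                         # starts from the unmodified base model

x = np.random.randn(d_in)
y = W @ x + (alpha / r) * (B @ (A @ x))  # base output plus low-rank update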

Impact on inference speed

For the fastest inference, the fine-tuned LoRA should be merged into the base model. Fine-tuned model inference on Serverless is slower than base model inference on Serverless, so for use cases that need low latency we recommend on-demand deployments, where fine-tuned model inference speeds are significantly closer to base model speeds (but still slightly slower). If you are only using one LoRA on-demand, merging the fine-tuned weights into the base model provides identical speed to base model inference. If you have an enterprise use case that needs fast fine-tuned models, please contact us!

Fine-tuning a model

Benefits of fine-tuning:

  1. Enhanced Precision: The model can adapt to the unique attributes and trends within a dataset, leading to significantly improved precision and effectiveness.
  2. Domain Adaptation: While many models are developed with general data, fine-tuning them with specialized, domain-specific datasets ensures they are finely attuned to the specific requirements of that field.
  3. Bias Reduction: General models may carry inherent biases. Fine-tuning with a well-curated, diverse dataset aids in reducing these biases, fostering fairer and more balanced outcomes.
  4. Contemporary Relevance: Information evolves rapidly, and fine-tuning with the latest data keeps the model current and relevant.
  5. Customization for Specific Applications: The model can be tailored to meet unique objectives and needs not achievable with standard models.

In short, fine-tuning a model with a specific dataset is a pivotal step toward better accuracy, relevance, and fit for your application. Let’s try fine-tuning a model!

Step 1: Check Available Models for Fine-Tuning

In the model library page, select the Tunable filter. Alternatively, check whether the “Fine-Tuning” field is set to “supported” on the model’s details page.

Our new tuning service is currently free, but will eventually be billed based on the total number of tokens processed (dataset_tokens * num_epochs). Running inference on fine-tuned models incurs no extra cost beyond base inference fees.
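For example, under that formula (the numbers below are illustrative only):

# Illustrative cost arithmetic under the stated formula; no charges
# apply while the service is in public preview.
dataset_tokens = 2_000_000
num_epochs = 2
billed_tokens = dataset_tokens * num_epochs  # 4,000,000 tokens processed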

Step 2: Prepare the Dataset

Datasets must be in JSONL format, where each line represents a complete JSON-formatted training example.

  • Minimum examples needed: 3
  • Maximum examples: Up to 3 million examples per dataset
  • File format: JSONL (each line is a valid JSON object)
  • Message Schema: Each training sample must include a messages array, where each message is an object with two fields:
    • role: one of system, user, or assistant. A message with the “system” role is optional, but if specified, it must be the first message of the conversation
    • content: a string representing the message content

This format conforms to OpenAI’s Chat Completions API.

Here is an example conversation dataset (shown pretty-printed for readability; in the actual file, each example must occupy a single line):

{"messages": [
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "What is the capital of France?"},
  {"role": "assistant", "content": "Paris."}
]}
{"messages": [
  {"role": "user", "content": "What is 1+1?"}, 
  {"role": "assistant", "content": "2"}, 
  {"role": "user", "content": "Now what is 2+2?"}, 
  {"role": "assistant", "content": "4"}
]}
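Before uploading, you can sanity-check the file against this schema. Here is a minimal validation sketch in Python (the file name is a placeholder):

import json

VALID_ROLES = {"system", "user", "assistant"}

def validate_example(line: str) -> None:
    example = json.loads(line)  # each line must be a valid JSON object
    messages = example["messages"]
    assert isinstance(messages, list) and messages, "messages must be a non-empty array"
    for i, message in enumerate(messages):
        assert message["role"] in VALID_ROLES, f"invalid role: {message['role']}"
        assert isinstance(message["content"], str), "content must be a string"
        if message["role"] == "system":
            assert i == 0, "a system message must be the first message"

with open("training_dataset.jsonl") as f:
    for line_number, line in enumerate(f, start=1):
        if line.strip():
            validate_example(line)
print("dataset format looks valid")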

Step 3: Create and Upload the Dataset

Create and verify the dataset via the CLI:

firectl create dataset <DATASET_ID> path/to/training_dataset.jsonl
firectl get dataset <DATASET_ID>

Step 4: Create a Fine-Tuning Job

Using CLI: To start a supervised fine-tuning job (sftj), run:

firectl create sftj --base-model <BASE_MODEL_ID> --dataset <DATASET_ID> --output-model <OUTPUT_MODEL_ID>

For example:

firectl create sftj --base-model llama-v3p1-8b-instruct --dataset my_dataset --output-model my_model

firectl will return the fine-tuning job ID.

Step 5: Check the Job Status

You can monitor the progress of the tuning job by running:

firectl get sftj <JOB_ID>

Once the job successfully completes, a model will be created in your account. You can list your models and inspect a specific model with:

firectl list models
firectl get model <MODEL_ID>
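If you prefer to poll from a script, here is a minimal sketch that shells out to firectl (the job ID is a placeholder, and the exact state strings are an assumption; adjust them to whatever firectl reports):

import subprocess
import time

JOB_ID = "my-fine-tuning-job"

while True:
    # Same status command as above, run from Python
    output = subprocess.run(
        ["firectl", "get", "sftj", JOB_ID],
        capture_output=True, text=True, check=True,
    ).stdout
    print(output)
    # Assumed terminal states; check the actual firectl output
    if "COMPLETED" in output or "FAILED" in output:
        break
    time.sleep(60)  # poll once a minute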

Continue training from a fine-tuned model

When creating a fine-tuning job, you can start tuning from a base model, or from a fine-tuned model you tuned earlier:

  1. Base model: Use the base-model parameter in CLI (or baseModel in API) to start from a pre-trained base model.
  2. Existing LoRA addon: Use the warm-start-from parameter in CLI (or warmStartFrom parameter in API) to start from an existing LoRA addon model, where the LoRA is specified in the format “accounts/<account-id>/models/<addon-model-id>”.

You must specify either the base-model or the warm-start-from parameter.
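For example, a warm-start job via the CLI might look like this (the account, model, and dataset IDs are placeholders):

firectl create sftj \
  --warm-start-from accounts/<account-id>/models/<addon-model-id> \
  --dataset my_dataset_v2 \
  --output-model my_model_v2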

Deploying and using a model

Before using your fine-tuned model for inference, you must deploy it. Please refer to our guides on Deploying a model and Querying text models for detailed instructions.

Some base models may not support serverless addons. To check:

  1. Run firectl -a fireworks get model <BASE_MODEL_ID>
  2. Look under Deployed Model Refs to see if a Fireworks-owned deployment exists, e.g. accounts/fireworks/deployments/3c7a68b0
  3. If one exists, serverless addons are supported

If the base model doesn’t support serverless addons, you will need to use an on-demand deployment to deploy it.
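Once the model is deployed, you can query it through Fireworks’ OpenAI-compatible chat completions endpoint. Here is a minimal sketch using the openai Python client (the API key, account ID, and model ID are placeholders):

from openai import OpenAI

# Fireworks exposes an OpenAI-compatible inference endpoint
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="<FIREWORKS_API_KEY>",
)

response = client.chat.completions.create(
    model="accounts/<account-id>/models/<MODEL_ID>",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)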

Additional tuning options

Tuning settings are specified when starting a fine-tuning job. All of the settings below are optional and have reasonable defaults if not specified. For settings that affect tuning quality, like epochs and learning rate, we recommend keeping the defaults and changing hyperparameters only if results are not as desired. All tuning options are specified via command-line flags, as in the example below:

firectl create sftj \
--base-model llama-v3p1-8b-instruct \
--dataset cancerset \
--output-model my-tuned-model \
--job-id my-fine-tuning-job \
--learning-rate 0.0001 \
--epochs 2 \
--early-stop \
--evaluation-dataset my-eval-set

Evaluation

By default, the fine-tuning job will run evaluation by running the fine-tuned model against an evaluation set that’s created by automatically carving out a portion of your training set. You have the option to explicitly specify a separate evaluation dataset to use instead of carving out training data.

evaluation_dataset: the ID of a separate dataset to use for evaluation. It must be uploaded via firectl beforehand.

firectl create sftj \
 ...
  --evaluation-dataset my-eval-set \
  ...

Early stopping

Early stopping halts training when the validation loss stops improving. It is off by default.

firectl create sftj \
 ...
  --early-stop \
  ...

Max Context Length

By default, fine-tuned models support a max context length of 8k. If your use case requires a longer context, you can increase the max context length up to the default context length of your selected model. For models with over 70B parameters, we only support a max context length of up to 65536.

firectl create sftj \
 ...
  --max-context-length 65536 \
  ...

Epochs

Epochs are the number of passes over the training data. Our default value is 1. If the model does not follow the training data as closely as expected, try increasing the number of epochs by 1 or 2. Non-integer values are supported.

Note: we cap dataset_examples * epochs at 3 million.

firectl create sftj \
 ...
  --epochs 2.0 \
  ...

Learning rate

Learning rate controls how quickly the model weights are updated during training. We generally do not recommend changing it; the default value is set automatically based on your selected model.

firectl create sftj \
  ...
  --learning-rate 0.0001 \
  ...

LoRA Rank

LoRA rank determines the number of parameters that will be tuned in your LoRA add-on. A higher LoRA rank increases the amount of information that can be captured during tuning. LoRA rank must be a power of 2, up to 64. Our default value is 8.

firectl create sftj \
...
  --lora-rank 16 \
  ...

Training progress and monitoring

The fine-tuning service integrates with Weights & Biases to provide observability into the tuning process. To use this feature, you must have a Weights & Biases account and have provisioned an API key.

firectl create sftj \
 ...
  --wandb-entity my-org \
  --wandb-api-key xxx \
  --wandb-project "My Project" \
  ...

Model ID

By default, the fine-tuning job will generate a random unique ID for the model. This ID is used to refer to the model at inference time. You can optionally specify a custom ID, within ID constraints.

firectl create sftj \
  ...
  --output-model my-model \
  ...

Job ID

By default, the fine-tuning job will generate a random unique ID for the fine-tuning job. You can optionally choose a custom ID.

firectl create sftj \
  ...
  --job-id my-fine-tuning-job \
  ...

Turbo Mode

By default, the fine-tuning job uses a single GPU. You can optionally enable turbo mode to accelerate training with multiple GPUs (non-DeepSeek models only).

firectl create sftj \
  ...
  --turbo \
  ...

Downloading model weights

To download model weights, run:

firectl download model <model-id> <target local filepath>

Appendix

Supported base models - tuning

Using the UI: in the model library page, select the Tunable filter, or check whether the “Fine-Tuning” field is set to “supported” on the model’s details page.

All models available for tuning also support LoRAs on their dedicated deployments, meaning that up to 100 LoRAs can be deployed to a dedicated instance at no extra fee beyond the base deployment cost. Some models support LoRAs on dedicated deployments even though Fireworks does not support tuning them; in that case, you can tune a LoRA on another platform and upload it to Fireworks for inference.

Supported base models - LoRAs on serverless

Some serverless models support LoRA deployment, allowing up to 100 LoRAs to be deployed for always-available, pay-per-token inference. For these models, the “Serverless LoRA Deployment” field is set to “supported” on the model details page.