Fine-tuning adapts general-purpose models to domain-specific tasks, significantly improving performance in real-world applications. In particular, fine-tuning can offer you:

  • Increased accuracy on specific tasks or reasoning in a specific domain.
  • Better performance and lower costs from using a smaller model.

For example, we have seen fine-tuning be especially helpful in these tasks:

  • Low-latency query understanding, summarization, and classification
  • Cost-efficient vision and text understanding
  • Document intelligence in specialized domains
  • Function calling for agentic applications

Fine-tuning on Fireworks

Fireworks supports both Supervised Fine-Tuning (SFT) and Reinforcement Fine-Tuning (RFT). In supervised fine-tuning, you provide a dataset of labeled examples of “good” outputs. In reinforcement fine-tuning, you provide a grader function that scores the model’s outputs, and the model is iteratively trained to produce outputs that maximize this score. To learn more about the differences between the two, see When to use Supervised Fine-Tuning (SFT) vs. Reinforcement Fine-Tuning (RFT).
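
To make the contrast concrete, an RFT grader can be as simple as a Python function that returns a score for a generation. The function name, signature, and scoring rule below are illustrative assumptions, not the exact grader interface Fireworks expects:

```python
# Illustrative only: the exact grader interface Fireworks expects may differ.
# This toy grader rewards outputs that contain the expected label and
# penalizes overly long generations.

def grade(prompt: str, model_output: str, expected_label: str) -> float:
    """Return a score in [0, 1]; higher means a better generation."""
    score = 1.0 if expected_label.lower() in model_output.lower() else 0.0
    # Small penalty for rambling answers, so the model learns to be concise.
    if len(model_output) > 200:
        score *= 0.5
    return score
```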

To fine-tune a model efficiently, Fireworks uses a technique called Low-Rank Adaptation (LoRA). The fine-tuning process generates a LoRA addon that gets deployed onto a base model at inference time. The advantages of using LoRA are:

  • Models are faster and cheaper to train
  • Models are seamless to deploy on Fireworks
  • A single deployment can be configured to serve multiple LoRA addons
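
For intuition, here is a minimal sketch of the idea behind LoRA in PyTorch. Fireworks handles all of this for you during training; the class below is illustrative, not Fireworks code. Instead of updating the full weight matrix, LoRA freezes it and trains two small low-rank matrices whose product forms the addon:

```python
# Minimal LoRA sketch: W_effective = W + (alpha / r) * B @ A, where only
# A and B are trained and W (the base weight) stays frozen.

import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the base model weights
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))  # starts as a no-op
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base output plus the low-rank update; only lora_a and lora_b are trained.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling
```

Because only the small A and B matrices are trained and stored, the resulting addon is a tiny fraction of the base model’s size, which is what makes training cheap and lets many addons share a single deployment.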

When to use Supervised Fine-Tuning (SFT) vs. Reinforcement Fine-Tuning (RFT)

Supervised fine-tuning (SFT) works well for many common scenarios, especially when:

  • You have a sizable dataset (~1000+ examples) with high-quality, ground-truth labels (an example record is sketched after this list).
  • The dataset covers most possible input scenarios.
  • Tasks are relatively straightforward, such as:
    • Classification
    • Content extraction
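
As a sketch, labeled examples for a classification task are often written as chat-style JSONL records like the ones below. The `messages` schema and file name here are assumptions for illustration, so check the Fireworks dataset documentation for the exact format:

```python
# Hedged sketch: writes a few labeled classification examples as chat-style
# JSONL. The "messages" field layout is an assumption, not the official schema.

import json

examples = [
    ("I can't log in to my account", "account_access"),
    ("When will my order arrive?", "shipping"),
]

with open("train.jsonl", "w") as f:
    for text, label in examples:
        record = {
            "messages": [
                {"role": "system", "content": "Classify the support ticket."},
                {"role": "user", "content": text},
                {"role": "assistant", "content": label},  # ground-truth label
            ]
        }
        f.write(json.dumps(record) + "\n")
```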

However, SFT may struggle in situations where:

  • Your dataset is small.
  • You lack ground-truth outputs (a.k.a. “golden generations”).
  • The task requires multi-step reasoning.

Here is a simple decision tree:

Verifiable refers to whether it is relatively easy to judge the quality of a model’s generation.
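
Expressed roughly in code, the decision described in this section looks like the sketch below. The boolean inputs and the 1,000-example threshold are simplifications of the guidance above, not an official rule:

```python
# Rough encoding of the SFT vs. RFT guidance in this section; simplification only.

def choose_tuning_method(num_examples: int,
                         has_ground_truth_labels: bool,
                         needs_multi_step_reasoning: bool,
                         output_is_verifiable: bool) -> str:
    sft_is_a_good_fit = (
        has_ground_truth_labels
        and num_examples >= 1000           # "sizable dataset (~1000+ examples)"
        and not needs_multi_step_reasoning
    )
    if sft_is_a_good_fit:
        return "SFT"
    if output_is_verifiable:
        # RFT only needs a grader that can score generations, so it can work
        # with small datasets, no golden generations, or multi-step reasoning.
        return "RFT"
    # Without verifiable outputs, a grader is hard to write; improving the
    # labeled dataset and using SFT is usually the remaining option.
    return "SFT"
```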