
What is reinforcement fine-tuning?

In traditional supervised fine-tuning, you provide a dataset with labeled examples showing exactly what the model should output. In reinforcement fine-tuning, you instead provide:
  1. A dataset: Prompts, the input examples the model will respond to
  2. An evaluator: Code that scores the model’s outputs from 0.0 (bad) to 1.0 (good), also known as a reward function
  3. An environment: The system where your agent runs, with access to tools, APIs, and data needed for your task
During training, the model generates responses to each prompt, receives scores from your reward function, and learns to produce outputs that maximize the reward.
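
As a concrete illustration, here is a minimal sketch of an evaluator for the structured-output use case: it rewards a response that parses as JSON and contains the expected fields. The function name, signature, and required fields are assumptions for illustration, not the exact interface your Fireworks job will use.

```python
import json

# Hypothetical evaluator: scores a model response between 0.0 and 1.0.
# The name, signature, and REQUIRED_FIELDS are illustrative assumptions,
# not the exact interface your RFT job expects.
REQUIRED_FIELDS = {"name", "date", "amount"}

def evaluate(response_text: str) -> float:
    """Reward well-formed JSON that contains all required fields."""
    try:
        parsed = json.loads(response_text)
    except json.JSONDecodeError:
        return 0.0  # not valid JSON at all

    if not isinstance(parsed, dict):
        return 0.1  # valid JSON, but not an object

    # Partial credit for each required field that is present.
    present = REQUIRED_FIELDS & parsed.keys()
    return 0.5 + 0.5 * (len(present) / len(REQUIRED_FIELDS))
```

Reward shaping like this, with partial credit rather than all-or-nothing scores, tends to give training a smoother signal to climb.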

Use cases

Reinforcement fine-tuning helps you train models to excel at:
  • Code generation and analysis - Writing and debugging functions with verifiable execution results or test outcomes
  • Structured output generation - JSON formatting, data extraction, classification, and schema compliance with programmatic validation
  • Domain-specific reasoning - Legal analysis, financial modeling, or medical triage with verifiable criteria and compliance checks
  • Tool-using agents - Multi-step workflows where agents call external APIs with measurable success criteria

How it works

1. Design your evaluator

Define how you’ll score model outputs from 0 to 1. For example, your evaluator might check whether generated code passes its tests, whether your agent called the right tools, or whether your LLM-as-judge rates the output highly.
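
For the tool-calling case, a sketch might score the fraction of expected tools the agent actually invoked. This assumes the rollout is available as a list of OpenAI-style chat messages; adapt it to whatever data your environment actually records.

```python
# Hypothetical tool-use evaluator: fraction of expected tools the agent called.
# Assumes the rollout is a list of OpenAI-style chat messages; adjust to the
# data your environment exposes.
EXPECTED_TOOLS = {"search_flights", "book_flight"}

def evaluate(messages: list[dict]) -> float:
    called = {
        tool_call["function"]["name"]
        for message in messages
        if message.get("role") == "assistant"
        for tool_call in message.get("tool_calls") or []
    }
    return len(EXPECTED_TOOLS & called) / len(EXPECTED_TOOLS)
```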
2. Prepare dataset

Create a JSONL file with prompts (system and user messages). These will be used to generate rollouts during training.
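
A few lines of Python are enough to write such a file. The field names below (e.g. `messages`) are assumptions for illustration; match them to the dataset schema described in the Fireworks docs.

```python
import json

# Hypothetical prompt records: field names are illustrative, match them to the
# dataset schema your RFT job expects.
prompts = [
    {
        "messages": [
            {"role": "system", "content": "You are a customer-support triage agent."},
            {"role": "user", "content": "My last invoice was charged twice."},
        ]
    },
    {
        "messages": [
            {"role": "system", "content": "You are a customer-support triage agent."},
            {"role": "user", "content": "How do I rotate my API key?"},
        ]
    },
]

# JSONL: one JSON object per line.
with open("train.jsonl", "w") as f:
    for record in prompts:
        f.write(json.dumps(record) + "\n")
```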
3. Connect your environment

Train locally, or connect your environment as a remote server to Fireworks with our /init and /status endpoints.
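
A remote environment is just a server that answers those calls. The sketch below uses FastAPI and invents the request and response payloads purely for illustration; the actual /init and /status contract is defined in the Fireworks remote-environment documentation.

```python
# Minimal remote-environment server sketch using FastAPI.
# The payload shapes below are invented for illustration; follow the actual
# /init and /status contract from the Fireworks documentation.
from fastapi import FastAPI
import uvicorn

app = FastAPI()
sessions: dict[str, str] = {}  # toy in-memory session state

@app.post("/init")
def init(payload: dict):
    # Hypothetical: set up whatever state a rollout needs and acknowledge.
    session_id = payload.get("session_id", "default")
    sessions[session_id] = "ready"
    return {"status": "ok", "session_id": session_id}

@app.get("/status")
def status():
    # Hypothetical: report that the environment is healthy and ready for rollouts.
    return {"status": "ready", "active_sessions": len(sessions)}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```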
4. Launch training

Create an RFT job via the UI or CLI. Fireworks orchestrates rollouts, evaluates them, and trains the model to maximize reward.
5. Deploy model

Once training completes, deploy your fine-tuned LoRA model to production with an on-demand deployment.

RFT works best when:

  1. You can determine whether a model’s output is “good” or “bad,” even if only approximately
  2. You have prompts but lack perfect “golden” completions to learn from
  3. The task requires multi-step reasoning where evaluating intermediate steps is hard
  4. You want the model to explore creative solutions beyond your training examples

Next steps