Before launching, review Training Prerequisites & Validation for requirements, validation checks, and common errors.
When to use the Web UI
Start with the UI to learn the options, then switch to CLI for faster iteration and automation.
| Feature | CLI (eval-protocol) | Web UI |
|---|---|---|
| Best for | Experienced users, automation | First-time users, exploration |
| Parameter discovery | Need to know flag names | Guided with tooltips |
| Speed | Fast - single command | Slower - multiple steps |
| Automation | Easy to script and reproduce | Manual process |
| Batch operations | Easy to launch multiple jobs | One at a time |
| Reproducibility | Excellent - save commands | Manual tracking needed |
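To make the automation and batch-operations rows concrete, here is a minimal Python sketch of launching several jobs in a loop. The command name and flag are placeholders, not the real eval-protocol invocation; substitute the exact command and flags from your own working single-job launch.

```python
# A minimal sketch of scripting batch launches. "my-launch-command" and
# "--learning-rate" are hypothetical placeholders, NOT the real
# eval-protocol CLI; replace them with your own working launch command.
import subprocess

learning_rates = ["1e-4", "5e-5", "2e-5"]

for lr in learning_rates:
    cmd = ["my-launch-command", "--learning-rate", lr]  # hypothetical flags
    print("launching:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```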
Launch training via Web UI
Step 1: Navigate to Fine-Tuning
- Go to Fireworks Dashboard
- Click Fine-Tuning in the left sidebar
- Click Fine-tune a Model

Step 2: Select Reinforcement Fine-Tuning
- Choose Reinforcement as the tuning method
- Select your base model from the dropdown
Not sure which model to choose? Start with llama-v3p1-8b-instruct for a good balance of quality and speed.
Step 3: Configure Dataset
- Upload new dataset or select existing from your account
- Preview dataset entries to verify format
- The UI validates your JSONL format automatically

Each entry must be a JSON object containing a messages array (see the sketch below).
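A minimal sketch of that shape, assuming the common chat-style JSONL schema of one JSON object per line; the file name and message contents are illustrative only:

```python
# Write a tiny JSONL dataset: one JSON object per line, each with a
# "messages" array. Contents below are illustrative assumptions.
import json

entries = [
    {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Summarize Hamlet in one sentence."},
        ]
    },
]

with open("dataset.jsonl", "w") as f:
    for entry in entries:
        f.write(json.dumps(entry) + "\n")
```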
Step 4: Select Evaluator
- Choose from your uploaded evaluators
- Preview evaluator code and test results
- View recent evaluation metrics
For remote evaluators, you’ll enter your server URL in the environment configuration section.
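Conceptually, an evaluator maps a model response to a score the trainer can optimize. The sketch below illustrates only that idea; it is not the eval-protocol interface, and the scoring rule is a toy assumption.

```python
# A conceptual sketch of what an evaluator does: score a response so
# the trainer has a reward signal. NOT the eval-protocol API; the
# conciseness heuristic below is a toy assumption.
def evaluate(prompt: str, response: str) -> float:
    """Return a reward in [0, 1]; here, a toy conciseness heuristic."""
    return 1.0 if len(response) <= 200 else 0.5

print(evaluate("Summarize Hamlet.", "A prince avenges his father's murder."))
```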
Step 5: Set Training Parameters
Configure how the model learns. Core parameters (consolidated in the sketch after this list):
- Output model name: Custom name for your fine-tuned model
- Epochs: Number of passes through the dataset (start with 1)
- Learning rate: How fast the model updates (use default 1e-4)
- LoRA rank: Model capacity (8-16 for most tasks)
- Batch size: Training throughput (use default 32k tokens)
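A sketch collecting the core parameters in one place. The key names and the model name are illustrative assumptions, not an official config schema; the values mirror the recommendations above.

```python
# Core training parameters as one config. Key names are assumed for
# illustration; values follow the recommendations in the list above.
training_config = {
    "output_model_name": "my-rft-model",  # custom name (illustrative)
    "epochs": 1,                  # start with a single pass
    "learning_rate": 1e-4,        # default
    "lora_rank": 8,               # 8-16 covers most tasks
    "max_batch_tokens": 32_000,   # default 32k tokens
}
```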
Step 6: Configure Rollout Parameters
Control how the model generates responses during training (see the sketch after this list):
- Temperature: Sampling randomness (0.7 for balanced exploration)
- Top-p: Probability mass cutoff (0.9-1.0)
- Top-k: Token candidate limit (40 is standard)
- Number of rollouts (n): Responses per prompt (4-8 recommended)
- Max tokens: Maximum response length (2048 default)
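Why several rollouts per prompt help: the trainer can compare rewards across samples of the same prompt. The toy example below shows one common scheme, mean-centering rewards into advantages; it is an illustration, not necessarily the exact algorithm used here, and the reward values are assumed.

```python
# Toy illustration: with n = 4 rollouts of one prompt, mean-centering
# the evaluator's rewards yields per-rollout advantages. One common
# scheme, shown for intuition only; reward values are assumed.
rewards = [0.2, 0.9, 0.5, 0.8]
mean_reward = sum(rewards) / len(rewards)
advantages = [r - mean_reward for r in rewards]
print(advantages)  # rollouts above the mean get a positive learning signal
```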
Step 7: Review and Launch
- Review all settings in the summary panel
- Review the estimated training time and cost
- Click Start Fine-Tuning to launch