- Aligning model outputs with brand voice, tone, or style guidelines
- Reducing hallucinations or incorrect reasoning patterns
- Improving response quality where there’s no single “correct” answer
- Teaching models to follow specific formatting or structural preferences
Fine-tuning with DPO
1. Prepare dataset
Datasets must adhere strictly to the JSONL format, where each line represents a complete JSON-formatted training example.

Minimum Requirements:

- Minimum examples needed: 3
- Maximum examples: Up to 3 million examples per dataset
- File format: JSONL (each line is a valid JSON object)
- Dataset Schema: Each training sample must include the following fields:
  - An `input` field containing a `messages` array, where each message is an object with two fields:
    - `role`: one of `system`, `user`, or `assistant`
    - `content`: a string representing the message content
  - A `preferred_output` field containing an assistant message with an ideal response
  - A `non_preferred_output` field containing an assistant message with a suboptimal response
We currently only support one-turn conversations for each example, where the preferred and non-preferred messages need to be the last assistant message.
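For illustration, a single training example following this schema might look like the line below (one line per example in the file). The Einstein-themed content is made up, and the exact shape of the output fields (a single assistant-message object here) is an assumption based on the field descriptions above, so double-check it against the dataset schema reference before training:

```json
{"input": {"messages": [{"role": "system", "content": "You are a concise physics tutor."}, {"role": "user", "content": "Who developed the theory of general relativity?"}]}, "preferred_output": {"role": "assistant", "content": "Albert Einstein developed the theory of general relativity, publishing it in 1915."}, "non_preferred_output": {"role": "assistant", "content": "Isaac Newton developed the theory of general relativity."}}
```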
Save this dataset as a JSONL file locally, for example `einstein_dpo.jsonl`.
2. Create and upload the dataset
There are a couple of ways to upload the dataset to the Fireworks platform for fine-tuning: the UI, firectl, the RESTful API, or the builder SDK.

- UI: Navigate to the dataset tab, click `Create Dataset`, and follow the wizard.
- firectl: Create the dataset from the command line, as sketched after this list.
- RESTful API
- builder SDK
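If you use firectl, a minimal sketch might look like the following. The argument order (a dataset ID followed by the local file path) and the verification subcommand are assumptions, so confirm them with `firectl create dataset --help`:

```bash
# Create a dataset named "einstein-dpo" from the local JSONL file
# (assumed form: firectl create dataset <dataset-id> <path>).
firectl create dataset einstein-dpo einstein_dpo.jsonl

# Verify the upload by fetching the dataset (assumed subcommand).
firectl get dataset einstein-dpo
```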
While all of the above approaches should work, the UI is more suitable for smaller datasets (< 500 MB), while firectl might work better for bigger datasets. Ensure the dataset ID conforms to the resource ID restrictions.
3. Create a DPO Job
Use firectl to create a new DPO job. For our example, we might run a command like the one sketched below to fine-tune a Llama 3.1 8B Instruct model with our Einstein dataset.
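A minimal sketch of the job-creation command, assuming a `dpo-job` subcommand with `--base-model`, `--dataset`, and `--output-model` flags; these names are assumptions, so confirm them with `firectl create --help`:

```bash
# Start a DPO fine-tuning job on Llama 3.1 8B Instruct using the
# uploaded "einstein-dpo" dataset (subcommand and flag names assumed).
firectl create dpo-job \
  --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \
  --dataset einstein-dpo \
  --output-model einstein-dpo-tuned
```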
4. Monitor the DPO Job
Use firectl to monitor progress updates for the DPO fine-tuning job.
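A sketch of the status check, assuming a `get dpo-job` subcommand; the job ID is whatever was printed when the job was created:

```bash
# Poll the job until STATE reads JOB_STATE_COMPLETED
# (subcommand name assumed; substitute your actual job ID).
firectl get dpo-job <job-id>
```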
Once the job is complete, the `STATE` will be set to `JOB_STATE_COMPLETED`, and the fine-tuned model can be deployed.
5. Deploy the DPO fine-tuned model
Once training completes, you can create a deployment to interact with the fine-tuned model. Refer to deploying a fine-tuned model for more details.
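As a rough sketch only (the subcommand and model path format are assumptions; the deployment guide referenced above documents the supported flow):

```bash
# Deploy the fine-tuned model produced by the DPO job
# (assumed subcommand and model path format).
firectl create deployment accounts/<account-id>/models/einstein-dpo-tuned
```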
Next Steps
Explore other fine-tuning methods to improve model output for different use cases.

- Supervised Fine Tuning - Text: Train models on input-output examples to improve task-specific performance.
- Reinforcement Fine Tuning: Optimize models using AI feedback for complex reasoning and decision-making.
- Supervised Fine Tuning - Vision: Fine-tune vision-language models to understand both images and text.