
Fireworks is excited to announce our fine-tuning service, which makes it easier to use customized models with our blazing-fast inference service! Fine-tuning is a crucial part of improving model accuracy for individual use cases. Over the past year, we’ve seen heavy usage of our “model deployment” service, which lets users upload models to Fireworks that were fine-tuned on other services. As part of Fireworks’ commitment to building the best production platform, we’re excited to offer fine-tuning directly on the Fireworks platform. Our service focuses on rapid iteration and seamless deployment of fine-tuned models.
Fine-tuning is the process of providing many examples to an LLM to improve model performance. Specifically, Fireworks’ fine-tuning process uses a technique called LoRA that enables improved performance without requiring large amounts of data or reductions in model speed. Fine-tuning is helpful with a variety of tasks, especially those that require customization for specific formats or domains, like generating SQL code or responding with a specific tone.
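To give a rough sense of why LoRA can get by with less data and compute than full fine-tuning, the sketch below counts trainable parameters for a single weight matrix. It illustrates the general LoRA idea (a frozen d × k weight plus a trainable low-rank update built from a d × r matrix and an r × k matrix), not Fireworks’ implementation. The layer size and rank are assumed values chosen purely for illustration.

```python
def full_finetune_params(d: int, k: int) -> int:
    """Trainable parameters when the whole d x k weight matrix is updated."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for a rank-r LoRA update: a d x r and an r x k matrix."""
    return r * (d + k)


# Assumed, illustrative sizes; not taken from any specific model.
d, k, r = 4096, 4096, 8

full = full_finetune_params(d, k)
lora = lora_params(d, k, r)
print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank {r}):  {lora:,} trainable parameters")
print(f"LoRA / full:      {lora / full:.4%}")
```

Because the original weights stay frozen and only the small update is trained, far fewer parameters need to be learned for this layer.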
Fine-tuning can offer huge increases in model accuracy and provide a “moat” for your app through differentiated quality. However, the process of fine-tuning can be painful, due to:
Fine-tuning on Fireworks offers

| Base model | Price per 1M training tokens |
|---|---|
| Models up to 16B parameters | $0.50 |
| Models 16.1B - 80B parameters | $3.00 |
| Mixtral | $2.00 |

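As a quick back-of-the-envelope aid, the sketch below turns the prices above into a cost estimate. The tier keys and the function name are hypothetical labels made up for this example. The post does not say exactly how training tokens are counted (for instance, whether multiple epochs multiply the total), so the function simply takes the total number of billed tokens as an input.

```python
# Prices in USD per 1M training tokens, copied from the table above.
# The dictionary keys are hypothetical labels, not official tier names.
PRICE_PER_MILLION_TOKENS = {
    "up_to_16b": 0.50,
    "16b_to_80b": 3.00,
    "mixtral": 2.00,
}


def estimate_training_cost(billed_tokens: int, tier: str) -> float:
    """Return the estimated cost in USD for a number of billed training tokens."""
    if billed_tokens < 0:
        raise ValueError("billed_tokens must be non-negative")
    return billed_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS[tier]


# Example: 5M billed tokens when tuning Mixtral.
print(f"${estimate_training_cost(5_000_000, 'mixtral'):.2f}")
```

For the example above, 5M billed tokens at $2.00 per 1M tokens comes to $10.00.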
Tuning a model is easy on Fireworks through our “Firectl” command-line interface (see docs). Let’s say that you want to tune Mixtral-8x7b-instruct to follow instructions with a specific tone and accuracy, using the databricks/databricks-dolly-15k dataset. First, you prepare the dataset as a JSONL file; we even provide code (see documentation) to convert any dataset to JSONL. Each line of the file holds one training example as a JSON object.
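As an illustrative sketch of the JSONL layout, the Python below writes two records to a file and reads them back. The field names (`instruction`, `context`, `response`, `category`) follow the databricks-dolly-15k schema, but the record contents are made up for this example and are not actual rows from the dataset. The helper names are also hypothetical, and this is not the conversion code referenced in the documentation.

```python
import json
import tempfile
from pathlib import Path

# Made-up records using the databricks-dolly-15k field names.
records = [
    {
        "instruction": "What is fine-tuning?",
        "context": "",
        "response": "Fine-tuning adapts a pretrained model using task-specific examples.",
        "category": "open_qa",
    },
    {
        "instruction": "Summarize the passage in one sentence.",
        "context": "JSONL files store one JSON object per line.",
        "response": "JSONL is a line-oriented format with one JSON object per line.",
        "category": "summarization",
    },
]


def write_jsonl(rows, path):
    """Write each record as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    dataset_path = Path(tmp) / "dolly_sample.jsonl"
    write_jsonl(records, dataset_path)
    print(dataset_path.read_text(encoding="utf-8"))
    assert read_jsonl(dataset_path) == records
```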
Next, you provide fine-tuning settings that specify which model to use, how the model should format and interpret the training data, and the hyperparameter values. Two hyperparameters can be adjusted: epochs and learning rate. You can also connect your fine-tuning job to Weights and Biases to view progress visually in their interface.
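To make the idea of “how the model should format/interpret the training data” concrete, here is a minimal Python sketch of prompt templating. Every name in it is an assumption made for illustration: the settings keys, the template wording, the model identifier, and the hyperparameter values are hypothetical and do not reflect Firectl’s actual settings schema, so consult the docs for the real format.

```python
# Hypothetical settings; the key names are NOT Firectl's real schema.
settings = {
    "base_model": "mixtral-8x7b-instruct",  # hypothetical identifier
    "input_template": (
        "### Instruction: {instruction}\n"
        "### Context: {context}\n"
        "### Response: "
    ),
    "output_template": "{response}",
    "epochs": 2,            # assumed value
    "learning_rate": 1e-4,  # assumed value
}


def render_example(record: dict, cfg: dict) -> tuple[str, str]:
    """Fill the input/output templates with one record's fields."""
    prompt = cfg["input_template"].format(**record)
    completion = cfg["output_template"].format(**record)
    return prompt, completion


# Made-up record using the databricks-dolly-15k field names.
record = {
    "instruction": "Name the capital of France.",
    "context": "",
    "response": "Paris.",
    "category": "open_qa",
}

prompt, completion = render_example(record, settings)
print(prompt + completion)
```

Each training record is rendered into a prompt and a target completion, so the same template wording should be used again when prompting the tuned model.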
Afterwards, you simply pass your data and settings to Firectl to start a fine-tuning job. When the job finishes, you receive a model that you can deploy and use seamlessly with the Fireworks inference service.
Fine-tuning is a powerful tool that enables you to quickly deliver improved accuracy to your users. Fireworks makes it faster and easier to tune and serve models. Get started today with our fine-tuning docs. Tune, deploy and compare fine-tuned models within minutes thanks to Fireworks’ near-instantaneous deployment. When you’ve found a model you like, serve it fast and cost-effectively at scale through the Fireworks inference engine.
We’ll be investing further in our tuning service in the coming months, with planned features like special fine-tuning options for conversational formats and function calling. We’d love to hear your feedback! Please join the #fine-tuning channel of our Discord to chat directly with the team or other fine-tuning users. We’re excited to see what you build!