Reinforcement Fine-Tuning

Train open models with your own Python evaluator. Fireworks handles the rest, so you get the highest-quality model for your use case.

Train expert open models with just a few examples

With RFT, open models can match or even surpass the quality of closed frontier models, while running up to 10× faster. You only need to specify an evaluator function that grades model outputs and provide a few labeled examples—no infrastructure setup required.
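For illustration, an evaluator can be as simple as a plain Python function that compares a model output against a labeled answer and returns a score between 0 and 1. The function name, signature, and scoring rules below are hypothetical, a minimal sketch of what such a grader might look like rather than the exact interface Fireworks expects.

```python
def evaluate(expected_answer: str, model_output: str) -> float:
    """Hypothetical evaluator: grade one model output against a labeled answer.

    Returns a score in [0, 1] that the trainer uses as the reward signal.
    """
    expected = expected_answer.strip()
    output = model_output.strip()
    if output == expected:
        return 1.0          # exact match: full credit
    if expected and expected in output:
        return 0.5          # answer present but buried in extra text: partial credit
    return 0.0              # wrong or missing answer
```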

Frontier model quality across key use cases

Use RFT to train models that execute accurate function calls; generate clean, compilable code; outperform base models in LLM-judged creative writing; and solve math problems with over 90% accuracy through reward shaping.
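As one sketch of reward shaping for math, an evaluator can give a small bonus for producing a well-formed final answer and full credit only when that answer matches the label. The \boxed{...} answer format and the 0.2 bonus are illustrative assumptions, not Fireworks defaults.

```python
import re

def math_reward(expected: str, model_output: str) -> float:
    """Illustrative shaped reward for math: partial credit for a parseable
    final answer, full credit only when it matches the label."""
    match = re.search(r"\\boxed\{([^}]*)\}", model_output)
    if match is None:
        return 0.0                       # no final answer produced
    if match.group(1).strip() == expected.strip():
        return 1.0                       # correct final answer
    return 0.2                           # well-formed but wrong answer
```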

Build custom reward functions

Use Reward-kit to define exactly how model outputs should be scored: match function calls, validate code execution, or use an LLM as a judge. Write custom evaluators in Python, explore ready-made examples, and contribute your own to the open-source library.
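A hedged sketch of the function-call-matching case: the `@reward_function` decorator, `EvaluateResult`, and `MetricResult` names below follow the open-source reward-kit project, but the message format, the JSON tool-call shape, and the `expected_tool` parameter are assumptions made for this example; check the library's docs for the exact interface.

```python
import json

# Names follow the open-source reward-kit library; verify against its docs.
from reward_kit import reward_function, EvaluateResult, MetricResult


@reward_function
def match_function_call(messages, expected_tool: str = "get_weather", **kwargs) -> EvaluateResult:
    """Hypothetical evaluator: reward replies that call the expected tool."""
    last = messages[-1] if messages else {}
    # The last message may be a typed Message object or a plain dict; handle both.
    reply = getattr(last, "content", None) or (
        last.get("content", "") if isinstance(last, dict) else ""
    )

    # Assume the model emits a JSON tool call like {"name": "...", "arguments": {...}}.
    try:
        called = json.loads(reply).get("name") == expected_tool
    except (json.JSONDecodeError, AttributeError, TypeError):
        called = False

    score = 1.0 if called else 0.0
    return EvaluateResult(
        score=score,
        reason="expected tool called" if called else "expected tool not called",
        metrics={
            "tool_match": MetricResult(
                score=score, success=called, reason=f"expected {expected_tool!r}"
            )
        },
    )
```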