Getting Started with Reward Functions
This guide will help you understand the basics of creating, testing, and deploying reward functions using the Reward Kit.
What is a Reward Function?
A reward function is a mechanism for evaluating the quality of model outputs in reinforcement learning from verifiable rewards (RLVR) workflows. Reward functions help:
- Evaluate model responses based on specific criteria.
- Provide numerical scores that can be used to optimize models.
- Offer explanations for why specific scores were assigned.
Installation
To get started with Reward Kit, install it via pip:
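The snippet below assumes the package is published on PyPI under the name reward-kit; if your environment uses a different distribution name, adjust accordingly.

```bash
pip install reward-kit
```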
For development, including running all examples and contributing to the codebase, install it in editable mode with development dependencies:
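A sketch of an editable install from a local clone, assuming the project defines a dev extra (the extra's name may differ; check the project's pyproject.toml):

```bash
# Run from the root of a local clone of the repository.
pip install -e ".[dev]"
```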
Authentication Setup
To use Reward Kit with the Fireworks AI platform, set up your authentication credentials:
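One common approach is to export credentials as environment variables. The variable names below (FIREWORKS_API_KEY, FIREWORKS_ACCOUNT_ID) are assumptions based on typical Fireworks tooling; confirm them against the authentication documentation.

```bash
export FIREWORKS_API_KEY="your_fireworks_api_key"
export FIREWORKS_ACCOUNT_ID="your_account_id"
```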
Basic Reward Function Structure
Here’s a simple reward function that evaluates responses based on word count:
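The sketch below assumes the reward_function decorator and the EvaluateResult / MetricResult types covered in the Core Data Types guide; exact import paths and field names may differ slightly in your installed version.

```python
from typing import Dict, List, Optional

from reward_kit import EvaluateResult, MetricResult, reward_function


@reward_function
def word_count_reward(
    messages: List[Dict[str, str]],
    ground_truth: Optional[str] = None,
    **kwargs,
) -> EvaluateResult:
    """Score the assistant's final response by its word count."""
    # The last message is assumed to be the assistant response under evaluation.
    response = messages[-1].get("content", "") if messages else ""
    word_count = len(response.split())

    # Map the count into [0, 1], capping at 100 words (an arbitrary cap for illustration).
    score = min(word_count / 100.0, 1.0)

    return EvaluateResult(
        score=score,
        reason=f"Response contains {word_count} words.",
        metrics={
            "word_count": MetricResult(
                score=score,
                success=word_count > 0,
                reason=f"Counted {word_count} words.",
            )
        },
    )
```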
Testing and Evaluating
There are several ways to test your reward functions and run evaluations:
Programmatic Testing (for individual functions)
You can test your reward function directly in Python with sample conversations:
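Continuing the word-count sketch above, a direct call might look like this (it assumes the decorated function can still be invoked with a list of OpenAI-style role/content messages):

```python
# Sample conversation; the last message is the assistant response to evaluate.
test_messages = [
    {"role": "user", "content": "Explain what a reward function is."},
    {
        "role": "assistant",
        "content": (
            "A reward function scores model outputs so that reinforcement "
            "learning can optimize the model toward better responses."
        ),
    },
]

result = word_count_reward(messages=test_messages)
print(f"Score: {result.score}")
print(f"Reason: {result.reason}")
```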
Local Evaluation with reward-kit run (Recommended for Datasets/Examples)
For evaluating datasets or running complete examples, the primary method is the `reward-kit run` CLI command. This uses Hydra for configuration, allowing you to define your dataset, model, and reward logic in YAML files.
- Explore Examples: Check out the examples in the `examples/` directory at the root of the repository. The main Examples README provides an overview and guidance on their structure. Each example (e.g., `examples/math_example/`) has its own README explaining how to run it.
- Run an Example: a typical invocation is sketched below.
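For instance, the math example might be launched roughly as follows; the config path and config name here are assumptions, so treat the example's own README as authoritative.

```bash
# Run from the repository root. Config path and name are assumptions;
# see the example's README for the exact invocation.
reward-kit run --config-path examples/math_example/conf --config-name run_math_eval
```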
This command processes the dataset, generates model responses, applies reward functions, and saves detailed results.
Previewing Evaluation Outputs with reward-kit preview
After running an evaluation with `reward-kit run`, a `preview_input_output_pairs.jsonl` file is typically generated in the output directory. You can use `reward-kit preview` to inspect these pairs or re-evaluate them with different metrics:
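A sketch of a preview invocation follows; the flag names (--samples, --metrics-folders) are assumptions based on common reward-kit CLI usage, so run `reward-kit preview --help` to confirm the exact options.

```bash
# Flag names are assumptions; confirm with `reward-kit preview --help`.
reward-kit preview \
  --samples ./outputs/my_run/preview_input_output_pairs.jsonl \
  --metrics-folders "word_count=./path/to/your/metric"
```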
Refer to the Evaluation Workflows guide for a more detailed lifecycle overview.
Deploying Your Reward Function
When you’re ready, deploy your reward function to use in training workflows:
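One way to do this programmatically is sketched below; it assumes the decorated function exposes a .deploy() helper, which may differ by version, in which case the CLI shown next is the fallback.

```python
# Assumes the @reward_function decorator attaches a .deploy() helper;
# if your version does not, use the CLI command shown below instead.
evaluator_id = word_count_reward.deploy(name="word-count-evaluator")
print(f"Deployed evaluator: {evaluator_id}")
```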
Or using the CLI:
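The flags below (--id, --metrics-folders, --force) are assumptions; confirm them with `reward-kit deploy --help`.

```bash
# Flag names are assumptions; confirm with `reward-kit deploy --help`.
reward-kit deploy \
  --id word-count-evaluator \
  --metrics-folders "word_count=./path/to/your/metric" \
  --force
```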
Next Steps
Now that you have an overview of getting started:
- Dive deeper into Reward Function Anatomy.
- Understand the Core Data Types used in Reward Kit.
- Explore the Evaluation Workflows in more detail.
- Browse the Examples Overview and the main Examples README to find practical implementations.
- Follow our step-by-step tutorial for a hands-on walkthrough.