Developing Evaluators
The Build SDK natively integrates reward-kit, making it easy to develop Evaluators for Reinforcement Fine-Tuning (RFT) in Python.
Prerequisites
You can install the Fireworks Build SDK using pip:
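For example, assuming the SDK is published on PyPI as the fireworks-ai package:

```bash
pip install fireworks-ai
```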
Make sure to set the FIREWORKS_API_KEY environment variable to your Fireworks API key:
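For example, in a POSIX shell:

```bash
export FIREWORKS_API_KEY="<your-api-key>"
```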
You can create an API key in the Fireworks AI web UI or by installing the firectl CLI tool and running:
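The exact subcommand below is an assumption; check firectl --help if your version differs:

```bash
firectl create api-key
```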
Your first evaluator
For this tutorial, we’ll create a new project using uv.
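A minimal setup might look like this (the project name is just a placeholder, and the fireworks-ai package name is assumed):

```bash
uv init my-evaluators
cd my-evaluators
uv add fireworks-ai
```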
You should now have a project with a pyproject.toml file and a uv.lock file.
To create your first evaluator, create a new file at my_first_evaluator/main.py:
Evaluators must live in their own directory because the Build SDK automatically and recursively packages all sibling and child files of the directory containing the imported reward function.
Add the following code to my_first_evaluator/main.py:
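The sketch below assumes that the reward_function decorator and the EvaluateResult type can be imported from reward_kit, as in reward-kit’s public API; adjust the imports if your SDK version exposes them elsewhere.

```python
from typing import Any, Dict, List

# Assumed import path: the Build SDK integrates reward-kit, which exposes
# the reward_function decorator and the EvaluateResult type.
from reward_kit import EvaluateResult, reward_function


@reward_function
def evaluate(messages: List[Dict[str, Any]], **kwargs) -> EvaluateResult:
    """Score 1.0 if the assistant's reply mentions 'fireworks', otherwise 0.0."""
    # The last message in the rollout is the assistant's response.
    assistant_reply = messages[-1].get("content", "") or ""
    score = 1.0 if "fireworks" in assistant_reply.lower() else 0.0
    return EvaluateResult(
        score=score,
        reason="Reply mentions 'fireworks'" if score else "Reply does not mention 'fireworks'",
    )
```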
To test your evaluator locally, you can simply call the function itself. Replace the contents of main.py with the following code:
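As a sketch, calling the decorated function directly with a list of role/content dicts (depending on how the decorator wraps the function, you may need to adjust the call slightly):

```python
from typing import Any, Dict, List

from reward_kit import EvaluateResult, reward_function  # assumed import path


@reward_function
def evaluate(messages: List[Dict[str, Any]], **kwargs) -> EvaluateResult:
    """Score 1.0 if the assistant's reply mentions 'fireworks', otherwise 0.0."""
    assistant_reply = messages[-1].get("content", "") or ""
    score = 1.0 if "fireworks" in assistant_reply.lower() else 0.0
    return EvaluateResult(
        score=score,
        reason="Reply mentions 'fireworks'" if score else "Reply does not mention 'fireworks'",
    )


if __name__ == "__main__":
    # First conversation: the assistant's reply does not mention 'fireworks' -> expect 0.0
    print(evaluate(messages=[
        {"role": "user", "content": "Say something nice."},
        {"role": "assistant", "content": "Have a great day!"},
    ]))
    # Second conversation: the assistant's reply mentions 'fireworks' -> expect 1.0
    print(evaluate(messages=[
        {"role": "user", "content": "Say something nice."},
        {"role": "assistant", "content": "Fireworks light up the sky!"},
    ]))
```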
Let’s run the script and see what happens:
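For example:

```bash
uv run python my_first_evaluator/main.py
```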
You should see that the first message returns a score of 0.0 and the second message returns a score of 1.0, showing that our evaluator is working as expected.
Evaluating on a dataset
Now that we’ve created and tested our first evaluator, we can run it on Fireworks infrastructure against a dataset uploaded to Fireworks.
To do this, we’ll create a Dataset object and call create_evaluation_job. Create a new file called run_first_evaluator.py at the root of your project and add the following code:
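A sketch of what this script might look like. Dataset and create_evaluation_job come from the Build SDK as described above, but the exact constructor (from_list) and job attributes (url, wait_for_completion, output_dataset) used below are assumptions, so check the SDK reference for the precise API.

```python
# run_first_evaluator.py -- a sketch; exact Dataset/job method names may differ.
from fireworks import Dataset  # assumed import path for the Build SDK

from my_first_evaluator.main import evaluate

# A tiny inline dataset; in practice you can also point at a dataset
# that is already uploaded to Fireworks.
dataset = Dataset.from_list([
    {
        "messages": [
            {"role": "user", "content": "Say something nice."},
            {"role": "assistant", "content": "Fireworks light up the sky!"},
        ]
    },
])

# Kick off an evaluation job on Fireworks infrastructure using our evaluator.
job = dataset.create_evaluation_job(evaluate)
print(f"Evaluation job: {job.url}")

# Block until the job finishes, then print where the scored output landed.
job.wait_for_completion()
print(f"Output dataset: {job.output_dataset.url}")
```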
Let’s run the script and see what happens:
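Again with uv:

```bash
uv run python run_first_evaluator.py
```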
When the script first runs, you should see a URL for the evaluation job. You can go to the URL to see the evaluation job in the Fireworks AI web UI.
(Screenshot: the evaluation job running in the Fireworks AI web UI)
After some time, the evaluation job will be completed and you should see a URL for the output dataset. You can go to the URL to see the results in the Fireworks AI web UI.
(Screenshot: the completed evaluation job in the Fireworks AI web UI)
Once the job completes, the script will also print the URL for the output dataset, which you can open to view the output dataset in the Fireworks AI web UI.
(Screenshot: the output dataset results in the Fireworks AI web UI)
Creating your second evaluator
Let’s create a more complex evaluator that uses a third-party library to calculate the score. First, add the textblob library to the project:
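Since the project is managed with uv:

```bash
uv add textblob
```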
The Build SDK automatically picks up dependencies found in pyproject.toml or requirements.txt files in your project. Alternatively, you can specify a list of requirement strings (as you would write them in a requirements.txt file) directly in the @reward_function decorator itself, as sketched below.
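For illustration only, such a declaration might look like the following; the requirements keyword here is hypothetical, so consult the reward-kit documentation for the actual parameter name:

```python
# Hypothetical sketch: the exact keyword argument name may differ in your SDK version.
@reward_function(requirements=["textblob>=0.17"])
def evaluate(messages, **kwargs):
    ...
```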
Now, let’s create a new evaluator at my_second_evaluator/main.py and copy the following code into it:
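A sketch, again assuming the reward_kit import path; TextBlob’s sentiment.polarity returns a value between -1.0 and 1.0, which we rescale to the 0.0 to 1.0 range:

```python
from typing import Any, Dict, List

from reward_kit import EvaluateResult, reward_function  # assumed import path
from textblob import TextBlob


@reward_function
def evaluate(messages: List[Dict[str, Any]], **kwargs) -> EvaluateResult:
    """Score the assistant's reply by sentiment: more positive text scores higher."""
    assistant_reply = messages[-1].get("content", "") or ""
    # polarity ranges from -1.0 (very negative) to 1.0 (very positive).
    polarity = TextBlob(assistant_reply).sentiment.polarity
    # Rescale to 0.0-1.0 so it can be used as a reward score.
    score = (polarity + 1.0) / 2.0
    return EvaluateResult(
        score=score,
        reason=f"Sentiment polarity was {polarity:.2f}",
    )
```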
Download the random_phrases.jsonl file and save it to the root of your project.
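Your project layout should now look roughly like this (uv may also have generated a few starter files, such as a README, that are omitted here):

```text
.
├── my_first_evaluator/
│   └── main.py
├── my_second_evaluator/
│   └── main.py
├── pyproject.toml
├── random_phrases.jsonl
├── run_first_evaluator.py
└── uv.lock
```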
Create a new file called run_second_evaluator.py and add the following code:
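As with the first script, this is a sketch: Dataset.from_file is an assumed constructor for loading a local JSONL file, so substitute whatever your SDK version provides.

```python
# run_second_evaluator.py -- a sketch; exact Dataset/job method names may differ.
from fireworks import Dataset  # assumed import path for the Build SDK

from my_second_evaluator.main import evaluate

# Load the evaluation rows from the JSONL file downloaded earlier.
dataset = Dataset.from_file("random_phrases.jsonl")

# Run the sentiment evaluator against the dataset on Fireworks infrastructure.
job = dataset.create_evaluation_job(evaluate)
print(f"Evaluation job: {job.url}")

job.wait_for_completion()
print(f"Output dataset: {job.output_dataset.url}")
```

Run it the same way as before, for example with uv run python run_second_evaluator.py.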
Once the script is done running, you can click on the URL for the evaluation job and see the results in the Fireworks AI web UI.
(Screenshot: results of the second evaluator in the Fireworks AI web UI)
🎉 Congratulations! You’ve now written and run your first two evaluators. If you have any questions, please reach out to us on Discord.