examples/trl_integration/
directory.
examples/trl_integration/
examples/trl_integration/
directory contains several Python scripts:
grpo_example.py
: Demonstrates using reward functions with the Group Relative Policy Optimization (GRPO) trainer from TRL. This is a key example showing:
ppo_example.py
: Likely demonstrates integration with TRL’s Proximal Policy Optimization (PPO) trainer.minimal_deepcoder_grpo_example.py
: A more focused GRPO example, possibly related to the DeepCoder dataset or a simplified setup.working_grpo_example.py
: Another GRPO variant, perhaps a more tested or stable version.convert_dataset_to_jsonl.py
: A utility script for dataset preparation.trl_adapter.py
: Contains adapter logic, likely used by the example scripts.test_trl_integration.py
: Pytest file for testing the integration.source .venv/bin/activate
).
1. GRPO Example (grpo_example.py
)
This example demonstrates using reward functions with the Group Relative Policy Optimization (GRPO) trainer from TRL. It shows:
ppo_example.py
, minimal_deepcoder_grpo_example.py
)
get_trl_adapter()
method that converts any reward function into the format expected by TRL. This makes it easy to use existing reward functions from reward-kit with TRL trainers.