The examples in `examples/tool_calling_example/` are for evaluating and training models for tool/function calling capabilities. These examples primarily use Hydra for configuration.

The `examples/tool_calling_example/` directory contains scripts for:

- `local_eval.py`: Evaluating a model's ability to make tool calls against a dataset.
- `trl_grpo_integration.py`: Fine-tuning a model for tool calling using TRL (Transformer Reinforcement Learning) with Group Relative Policy Optimization (GRPO).

A `dataset.jsonl` file is provided in the example directory. For tool calling tasks, each entry in the dataset typically includes:

- `messages`: A list of conversation messages.
- `tools`: A list of tool definitions available to the model.
- `ground_truth`: The expected assistant response, which might include tool calls (e.g., `{"role": "assistant", "tool_calls": [...]}`) or a direct content response.

Before running the examples, ensure your Python environment has `reward-kit` and its development dependencies installed.
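
To make the dataset entry format described above concrete, here is a minimal sketch of one entry and its JSONL round trip. The tool name `get_weather`, its parameters, and the message text are hypothetical, and the OpenAI-style layout of `tools` and `tool_calls` is an assumption; the shipped `dataset.jsonl` may structure these fields differently.

```python
import json

# Hypothetical entry: the tool, its arguments, and the message text are
# invented for illustration and are not copied from the shipped dataset.
entry = {
    "messages": [
        {"role": "user", "content": "What is the weather in Paris?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "ground_truth": {
        "role": "assistant",
        "tool_calls": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    # Assumed convention: arguments stored as a JSON string.
                    "arguments": json.dumps({"city": "Paris"}),
                },
            }
        ],
    },
}

# JSONL stores exactly one JSON object per line.
line = json.dumps(entry)
loaded = json.loads(line)
```

Each line of a JSONL file is parsed independently, so a dataset can be streamed entry by entry without loading the whole file.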

For `trl_grpo_integration.py`, TRL must also be installed.
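
If a model requires an API key (for example, a hosted model for `local_eval.py` if not using a local model, or for downloading a base model for TRL), ensure necessary keys like `FIREWORKS_API_KEY` are set.

Local evaluation (`local_eval.py`)

- The configuration file is `examples/tool_calling_example/conf/local_eval_config.yaml`.
- The dataset is `examples/tool_calling_example/dataset.jsonl`.
- Settings can be specified in `local_eval_config.yaml` or passed as CLI overrides if `local_eval.py` is structured to accept them via Hydra.

Output is written to the Hydra run directory (configured in `local_eval_config.yaml` as `./outputs/local_eval_tool_calling/${now:%Y-%m-%d}/${now:%H-%M-%S}`).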
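
To illustrate what evaluating a tool call against `ground_truth` can involve, the sketch below scores one assistant response with an exact-match rule. This is not the scoring used by `local_eval.py`; the helpers `normalize_call` and `score_response` are hypothetical, and the sketch assumes tool calls carry a `function` object with `name` and `arguments`.

```python
import json
from typing import Any


def normalize_call(call: dict[str, Any]) -> tuple[str, str]:
    """Reduce a tool call to (name, canonical JSON of its arguments)."""
    function = call.get("function", {})
    arguments = function.get("arguments", {})
    if isinstance(arguments, str):
        try:
            arguments = json.loads(arguments)
        except json.JSONDecodeError:
            # Leave unparseable arguments as the raw string.
            pass
    # sort_keys makes the comparison independent of argument key order.
    return function.get("name", ""), json.dumps(arguments, sort_keys=True)


def score_response(response: dict[str, Any], ground_truth: dict[str, Any]) -> float:
    """Return 1.0 if the response matches the ground truth, else 0.0."""
    expected_calls = ground_truth.get("tool_calls") or []
    actual_calls = response.get("tool_calls") or []
    if expected_calls or actual_calls:
        # Calls are compared in order; a different order counts as a miss.
        expected = [normalize_call(c) for c in expected_calls]
        actual = [normalize_call(c) for c in actual_calls]
        return 1.0 if expected == actual else 0.0
    # Neither side made a tool call: compare the direct content responses.
    return 1.0 if response.get("content") == ground_truth.get("content") else 0.0


ground_truth = {
    "role": "assistant",
    "tool_calls": [
        {"function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}}
    ],
}
good = {
    "role": "assistant",
    "tool_calls": [
        {"function": {"name": "get_weather", "arguments": {"city": "Paris"}}}
    ],
}
bad = {"role": "assistant", "content": "It is sunny."}

print(score_response(good, ground_truth))  # 1.0
print(score_response(bad, ground_truth))   # 0.0
```

Parsing the arguments before comparing them avoids penalizing a model for whitespace or key-order differences in otherwise identical calls.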

TRL GRPO integration (`trl_grpo_integration.py`)

Training with a real model requires changes to `trl_grpo_integration.py` and potentially `conf/trl_grpo_config.yaml`.

The configuration file is `examples/tool_calling_example/conf/trl_grpo_config.yaml`. Its key parameters include:

- `dataset_file_path`: `dataset.jsonl` (assumed to be in `examples/tool_calling_example/`).
- `model_name`: `Qwen/Qwen2-0.5B-Instruct`.
- `grpo`: training parameters.

To train with your own model:

1. Modify `examples/tool_calling_example/trl_grpo_integration.py` to load your desired Hugging Face model and tokenizer (remove or conditionalize the mock model parts).
2. Update `conf/trl_grpo_config.yaml` with the correct `model_name` and adjust training parameters.
3. Run the script. If you added a flag such as `use_mock_model_tokenizer` in the script/config, you might set it through a Hydra command-line override when running the script.

Output is written to the Hydra run directory (configured in `trl_grpo_config.yaml` as `./outputs/trl_grpo_tool_calling/${now:%Y-%m-%d}/${now:%H-%M-%S}`).
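
The `${now:...}` placeholders in the output paths are expanded by Hydra's `now` resolver, which formats the current time with the given `strftime` pattern. The sketch below mimics that expansion for a fixed timestamp so the resulting directory layout is easy to see. The helper `resolve_output_dir` is hypothetical and is not Hydra's implementation.

```python
import re
from datetime import datetime


def resolve_output_dir(template: str, moment: datetime) -> str:
    """Expand each ${now:<pattern>} placeholder with moment.strftime(pattern)."""
    return re.sub(
        r"\$\{now:([^}]*)\}",
        lambda match: moment.strftime(match.group(1)),
        template,
    )


template = "./outputs/trl_grpo_tool_calling/${now:%Y-%m-%d}/${now:%H-%M-%S}"
# A fixed timestamp keeps the example deterministic.
resolved = resolve_output_dir(template, datetime(2025, 1, 31, 14, 5, 9))
print(resolved)  # ./outputs/trl_grpo_tool_calling/2025-01-31/14-05-09
```

Each run therefore gets its own directory, grouped first by date and then by start time, so repeated runs do not overwrite one another.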
For more general information on Hydra, see the Hydra Configuration for Examples guide.