# Tool Calling Example
This guide explains how to use the examples in `examples/tool_calling_example/` for evaluating and training models for tool/function calling capabilities. These examples primarily use Hydra for configuration.
## Overview
The `examples/tool_calling_example/` directory contains scripts for:
- **Local Evaluation** (`local_eval.py`): Evaluating a model's ability to make tool calls against a dataset.
- **TRL GRPO Integration** (`trl_grpo_integration.py`): Fine-tuning a model for tool calling using TRL (Transformer Reinforcement Learning) with Group Relative Policy Optimization (GRPO).
A `dataset.jsonl` file is provided in the example directory. For tool calling tasks, each entry in the dataset typically includes:

- `messages`: A list of conversation messages.
- `tools`: A list of tool definitions available to the model.
- `ground_truth`: The expected assistant response, which might include tool calls (e.g., `{"role": "assistant", "tool_calls": [...]}`) or a direct content response.
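An illustrative entry might look like the following. This is a sketch only: the exact schema in `dataset.jsonl` may differ, and the `get_weather` tool and OpenAI-style function layout are assumptions for illustration.

```json
{
  "messages": [
    {"role": "user", "content": "What is the weather in Paris?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }
  ],
  "ground_truth": {
    "role": "assistant",
    "tool_calls": [
      {
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"}
      }
    ]
  }
}
```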
## Setup
- **Environment**: Ensure your Python environment has `reward-kit` and its development dependencies installed.
- **TRL Extras** (for `trl_grpo_integration.py`): Install the additional TRL dependencies.
- **API Keys**: If using models that require API keys (e.g., Fireworks AI models for `local_eval.py` if not using a local model, or for downloading a base model for TRL), ensure necessary keys like `FIREWORKS_API_KEY` are set.
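The setup steps above might look like the following from a repository checkout. The `.[dev]` extra and the exact TRL package names are assumptions; check the project's `pyproject.toml` for the actual extras.

```shell
# Editable install of reward-kit with development dependencies
# (the ".[dev]" extra name is an assumption).
pip install -e ".[dev]"

# Additional dependencies for trl_grpo_integration.py
# (package list is an assumption).
pip install trl

# API key for Fireworks AI models, if needed.
export FIREWORKS_API_KEY="your-api-key-here"
```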
## 1. Local Evaluation (`local_eval.py`)
This script performs local evaluation of a model’s tool calling.
### Configuration
- Uses Hydra and is configured by `examples/tool_calling_example/conf/local_eval_config.yaml`.
- The default configuration points to `examples/tool_calling_example/dataset.jsonl`.
- The script itself likely contains defaults for the model and reward function, or expects them as CLI overrides.
### How to Run
- Activate your virtual environment.
- Execute the script from the repository root.
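The two steps above can be sketched as follows (the virtual environment path `.venv` is an assumption):

```shell
# Activate the virtual environment (path is an assumption).
source .venv/bin/activate

# Run local evaluation from the repository root.
python examples/tool_calling_example/local_eval.py
```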
### Overriding Parameters
- The dataset path can be changed via a Hydra CLI override.
- Other parameters (e.g., model name, reward function parameters) would typically be added to `local_eval_config.yaml` or passed as CLI overrides if `local_eval.py` is structured to accept them via Hydra.
- Outputs are saved to the Hydra output directory (defined in `local_eval_config.yaml` as `./outputs/local_eval_tool_calling/${now:%Y-%m-%d}/${now:%H-%M-%S}`).
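A dataset-path override can be passed on the command line via Hydra, for example (assuming the config key is `dataset_file_path`, as in the TRL config below; check `local_eval_config.yaml` for the actual key name):

```shell
# Evaluate against a different dataset via a Hydra CLI override
# (the dataset_file_path key name is an assumption).
python examples/tool_calling_example/local_eval.py \
  dataset_file_path=path/to/other_dataset.jsonl
```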
## 2. TRL GRPO Integration (`trl_grpo_integration.py`)
This script provides a scaffold for fine-tuning a model for tool calling using TRL GRPO.
**Note**: The script defaults to using a **mock** model and tokenizer. Using a real model requires code modifications in `trl_grpo_integration.py` and potentially `conf/trl_grpo_config.yaml`.
### Configuration
- Uses Hydra and is configured by `examples/tool_calling_example/conf/trl_grpo_config.yaml`.
- Default `dataset_file_path`: `dataset.jsonl` (assumed to be in `examples/tool_calling_example/`).
- Default `model_name`: `Qwen/Qwen2-0.5B-Instruct`.
- Includes various `grpo` training parameters.
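Based on the defaults above, `conf/trl_grpo_config.yaml` might look roughly like this. The keys under `grpo` and their values are assumptions for illustration; consult the actual file for the real parameter names.

```yaml
# Sketch of trl_grpo_config.yaml (key names under grpo are assumptions)
dataset_file_path: dataset.jsonl
model_name: Qwen/Qwen2-0.5B-Instruct
grpo:
  learning_rate: 1e-5
  num_train_epochs: 1
  per_device_train_batch_size: 2
```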
### How to Run (with Mock Model by default)
- Activate your virtual environment.
- Execute the script from the repository root.
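As with local evaluation, the steps above can be sketched as (the `.venv` path is an assumption):

```shell
# Activate the virtual environment (path is an assumption).
source .venv/bin/activate

# Run TRL GRPO training (mock model by default) from the repository root.
python examples/tool_calling_example/trl_grpo_integration.py
```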
### Overriding Parameters
- The dataset path and training epochs can be changed via Hydra CLI overrides.
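For example (`dataset_file_path` comes from the config defaults above; `grpo.num_train_epochs` is an assumed key name for the epoch count):

```shell
# Override the dataset and epoch count via Hydra
# (grpo.num_train_epochs is an assumed key name).
python examples/tool_calling_example/trl_grpo_integration.py \
  dataset_file_path=path/to/other_dataset.jsonl \
  grpo.num_train_epochs=3
```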
### Using a Real Model (Requires Code Changes)
- Modify `examples/tool_calling_example/trl_grpo_integration.py` to load your desired Hugging Face model and tokenizer (remove or conditionalize the mock model parts).
- Ensure the prompt formatting in the script is suitable for your chosen model.
- Update `conf/trl_grpo_config.yaml` with the correct `model_name` and adjust training parameters.
- Run the script, overriding any mock-model flag you added (e.g., a flag like `use_mock_model_tokenizer` in the script/config).
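If such a flag is exposed through the Hydra config, the run might look like this. Both `use_mock_model_tokenizer` and its handling are hypothetical; they exist only if you added them in the previous steps.

```shell
# Disable the hypothetical mock-model flag and set a real model
# via Hydra overrides (flag name is hypothetical).
python examples/tool_calling_example/trl_grpo_integration.py \
  use_mock_model_tokenizer=false \
  model_name=Qwen/Qwen2-0.5B-Instruct
```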
- Outputs are saved to the Hydra output directory (defined in `trl_grpo_config.yaml` as `./outputs/trl_grpo_tool_calling/${now:%Y-%m-%d}/${now:%H-%M-%S}`).
For more general information on Hydra, see the Hydra Configuration for Examples guide.