output (list): The model-generated list of tool calls.
expectedOutput (list): The reference list of expected tool calls (JSON-formatted objects with tool name and arguments).
Output
Result (float): A score between 0 and 1.
Reasoning (str): Optional detailed feedback on the matching process.
Interpretation
Higher scores (closer to 1): Most expected tool calls were made correctly, with proper parameters and order
Lower scores (closer to 0): Few expected tool calls were matched correctly
$$
\mathrm{Tool\ Call\ Accuracy} = \frac{\text{Number of correct tool calls}}{\text{Total expected tool calls}}
$$
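The ratio above can be illustrated with a minimal sketch. This is not the library's implementation: the function name `tool_call_accuracy`, the dictionary keys `name` and `arguments`, the strict position-by-position matching, and the handling of an empty reference list are all assumptions made for illustration. A real evaluator may match calls more leniently or weigh order differently.

```python
from typing import Any

ToolCall = dict[str, Any]  # assumed shape: {"name": str, "arguments": dict}


def tool_call_accuracy(output: list[ToolCall], expected_output: list[ToolCall]) -> float:
    """Fraction of expected tool calls matched at the same position.

    Assumption: a call is "correct" only if both the tool name and the
    arguments equal the expected call at the same index.
    """
    if not expected_output:
        # Assumption: with nothing expected, score 1.0 only if nothing was called.
        return 1.0 if not output else 0.0

    correct = sum(
        1
        for actual, expected in zip(output, expected_output)
        if actual.get("name") == expected.get("name")
        and actual.get("arguments") == expected.get("arguments")
    )
    return correct / len(expected_output)


# Hypothetical example data: the third call has a wrong argument value.
expected = [
    {"name": "search_flights", "arguments": {"origin": "SFO", "destination": "JFK"}},
    {"name": "select_flight", "arguments": {"flight_id": "UA100"}},
    {"name": "book_flight", "arguments": {"flight_id": "UA100", "seats": 1}},
]
output = [
    {"name": "search_flights", "arguments": {"origin": "SFO", "destination": "JFK"}},
    {"name": "select_flight", "arguments": {"flight_id": "UA100"}},
    {"name": "book_flight", "arguments": {"flight_id": "UA100", "seats": 2}},
]

score = tool_call_accuracy(output, expected)
print(f"Tool Call Accuracy: {score:.2f}")
```

In this example two of the three expected calls match, so the score is 2/3. Under the positional-matching assumption, a correct call made out of order counts as incorrect, which is consistent with the interpretation that order matters.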
Use Cases
Evaluating agent compliance with required tool sequences
Assessing function-calling tasks that require specific arguments
Measuring multi-step tool-use workflows end-to-end