Metrics
Metrics define how model performance is measured and evaluated.
Custom Logic
Define custom evaluation functions that match your specific business requirements.
Flexible Inputs
Accept any input format and compare against expected outputs flexibly.
Aggregation Support
Aggregate individual scores across datasets for comprehensive evaluation.
Optimization Ready
Use metrics directly with Tune for automatic prompt optimization.
Creating Metrics
Choose from four available options when creating metrics:
Auto

- Dataset with defined schema containing the fields you want to compare
- Select specific fields by clicking the “Select” button next to each field name
Code

- Define a function called `metric_func(output, expected)` that returns a float value (typically 0.0 or 1.0); see the sketch below
- Replace `'field_name'` placeholders with your actual field names
- The function must handle None/missing values and return appropriate scores
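For illustration, a minimal exact-match metric might look like the following sketch. The `answer` field name is a placeholder standing in for `'field_name'`; swap in the field your dataset actually defines. It also assumes outputs may arrive either as dicts or as plain values.

```python
def metric_func(output, expected):
    """Return 1.0 for an exact (case-insensitive) match, else 0.0.

    'answer' is a placeholder field name; replace it with a field
    from your own dataset schema.
    """
    if output is None or expected is None:
        return 0.0  # missing values score zero instead of raising
    # Outputs may arrive as dicts or as plain values (an assumption).
    predicted = output.get("answer") if isinstance(output, dict) else output
    target = expected.get("answer") if isinstance(expected, dict) else expected
    if predicted is None or target is None:
        return 0.0
    return 1.0 if str(predicted).strip().lower() == str(target).strip().lower() else 0.0
```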
Existing

- At least one metric must already exist in your project
- Select the desired metric from the list by checking the box next to it
LLM

- Write evaluation criteria and instructions in the text area
- Your prompt must instruct the LLM to return either ‘true’ or ‘false’ in its response
- The LLM judge receives both the model output and the expected output for comparison (see the example criteria below)
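As a rough illustration (the exact wording is yours to choose), criteria along these lines would satisfy the true/false requirement:

```
Compare the model output to the expected output.
If they express the same answer, even with different wording, respond with 'true'.
Otherwise, respond with 'false'. Respond with only 'true' or 'false'.
```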
Optimize Prompts
Let Tune automatically improve prompts based on your metrics.