
Evaluation metric example: String similarity

Created by David Roberts

Last updated 12 days ago


AI evaluation in n8n

This is a template for n8n's evaluation feature.

Evaluation is a technique for building confidence that your AI workflow performs reliably: you run a test dataset containing different inputs through the workflow.

By calculating a metric (score) for each input, you can see where the workflow is performing well and where it isn't.
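For intuition, here is a toy sketch of what an evaluation run computes. `runWorkflow()` and the row shape are hypothetical stand-ins; n8n handles this orchestration for you when an evaluation runs:

```ts
// Toy illustration only: runWorkflow() and the Row shape are hypothetical
// stand-ins for what n8n orchestrates during an evaluation run.
type Row = { input: string; expected: string };

function evaluate(dataset: Row[], runWorkflow: (input: string) => string) {
  return dataset.map((row) => {
    const actual = runWorkflow(row.input);
    // 1 for an exact match, 0 otherwise; real metrics are usually more
    // granular, like the string similarity used in this template
    return { ...row, actual, score: actual === row.expected ? 1 : 0 };
  });
}
```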

How it works

This template shows how to calculate a workflow evaluation metric: text similarity, measured character-by-character.
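One common way to measure character-by-character similarity is a normalized Levenshtein (edit) distance. Here is a minimal sketch assuming that definition; n8n's built-in string-similarity metric may differ in detail:

```ts
// Levenshtein distance using a single rolling row of the DP table.
function levenshtein(a: string, b: string): number {
  const dp: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let prev = dp[0]; // holds dp[i-1][j-1]
    dp[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const tmp = dp[j]; // holds dp[i-1][j]
      dp[j] = Math.min(
        dp[j] + 1,     // deletion
        dp[j - 1] + 1, // insertion
        prev + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
      prev = tmp;
    }
  }
  return dp[b.length];
}

// Similarity in [0, 1], where 1 means an exact match.
function similarity(a: string, b: string): number {
  const maxLen = Math.max(a.length, b.length);
  return maxLen === 0 ? 1 : 1 - levenshtein(a, b) / maxLen;
}
```

For example, `similarity("AB3F", "AB8F")` is 0.75, because one of four characters differs.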

The workflow takes images of hand-written codes, extracts the code and compares it with the expected answer from the dataset.

The images look like this:

[Image: a hand-written code]

The workflow works as follows:

  • We use an evaluation trigger to read in our dataset
  • It is wired up in parallel with the regular trigger so that the workflow can be started from either one
  • We download the image and use AI to extract the code
  • If we’re evaluating (i.e. the execution started from the evaluation trigger), we calculate the string distance metric
  • We pass this information back to n8n as a metric (see the sketch after this list)
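If you compute the metric yourself rather than using a built-in one, the logic could live in an n8n Code node ("Run Once for All Items" mode). A minimal sketch: the field names `extracted` and `expected` and the metric name `stringDistance` are assumptions, and `similarity()` is the helper sketched above, pasted into the node so it is in scope:

```ts
// Attach the metric to each evaluation item so a downstream evaluation
// node can report it back to n8n. Field names are illustrative.
return $input.all().map((item) => ({
  json: {
    ...item.json,
    stringDistance: similarity(
      String(item.json.extracted), // code the AI read from the image
      String(item.json.expected),  // expected answer from the dataset
    ),
  },
}));
```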