This is a template for n8n's evaluation feature.
Evaluation is a technique for gaining confidence that your AI workflow performs reliably, by running a test dataset of different inputs through the workflow.
By calculating a metric (score) for each input, you can see where the workflow is performing well and where it isn't.
This template shows how to calculate a workflow evaluation metric: text similarity, measured character-by-character.
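A character-level similarity score can be computed in several ways. As a minimal sketch (not the template's exact formula), the snippet below uses a normalized Levenshtein distance, the kind of logic you could place in an n8n Code node; the function names are illustrative only.

```typescript
// Hypothetical character-level similarity: Levenshtein edit distance
// normalized to a 0..1 score, where 1 means the strings are identical.
function levenshtein(a: string, b: string): number {
  const rows = a.length + 1;
  const cols = b.length + 1;
  const d: number[][] = Array.from({ length: rows }, () => new Array(cols).fill(0));
  for (let i = 0; i < rows; i++) d[i][0] = i;
  for (let j = 0; j < cols; j++) d[0][j] = j;
  for (let i = 1; i < rows; i++) {
    for (let j = 1; j < cols; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1,       // deletion
        d[i][j - 1] + 1,       // insertion
        d[i - 1][j - 1] + cost // substitution
      );
    }
  }
  return d[rows - 1][cols - 1];
}

function characterSimilarity(expected: string, actual: string): number {
  if (expected.length === 0 && actual.length === 0) return 1;
  const distance = levenshtein(expected, actual);
  return 1 - distance / Math.max(expected.length, actual.length);
}

// Example: compare the extracted code against the expected answer.
console.log(characterSimilarity("A1B2C3", "A1B2C8")); // ~0.83
```

A score close to 1 means the extracted text matches the expected answer almost exactly; lower scores flag inputs where the workflow struggles.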
The workflow takes images of handwritten codes, extracts the code from each image, and compares it with the expected answer in the dataset.
The images look like this:
The workflow works as follows: