Evaluation Metric: Summarization

Created by

Jimleuk

Last update 2 months ago

This evaluation works best for AI summarization workflows.
For scoring, we simply compare the generated response to the original transcript.
A key factor is to look for information in the response that is not mentioned in the source documents.
A high score indicates LLM adherence and alignment with the source, whereas a low score could signal an inadequate prompt or model hallucination.
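
To make the scoring idea concrete, here is a minimal sketch in Python. It is not the workflow's actual scoring step: it swaps in a crude word-overlap check for illustration, whereas an LLM-based comparison would judge meaning and not just vocabulary. The names `content_words`, `adherence_score`, and `STOP_WORDS` are hypothetical and do not come from n8n.

```python
import re

# Hypothetical, deliberately small stop-word list (illustration only).
STOP_WORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "on",
    "is", "are", "was", "were", "it", "that", "this", "for", "with",
}


def content_words(text: str) -> list[str]:
    """Lowercase the text and return its word tokens, minus stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def adherence_score(summary: str, transcript: str) -> tuple[float, list[str]]:
    """Return (score, unsupported_words).

    score is the share of the summary's content words that also occur in
    the transcript; unsupported_words lists the ones that do not. This is
    an assumed lexical proxy, not a semantic faithfulness measure.
    """
    source_vocab = set(content_words(transcript))
    summary_words = content_words(summary)
    if not summary_words:
        return 0.0, []
    unsupported = [w for w in summary_words if w not in source_vocab]
    score = 1 - len(unsupported) / len(summary_words)
    return score, unsupported


transcript = "The team agreed to ship the release on Friday."
faithful = "The team agreed to ship on Friday."
drifting = "The team agreed to ship on Monday after the audit."

print(adherence_score(faithful, transcript))
print(adherence_score(drifting, transcript))
```

In this sketch the second summary scores lower because "Monday", "after", and "audit" never appear in the transcript, which mirrors the kind of unsupported information the evaluation is meant to surface. A word-overlap check will miss paraphrases and reworded facts, so treat it only as a rough sanity check.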
