Evaluate RAG Response Accuracy with OpenAI: Document Groundedness Metric

Created by

Jimleuk

Last update

Last update 2 months ago

This evaluation works best for an agent that requires document retrieval from a vector store or similar source.
For our scoring, we need to collect the agent's response and the documents retrieved and use an LLM to assess if the former is based off the latter.
A key factor is to look out information in the response which is not mentioned in the documents.
A high score indicates LLM adherence and alignment whereas a low score could signal inadequate prompt or model hallucination.

There’s nothing you can’t automate with n8n