This workflow implements a cost-optimized AI routing system using n8n. Based on response confidence, it decides whether a request can be handled by a low-cost model or must be escalated to a higher-quality one.
The goal is to minimize LLM usage costs while maintaining high answer quality.
A query is first processed by a cheaper model. The response is then evaluated by a confidence-scoring AI agent. If the response quality is insufficient, the workflow automatically escalates the request to a more capable model.
This approach is useful for building scalable AI systems where most queries can be answered cheaply, while complex queries still receive high-quality responses.
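The routing decision described above can be sketched as follows. This is a minimal illustration, not the n8n nodes themselves: the names (`RouterConfig`, `shouldEscalate`) and the 0–1 confidence scale are assumptions, not part of the workflow.

```typescript
// Sketch of the confidence-based routing decision (names are illustrative).
interface RouterConfig {
  confidenceThreshold: number; // below this score, escalate to the better model
}

interface Evaluation {
  confidence: number; // evaluator's score for the cheap model's answer, 0-1
}

// Decide whether the cheap answer is good enough or must be escalated.
function shouldEscalate(evaluation: Evaluation, config: RouterConfig): boolean {
  return evaluation.confidence < config.confidenceThreshold;
}

console.log(shouldEscalate({ confidence: 0.55 }, { confidenceThreshold: 0.7 })); // escalate
console.log(shouldEscalate({ confidence: 0.92 }, { confidenceThreshold: 0.7 })); // keep cheap answer
```

The threshold is the main tuning knob: raising it improves average answer quality but increases how often the expensive model is invoked.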
The workflow consists of the following nodes:

- Webhook Trigger
- Workflow Configuration
- Cheap Model Response: uses GPT-4o-mini to minimize cost
- Confidence Evaluation
- Structured Output Parsing
- Decision Logic
- Expensive Model Escalation: uses GPT-4o for a higher-quality answer
- Cost Calculation
- Final Response Formatting
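The Cost Calculation step can be sketched like this. The function names, token counts, and per-1k-token rates below are illustrative placeholders; the real rates come from the Workflow Configuration node.

```typescript
// Illustrative cost calculation; rates and token counts are placeholders.
function costUsd(tokens: number, costPer1kTokens: number): number {
  return (tokens / 1000) * costPer1kTokens;
}

// Total spend for one request: the cheap model always runs; the expensive
// model only adds cost when the request was escalated.
function requestCost(
  cheapTokens: number,
  expensiveTokens: number, // 0 when no escalation happened
  cheapModelCostPer1kTokens: number,
  expensiveModelCostPer1kTokens: number,
): number {
  return (
    costUsd(cheapTokens, cheapModelCostPer1kTokens) +
    costUsd(expensiveTokens, expensiveModelCostPer1kTokens)
  );
}

// Escalated request: 800 cheap-model tokens plus 1200 expensive-model tokens.
console.log(requestCost(800, 1200, 0.00015, 0.0025));
```

When no escalation occurs, the second term is zero, which is exactly where the cost savings of this routing pattern come from.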
1. Create an OpenAI credential in n8n.
2. Configure the following nodes to use it:
   - Cheap Model (GPT-4o-mini)
   - Expensive Model (GPT-4o)
   - OpenAI Chat Model (used by the confidence evaluator agent)
3. Adjust the configuration values in the Workflow Configuration node:
   - confidenceThreshold
   - cheapModelCostPer1kTokens
   - expensiveModelCostPer1kTokens
4. Deploy the workflow and send requests to the Webhook URL.
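Example values for the Workflow Configuration node. The numbers are placeholders, not prescribed defaults; set the per-1k-token rates to the current pricing of the models you actually use.

```json
{
  "confidenceThreshold": 0.7,
  "cheapModelCostPer1kTokens": 0.00015,
  "expensiveModelCostPer1kTokens": 0.0025
}
```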
Example webhook payload:
```json
{
  "query": "Explain how photosynthesis works."
}
```
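Once the workflow is active, the webhook can be called with curl. The URL below is a placeholder; use the production URL shown on your Webhook node.

```shell
# Placeholder URL: replace with the production URL from the n8n Webhook node.
WEBHOOK_URL="https://your-n8n-host/webhook/ai-router"

# POST the query; the workflow replies with the final formatted response.
curl -s -X POST "$WEBHOOK_URL" \
  -H "Content-Type: application/json" \
  -d '{"query": "Explain how photosynthesis works."}'
```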