This workflow implements a cost-optimized AI routing system using n8n. Based on response confidence, it decides whether a request can be handled by a low-cost model or must be escalated to a higher-quality one.
The goal is to minimize LLM usage costs while maintaining high answer quality.
A query is first processed by a cheaper model. The response is then evaluated by a confidence-scoring AI agent. If the response quality is insufficient, the workflow automatically escalates the request to a more capable model.
This approach is useful for building scalable AI systems where most queries can be answered cheaply, while complex queries still receive high-quality responses.
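The routing decision described above can be sketched as follows. This is a minimal illustration, not the n8n nodes themselves: the names (`RouterConfig`, `shouldEscalate`) and the 0–1 confidence scale are assumptions, not part of the workflow.

```typescript
// Sketch of the confidence-based routing decision (names are illustrative).
interface RouterConfig {
  confidenceThreshold: number; // below this score, escalate to the better model
}

interface Evaluation {
  confidence: number; // evaluator's score for the cheap model's answer, 0-1
}

// Decide whether the cheap answer is good enough or must be escalated.
function shouldEscalate(evaluation: Evaluation, config: RouterConfig): boolean {
  return evaluation.confidence < config.confidenceThreshold;
}

console.log(shouldEscalate({ confidence: 0.55 }, { confidenceThreshold: 0.7 })); // escalate
console.log(shouldEscalate({ confidence: 0.92 }, { confidenceThreshold: 0.7 })); // keep cheap answer
```

The threshold is the main tuning knob: raising it improves average answer quality but increases how often the expensive model is invoked.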
The workflow consists of the following nodes:

- Webhook Trigger
- Workflow Configuration
- Cheap Model Response: uses GPT-4o-mini to minimize cost
- Confidence Evaluation
- Structured Output Parsing
- Decision Logic
- Expensive Model Escalation: uses GPT-4o for a higher-quality answer
- Cost Calculation
- Final Response Formatting
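The Cost Calculation step can be sketched like this. The function names, token counts, and per-1k-token rates below are illustrative placeholders; the real rates come from the Workflow Configuration node.

```typescript
// Illustrative cost calculation; rates and token counts are placeholders.
function costUsd(tokens: number, costPer1kTokens: number): number {
  return (tokens / 1000) * costPer1kTokens;
}

// Total spend for one request: the cheap model always runs; the expensive
// model only adds cost when the request was escalated.
function requestCost(
  cheapTokens: number,
  expensiveTokens: number, // 0 when no escalation happened
  cheapModelCostPer1kTokens: number,
  expensiveModelCostPer1kTokens: number,
): number {
  return (
    costUsd(cheapTokens, cheapModelCostPer1kTokens) +
    costUsd(expensiveTokens, expensiveModelCostPer1kTokens)
  );
}

// Escalated request: 800 cheap-model tokens plus 1200 expensive-model tokens.
console.log(requestCost(800, 1200, 0.00015, 0.0025));
```

When no escalation occurs, the second term is zero, which is exactly where the cost savings of this routing pattern come from.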
1. Create an OpenAI credential in n8n.
2. Configure the following nodes to use it:
   - Cheap Model (GPT-4o-mini)
   - Expensive Model (GPT-4o)
   - OpenAI Chat Model (used by the confidence evaluator agent)
3. Adjust the configuration values in the Workflow Configuration node:
   - confidenceThreshold
   - cheapModelCostPer1kTokens
   - expensiveModelCostPer1kTokens
4. Deploy the workflow and send requests to the Webhook URL.
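Example values for the Workflow Configuration node. The numbers are placeholders, not prescribed defaults; set the per-1k-token rates to the current pricing of the models you actually use.

```json
{
  "confidenceThreshold": 0.7,
  "cheapModelCostPer1kTokens": 0.00015,
  "expensiveModelCostPer1kTokens": 0.0025
}
```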
Example webhook payload:
```json
{
  "query": "Explain how photosynthesis works."
}
```
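Once the workflow is active, the webhook can be called with curl. The URL below is a placeholder; use the production URL shown on your Webhook node.

```shell
# Placeholder URL: replace with the production URL from the n8n Webhook node.
WEBHOOK_URL="https://your-n8n-host/webhook/ai-router"

# POST the query; the workflow replies with the final formatted response.
curl -s -X POST "$WEBHOOK_URL" \
  -H "Content-Type: application/json" \
  -d '{"query": "Explain how photosynthesis works."}'
```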