Overview
This workflow implements a policy-driven LLM orchestration system that dynamically routes AI tasks to different language models based on task complexity, policies, and performance constraints.
Instead of sending every request to a single model, the workflow analyzes each task, applies policy rules, and selects the most appropriate model for execution. It also records telemetry data such as latency, token usage, and cost, enabling continuous optimization.
A built-in self-tuning mechanism runs weekly to analyze historical telemetry and automatically update routing policies. This allows the system to improve cost efficiency, performance, and reliability over time without manual intervention.
This architecture is useful for teams building AI APIs, agent platforms, or multi-model LLM systems where intelligent routing is needed to balance cost, speed, and quality.
How It Works
Webhook Task Input
- The workflow begins when a request is sent to the webhook endpoint.
- The request contains a task and optional priority metadata.
Task Classification
- A classifier agent analyzes the task and categorizes it into one of:
  - extraction
  - classification
  - reasoning
  - generation
- The agent also returns a confidence score.
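The classifier's reply can be validated before it drives any routing decision. The sketch below assumes the agent returns JSON with `category` and `confidence` fields; the field names, the confidence floor, and the fallback choice are all illustrative, not part of the workflow itself.

```python
import json

# The four categories named in the workflow; the confidence floor is a
# hypothetical default, not a value taken from the template.
CATEGORIES = {"extraction", "classification", "reasoning", "generation"}

def parse_classification(raw: str, min_confidence: float = 0.5) -> dict:
    """Parse the classifier agent's JSON reply and validate it."""
    result = json.loads(raw)
    category = result.get("category")
    confidence = float(result.get("confidence", 0.0))
    if category not in CATEGORIES or confidence < min_confidence:
        # Unknown or low-confidence labels fall back to the most
        # capable bucket rather than risking a wrong cheap route.
        return {"category": "reasoning", "confidence": confidence}
    return {"category": category, "confidence": confidence}
```

Falling back to `reasoning` on low confidence is one possible policy; routing uncertain tasks to the larger model trades a little cost for fewer bad answers.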
Policy Engine
- Policy rules are loaded from a database.
- These rules define execution constraints such as:
  - preferred model size
  - latency limits
  - token budgets
  - retry strategies
  - cost ceilings
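In code, a policy rule is just a row keyed by task category. The column names below are illustrative; the actual schema lives in your `policy_rules` table.

```python
# Hypothetical shape of policy_rules rows; real column names may differ.
policy_rules = [
    {"category": "extraction", "preferred_model_size": "small",
     "max_latency_ms": 2000, "token_budget": 1000,
     "max_retries": 2, "cost_ceiling_usd": 0.01},
    {"category": "reasoning", "preferred_model_size": "large",
     "max_latency_ms": 15000, "token_budget": 4000,
     "max_retries": 1, "cost_ceiling_usd": 0.25},
]

def rule_for(category: str) -> dict:
    """Return the first rule matching the task category."""
    for rule in policy_rules:
        if rule["category"] == category:
            return rule
    raise LookupError(f"no policy rule for category {category!r}")
```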
Model Routing
- A decision engine evaluates classification results and policy rules.
- Tasks are routed to either a small model (fast and cost-efficient) or a large model (higher reasoning capability).
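The routing decision can be sketched as a small function over the classification result and the matching policy rule. The escalation thresholds here are assumptions for illustration; the template's actual decision engine may weigh these inputs differently.

```python
def route(category: str, confidence: float, rule: dict) -> str:
    """Pick a model tier from the classification and the policy rule.

    Hypothetical logic: honor the rule's preference, but escalate to
    the large model when the task demands reasoning or the classifier
    is unsure of its own label.
    """
    if confidence < 0.6:
        return "large"  # uncertain tasks get the stronger model
    if category in ("reasoning", "generation"):
        return "large"
    return rule.get("preferred_model_size", "small")
```

For example, `route("extraction", 0.9, {"preferred_model_size": "small"})` keeps a confidently classified extraction task on the fast, cheap model.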
Task Execution
- The selected LLM processes the task and generates the response.
Telemetry Collection
- Execution metrics are captured, including:
  - latency
  - tokens used
  - estimated cost
  - model used
  - success status
- These metrics are stored in a database.
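A minimal sketch of the telemetry write, using SQLite in place of Postgres so it runs standalone. The column names are illustrative; align them with your own `telemetry` table.

```python
import sqlite3
import time

# SQLite stands in for Postgres here; the schema is an assumption.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE telemetry (
    task_category TEXT, model TEXT, latency_ms INTEGER,
    tokens_used INTEGER, cost_usd REAL, success INTEGER,
    recorded_at REAL)""")

def record(category, model, latency_ms, tokens, cost, success):
    """Store one execution's metrics for later analysis."""
    db.execute("INSERT INTO telemetry VALUES (?, ?, ?, ?, ?, ?, ?)",
               (category, model, latency_ms, tokens, cost,
                int(success), time.time()))

record("extraction", "small", 640, 380, 0.0011, True)
```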
Weekly Self-Optimization
- A scheduled workflow analyzes telemetry from the past 7 days.
- If performance trends change, routing policies are automatically updated.
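One way the weekly job could derive a policy update: compute a per-category success rate for the small model over the window and suggest escalation when it drops below a threshold. The threshold and escalation rule are assumptions for illustration, not the template's exact tuning logic.

```python
from statistics import mean

# Toy 7-day window; in the workflow these rows come from the telemetry table.
window = [
    {"category": "extraction", "model": "small", "success": True},
    {"category": "extraction", "model": "small", "success": True},
    {"category": "extraction", "model": "small", "success": False},
]

def tune(rows, category, min_success_rate=0.9):
    """Suggest a policy update when the small model underperforms.

    Returns a proposed policy_rules update, or None if no change is needed.
    """
    small = [r for r in rows
             if r["category"] == category and r["model"] == "small"]
    if not small:
        return None
    rate = mean(1.0 if r["success"] else 0.0 for r in small)
    if rate < min_success_rate:
        return {"category": category, "preferred_model_size": "large"}
    return None

# 2/3 success is below the 0.9 threshold, so an escalation is proposed.
print(tune(window, "extraction"))
```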
Setup Instructions
Configure a Postgres database
- Create the policy_rules and telemetry tables.
Add LLM credentials
- Configure Anthropic credentials for the language model nodes.
Configure policy rules
- Define preferred models, cost limits, and latency thresholds in the policy_rules table.
Configure workflow settings
- Adjust parameters in the Workflow Configuration node:
  - maximum latency
  - cost ceiling
  - token limits
  - retry behavior
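These parameters map naturally onto a small configuration object. The names and defaults below are illustrative stand-ins for whatever you set in the Workflow Configuration node.

```python
# Illustrative defaults; the real values live in the n8n
# Workflow Configuration node, not in this sketch.
workflow_config = {
    "max_latency_ms": 10_000,   # reroute or fail past this point
    "cost_ceiling_usd": 0.50,   # per-request spend limit
    "max_tokens": 4_000,        # hard token budget per task
    "max_retries": 2,           # retry behavior on model failure
}
```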
Deploy the API endpoint
- Send task requests to the workflow's webhook endpoint.
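A request body might look like the following. The field names and the endpoint URL are placeholders; use the webhook URL and payload schema configured in your own n8n instance.

```python
import json

# Hypothetical payload shape: a task plus optional priority metadata.
payload = {
    "task": "Extract the invoice number and total from this email.",
    "priority": "normal",
}
body = json.dumps(payload).encode("utf-8")

# Sent, for example, with urllib (URL is a placeholder):
# import urllib.request
# req = urllib.request.Request(
#     "https://your-n8n-host/webhook/llm-router", data=body,
#     headers={"Content-Type": "application/json"})
# urllib.request.urlopen(req)
```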
Use Cases
AI API Gateway
Route requests to different models based on complexity and cost constraints.
Multi-Model AI Platforms
Automatically choose the best model for each task without manual configuration.
Cost-Optimized AI Systems
Prefer smaller models for simple tasks while reserving larger models for complex reasoning.
LLM Observability
Track token usage, latency, and cost for each AI request.
Self-Optimizing AI Infrastructure
Automatically improve routing policies using real execution telemetry.
Requirements
- n8n with LangChain nodes enabled
- Postgres database
- Anthropic API credentials
- Tables:
  - policy_rules
  - telemetry
Optional:
- Monitoring dashboards connected to telemetry data
- External policy management systems