This workflow is designed to intelligently route user queries to the most suitable large language model (LLM) based on the type of request received in a chat environment. It uses structured classification and model selection to optimize both performance and cost-efficiency in AI-driven conversations.
It dynamically routes requests to specialized AI models based on content type, optimizing response quality and efficiency.
Benefits
- Smart Model Routing: Reduces costs by using lighter models for general tasks and reserving heavier models for complex needs.
- Scalability: Easily expandable by adding more request types or LLMs.
- Maintainability: Clear logic separation between classification, model routing, and execution.
- Personalization: Can be integrated with session IDs for per-user memory, enabling personalized conversations.
- Speed Optimization: Fast models like
GPT-4.1 mini
or Gemini Flash
are chosen for tasks where speed is a priority.
How It Works
-
Input Handling:
- The workflow starts with the "When chat message received" node, which triggers the process when a chat message is received. The input includes the chat message (
chatInput
) and a session ID (sessionId
).
-
Request Classification:
- The "Request Type" node uses an OpenAI model (
gpt-4.1-mini
) to classify the incoming request into one of four categories:
general
: For general queries.
reasoning
: For reasoning-based questions.
coding
: For code-related requests.
google
: For queries requiring Google tools.
- The classification is structured using the "Structured Output Parser" node, which enforces a consistent output format.
-
Model Selection:
- The "Model Selector" node routes the request to one of four AI models based on the classification:
- Opus 4 (Claude 4 Sonnet): Used for
coding
requests.
- Gemini Thinking Pro: Used for
reasoning
requests.
- GPT 4.1 mini: Used for
general
requests.
- Perplexity: Used for
search
(Google-related) requests.
-
AI Processing:
- The selected model processes the request via the "AI Agent" node, which includes intermediate steps for complex tasks.
- The "Simple Memory" node retains session context using the provided
sessionId
, enabling multi-turn conversations.
-
Output:
- The final response is generated by the chosen model and returned to the user.
Set Up Steps
-
Configure Trigger:
- Ensure the "When chat message received" node is set up with the correct webhook ID to receive chat inputs.
-
Define Classification Logic:
- Adjust the prompt in the "Request Type" node to refine classification accuracy.
- Verify the output schema in the "Structured Output Parser" node matches expected categories (
general
, reasoning
, coding
, google
).
-
Connect AI Models:
- Link each model node (Opus 4, Gemini Thinking Pro, GPT 4.1 mini, Perplexity) to the "Model Selector" node.
- Ensure credentials (API keys) for each model are correctly configured in their respective nodes.
-
Set Up Memory:
- Configure the "Simple Memory" node to use the
sessionId
from the input for context retention.
-
Test Workflow:
- Send test inputs to verify classification and model routing.
- Check intermediate outputs (e.g.,
request_type
) to ensure correct model selection.
-
Activate Workflow:
- Toggle the workflow to "Active" in n8n after testing.
Need help customizing?
Contact me for consulting and support or add me on Linkedin.