
AI Orchestrator: Dynamically Selects Models Based on Input Type

Created by: Davide || n3witalia

Last update: 11 hours ago


This workflow intelligently routes user queries to the most suitable large language model (LLM) for the type of request received in a chat environment. Using structured classification and dynamic model selection, it sends each request to a specialized AI model matched to its content type, optimizing response quality, performance, and cost-efficiency.


Benefits

  • Smart Model Routing: Reduces costs by using lighter models for general tasks and reserving heavier models for complex needs.
  • Scalability: Easily expandable by adding more request types or LLMs.
  • Maintainability: Clear logic separation between classification, model routing, and execution.
  • Personalization: Can be integrated with session IDs for per-user memory, enabling personalized conversations.
  • Speed Optimization: Fast models like GPT-4.1 mini or Gemini Flash are chosen for tasks where speed is a priority.

How It Works

  1. Input Handling:

    • The workflow starts with the "When chat message received" trigger node. Each incoming message carries the chat text (chatInput) and a session ID (sessionId).
  2. Request Classification:

    • The "Request Type" node uses an OpenAI model (gpt-4.1-mini) to classify the incoming request into one of four categories:
      • general: For general queries.
      • reasoning: For reasoning-based questions.
      • coding: For code-related requests.
      • google: For queries requiring Google tools.
    • The classification is structured by the "Structured Output Parser" node, which enforces a consistent output format (see the schema sketch after this list).
  3. Model Selection:

    • The "Model Selector" node routes the request to one of four AI models based on the classification:
      • Opus 4 (Claude 4 Sonnet): Used for coding requests.
      • Gemini Thinking Pro: Used for reasoning requests.
      • GPT 4.1 mini: Used for general requests.
      • Perplexity: Used for search (Google-related) requests.
  4. AI Processing:

    • The selected model processes the request via the "AI Agent" node, which includes intermediate steps for complex tasks.
    • The "Simple Memory" node retains session context using the provided sessionId, enabling multi-turn conversations.
  5. Output:

    • The final response is generated by the chosen model and returned to the user.
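
As a reference for step 2, here is a minimal TypeScript sketch of the classification contract the "Structured Output Parser" can enforce. The field name request_type matches the intermediate output mentioned in the setup steps; the exact schema shape is an assumption, not the node's literal configuration:

```typescript
// Assumed shape of the classifier's structured output; the parser pins
// the model to exactly one of the four categories.
type RequestType = "general" | "reasoning" | "coding" | "google";

interface ClassificationResult {
  request_type: RequestType; // e.g. { "request_type": "coding" }
}

// An equivalent JSON Schema a Structured Output Parser node could use:
const classificationSchema = {
  type: "object",
  properties: {
    request_type: {
      type: "string",
      enum: ["general", "reasoning", "coding", "google"],
    },
  },
  required: ["request_type"],
} as const;
```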
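
The model selection in step 3 then reduces to a lookup from category to model. The mapping below mirrors the routing table described above; the selectModel function is purely illustrative (the real "Model Selector" node is configured in the n8n UI):

```typescript
// Illustrative category-to-model lookup mirroring the Model Selector.
type RequestType = "general" | "reasoning" | "coding" | "google";

const modelByRequestType: Record<RequestType, string> = {
  coding: "Opus 4",              // Claude Opus 4
  reasoning: "Gemini Thinking Pro",
  general: "GPT 4.1 mini",
  google: "Perplexity",          // search-backed answers
};

function selectModel(requestType: string): string {
  // Fall back to the cheap general-purpose model on anything unexpected.
  return modelByRequestType[requestType as RequestType] ?? "GPT 4.1 mini";
}
```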
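
Conceptually, the "Simple Memory" node in step 4 keeps a per-session message history keyed by sessionId. A minimal in-memory sketch of that idea (n8n manages this internally; the types and helper here are hypothetical):

```typescript
// Minimal per-session memory keyed by sessionId (conceptual sketch only;
// n8n's Simple Memory node handles this for you).
interface ChatTurn {
  role: "user" | "assistant";
  content: string;
}

const sessions = new Map<string, ChatTurn[]>();

function remember(sessionId: string, turn: ChatTurn): ChatTurn[] {
  const history = sessions.get(sessionId) ?? [];
  history.push(turn);
  sessions.set(sessionId, history);
  return history; // passed to the agent as conversation context
}
```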

Set Up Steps

  1. Configure Trigger:

    • Ensure the "When chat message received" node is set up with the correct webhook ID to receive chat inputs.
  2. Define Classification Logic:

    • Adjust the prompt in the "Request Type" node to refine classification accuracy.
    • Verify the output schema in the "Structured Output Parser" node matches expected categories (general, reasoning, coding, google).
  3. Connect AI Models:

    • Link each model node (Opus 4, Gemini Thinking Pro, GPT 4.1 mini, Perplexity) to the "Model Selector" node.
    • Ensure credentials (API keys) for each model are correctly configured in their respective nodes.
  4. Set Up Memory:

    • Configure the "Simple Memory" node to use the sessionId from the input for context retention.
  5. Test Workflow:

    • Send test inputs to verify classification and model routing (see the example call after this list).
    • Check intermediate outputs (e.g., request_type) to ensure correct model selection.
  6. Activate Workflow:

    • Toggle the workflow to "Active" in n8n after testing.
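
To exercise the trigger end to end (step 5), you can POST a message to the chat webhook. The chatInput and sessionId fields come from the workflow's trigger; the URL below is a placeholder you should replace with the one shown on the "When chat message received" node:

```typescript
// Hypothetical test call; replace the URL with your actual chat webhook.
const res = await fetch("https://your-n8n-host/webhook/<webhook-id>/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    chatInput: "Write a Python function that reverses a string",
    sessionId: "test-session-1", // keeps multi-turn context together
  }),
});
console.log(await res.json()); // expect a response routed to the coding model
```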

Need help customizing?

Contact me for consulting and support, or add me on LinkedIn.