Turn Telegram into a multi-modal assistant for yourself or your team. This n8n workflow creates an agent that understands text, voice, images, and documents, and takes action by connecting to tools such as Google Calendar, Gmail, and Todoist.
At its core, a Manager Agent driven by Google Gemini interprets your requests, orchestrates a team of specialized sub-agents, and delivers a single coherent response, all while maintaining a persistent memory of your conversations.
Key Features
🧠 Intelligent Automation: Uses Google Gemini as a central "Manager Agent" to understand complex requests and delegate tasks to the appropriate tool.
🗣️ Multi-Modal Input: Interact naturally by sending text, voice notes, photos, or documents directly into your Telegram chat.
🔌 Integrated Toolset: Comes pre-configured with agents to manage your memory, tasks, emails, calendar, research, and project sheets.
🗂️ Persistent Memory: Leverages Airtable as a knowledge base, allowing the assistant to save and recall personal details, company information, or past conversations for context-rich interactions.
⚙️ Smart Routing: Automatically detects the type of message you send and routes it through the correct processing pipeline (e.g., voice is transcribed, images are analyzed).
🔄 Conversational Context: Utilizes a window buffer to maintain short-term memory, ensuring follow-up questions and commands are understood within the current conversation.
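To illustrate what a window buffer does, here is a minimal Python sketch of the idea: keep only the most recent turns and drop older ones. The class name `WindowBuffer` and the `window_size` parameter are hypothetical and chosen for illustration; inside the workflow this behavior is provided by an n8n memory node, not by custom code.

```python
from collections import deque
from typing import Deque, List, Tuple


class WindowBuffer:
    """Keeps only the most recent conversation turns (illustrative only)."""

    def __init__(self, window_size: int = 5) -> None:
        # A deque with maxlen silently discards the oldest item when full.
        self._turns: Deque[Tuple[str, str]] = deque(maxlen=window_size)

    def add(self, role: str, content: str) -> None:
        self._turns.append((role, content))

    def context(self) -> List[Tuple[str, str]]:
        # The turns that would accompany the next request to the agent.
        return list(self._turns)


buffer = WindowBuffer(window_size=2)
buffer.add("user", "Schedule a call with Ana tomorrow")
buffer.add("assistant", "What time works for you?")
buffer.add("user", "10am")
# Only the last two turns remain available as context.
```

A small window keeps follow-ups such as "10am" understandable without resending the full history; anything that must outlive the window belongs in the long-term memory base.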
The Telegram Trigger node acts as the entry point, receiving all incoming messages (text, voice, photo, document).
A Switch node routes the message to a processing branch based on its type:
Voice: The audio file is downloaded and transcribed into text using a voice-to-text service.
Photo: The image is downloaded, converted to a base64 string, and prepared for visual analysis.
Document: The file is routed to a document handler that extracts its text content for processing.
Text: The message is used as-is.
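The routing rules above can be sketched in Python as follows. This is an approximation of the Switch node's logic, not its actual configuration: the function name `classify_message` is hypothetical, and the check order is an assumption. The field names `voice`, `photo`, `document`, and `text` are the optional fields of a Telegram Bot API message object.

```python
from typing import Any, Dict


def classify_message(message: Dict[str, Any]) -> str:
    """Return the processing branch for a Telegram message object."""
    # Telegram includes a field only when the message carries that content,
    # so key presence is enough to pick a branch.
    if "voice" in message:
        return "voice"
    if "photo" in message:
        return "photo"
    if "document" in message:
        return "document"
    if "text" in message:
        return "text"
    # Stickers, locations, and other types are outside the four branches.
    return "unsupported"


branch = classify_message({"message_id": 1, "text": "What is on my agenda?"})
```

Media fields are checked before `text` because photos and documents can carry a caption rather than a text body; an explicit fallback keeps unexpected message types from being processed as text.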
A Merge node gathers the processed input into a unified prompt.
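As a rough sketch of what "a unified prompt" could look like, the Python function below normalizes the output of each branch into one record. The record shape (`source`, `prompt`, `image_base64`), the function name `to_unified_prompt`, and the default image instruction are all assumptions made for illustration; the workflow's real field names may differ.

```python
import base64
from typing import Dict, Optional, Union


def to_unified_prompt(
    kind: str,
    payload: Union[str, bytes],
    caption: Optional[str] = None,
) -> Dict[str, str]:
    """Shape one branch's output into a single prompt record (hypothetical)."""
    if kind == "photo":
        if not isinstance(payload, bytes):
            raise TypeError("photo payload must be raw image bytes")
        return {
            "source": kind,
            # Placeholder instruction used when the photo has no caption.
            "prompt": caption or "Describe this image.",
            # Base64 text is safe to embed in a JSON request body.
            "image_base64": base64.b64encode(payload).decode("ascii"),
        }
    if kind in ("text", "voice", "document"):
        if not isinstance(payload, str):
            raise TypeError(f"{kind} payload must be text")
        # Voice and document payloads are assumed to be text already
        # (transcribed or extracted upstream).
        return {"source": kind, "prompt": payload.strip()}
    raise ValueError(f"unsupported message kind: {kind}")
```

Whatever the input type, the agent then receives the same shape, which keeps the Manager Agent's prompt handling independent of how the message arrived.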
The Manager Agent receives this prompt. It analyzes the user's intent and orchestrates one or more specialized agents/tools:
memory_base (Airtable): For saving and retrieving information from your long-term knowledge base.
todo_and_task_manager (Todoist): To create, assign, or check tasks.
email_agent (Gmail): To compose, search, or send emails.
calendar_agent (Google Calendar): To schedule events or check your agenda.
research_agent (Wikipedia/Web Search): To look up information.
project_management (Google Sheets): To provide updates on project trackers.
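The delegation pattern can be illustrated with a small Python sketch. It assumes the model has already produced a plan, represented here as a list of `(tool_name, request)` pairs; in the workflow, Gemini chooses the tools itself. The handlers are stubs, and `run_plan` is a hypothetical name. Only the tool names come from the list above.

```python
from typing import Callable, Dict, List, Tuple


def _stub(tool_name: str) -> Callable[[str], str]:
    # Stand-in for a real sub-agent; it only echoes the request.
    def handler(request: str) -> str:
        return f"[{tool_name}] handled: {request}"

    return handler


TOOLS: Dict[str, Callable[[str], str]] = {
    name: _stub(name)
    for name in (
        "memory_base",
        "todo_and_task_manager",
        "email_agent",
        "calendar_agent",
        "research_agent",
        "project_management",
    )
}


def run_plan(plan: List[Tuple[str, str]]) -> str:
    """Run each planned tool call in order and join the results."""
    results = []
    for tool_name, request in plan:
        if tool_name not in TOOLS:
            raise ValueError(f"unknown tool: {tool_name}")
        results.append(TOOLS[tool_name](request))
    return "\n".join(results)


reply = run_plan([
    ("calendar_agent", "check tomorrow's agenda"),
    ("todo_and_task_manager", "add task: prepare slides"),
])
```

A single request can involve more than one tool, which is why the plan is a list: the manager collects every sub-agent's output before composing the final reply.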
Follow these steps to get your AI assistant up and running.
30–60 minutes: If you already have your API keys, account credentials, and service IDs (like Sheet IDs) ready.
2–3 hours: For a complete, first-time setup, which includes creating API keys, setting up new spreadsheets or Airtable bases, and configuring detailed permissions.