Build a Multi-functional Telegram Bot with Gemini, RAG PDF Search & Google Suite

Created by

Issam AGGOUR

Last update

Last update 2 months ago

The Ultimate Beginner's Guide to an AI-Powered Telegram Assistant (PDF, Brave search & Google Suite)

This comprehensive workflow bundle is designed as a powerful starter kit, enabling you to build a multi-functional AI assistant on Telegram. It seamlessly integrates AI-powered voice interactions, an intelligent PDF document search using a Retrieval-Augmented Generation (RAG) system, and automates various Google Suite tasks like calendar management and document generation. Perfect for beginners looking to explore advanced AI and automation capabilities.

Disclaimer

This template is designed for self-hosted n8n instances.

🚀 Key Features

Telegram Bot Interface: Interact with your AI assistant using both text and voice commands through Telegram.
AI Voice Bot:
- Transcribes user voice messages using OpenAI Whisper.
- Processes requests with an AI agent powered by Google Gemini.
- Responds with AI-synthesized voice using Replicate API.
PDF RAG System:
- Index PDF documents from Google Drive via Telegram commands.
- Utilizes Mistral AI for Optical Character Recognition (OCR) on PDFs.
- Stores document content and embeddings in a Qdrant vector database.
- Answers questions about your documents using Google Gemini, based on retrieved context.
Google Suite Automation:
- Manage Google Calendar: Create events, find upcoming holidays, and list birthdays.
- Google Drive: Search for PDF files and manage document templates.
- Google Docs: Automatically generate invoices from templates.
Intelligent Web Search: Employs Brave Search for fetching real-time information from the web.
Versatile AI Agent: Leverages Google Gemini with a suite of tools including a calculator, date & time utilities, and custom integrations (e.g., example Airbnb tools).
Command-Driven Functionality: Easily trigger specific actions using Telegram commands like /help, /pdf, /rag, /invoice, /chat, /brave, and /birthday.

⚙️ How It Works

The workflow is initiated by messages or commands sent to your Telegram bot. A central Switch node directs the flow based on the input received.

1. Telegram Interaction & Command Routing

A Telegram Trigger node listens for new messages (text or voice).
A "typing..." indicator is sent to Telegram for better user experience.
The Switch node parses the message content and routes it to the appropriate sub-workflow based on predefined commands (e.g., /pdf, /rag, /voice) or general chat.

2. AI Voice Bot Functionality

For voice messages (or if the voice path is triggered):
- The voice file is downloaded from Telegram.
- OpenAI Whisper transcribes the audio into text.
- The transcribed text is fed to an AI Agent (powered by Google Gemini and equipped with various tools and memory).
- The AI Agent's text response is then sent to the Replicate API to generate a natural-sounding voice.
- The generated audio response is sent back to the user on Telegram.

3. PDF RAG System with Mistral OCR & Qdrant

Indexing PDFs (via /qdrant <Google_Drive_File_ID> command):
- The specified PDF is downloaded from Google Drive.
- The PDF is uploaded to Mistral AI for OCR processing.
- The extracted text (in markdown format) is retrieved.
- The text is split into manageable chunks.
- OpenAI Embeddings are generated for each chunk.
- These chunks and their embeddings are stored in a Qdrant vector collection.
- A confirmation message is sent to Telegram.
Querying PDFs (via /rag <your_question> command):
- The user's question is processed by a RetrievalQA Chain.
- This chain uses Google Gemini as the Language Model and retrieves relevant document chunks from Qdrant based on semantic similarity (embeddings).
- Google Gemini then generates a concise answer based on the user's question and the retrieved contextual information from the documents.
- The answer is sent back to the user on Telegram.
Searching PDFs in Drive (via /pdf <search_term> command):
- Searches your Google Drive for PDF files matching the provided search term.
- Sends a list of found PDF files (name and ID) to the user on Telegram, allowing them to easily identify files for indexing with /qdrant.

4. AI Chat & Task Automation (General Chat & `/chat` command)

Handles general text messages or transcribed voice inputs that are not specific commands.
An AI Agent (Google Gemini) processes these inputs.
The agent is equipped with tools such as:
- Google Calendar Tools: To create events, find the next public holiday, or list upcoming birthdays.
- Brave Search Tool: To search the internet for information.
- Calculator Tool: For mathematical computations.
- Date & Time Utility: For handling and formatting dates.
- Example Airbnb Tools: Demonstrates extensibility for custom tool integrations.
The agent utilizes a "Think Node" process for reasoning and Window Buffer Memory to maintain conversational context.
Responses are formulated and sent back to the user on Telegram.

5. Google Automations

Invoice Generation (via /invoice command):
- Copies a predefined Google Docs invoice template from your Google Drive.
- Populates the new document with details (client name, items, etc. – currently uses placeholder data from an 'Edit Fields' node, which can be customized to parse input from the command).
- Converts the populated Google Doc into a PDF file.
- Sends the generated PDF invoice to the user via Telegram.
Calendar Management (via AI Agent or specific commands like /birthday):
- The /birthday command fetches upcoming birthdays from a specified Google Calendar.
- The AI Agent can also interact with Google Calendar to create events or retrieve information based on natural language requests.

6. Web Search (via `/brave <query>` command or AI Agent)

Performs a web search using the Brave Search API.
Returns a list of search results, including titles and URLs, to the user on Telegram.

7. Help Command (`/help`)

Provides the user with a formatted list of all available Telegram commands and their basic usage instructions.

🛠️ Setup Steps & Credentials

To get this workflow bundle up and running, you'll need to configure several credentials:

Telegram:
- Create a new bot via @BotFather on Telegram to obtain a Bot Token.
- Add this token to the Telegram Trigger node and all Telegram (sender) nodes in the workflow.
OpenAI:
- Obtain an API Key from platform.openai.com.
- Create an OpenAI credential.
- Use this credential in the Convert audio to text (OpenAI node for Whisper transcription) and all Embeddings OpenAI nodes (for RAG system).
Replicate API (Text-to-Speech):
- Sign up at replicate.com and get your API token.
- Create an HTTP Bearer Auth credential using this token.
- Assign this credential to the Call Replicate API (HTTP Request) node.
Google Gemini (PaLM API):
- Get an API key from Google AI Studio or Google Cloud Console.
- Create a Google Gemini(PaLM) Api credential.
- Assign it to all Google Gemini Chat Model nodes.
Qdrant (Vector Database):
- Set up a Qdrant instance (either cloud-hosted or self-managed).
- Note your Qdrant instance URL and API Key (if security is enabled).
- Create a QdrantApi credential with these details.
- Assign it to the Qdrant Vector Store nodes.
- For the Refresh collection node (an HTTP Request node used to clear the collection for /pdf command demo), create an HTTP Header Auth credential with your Qdrant API key if required by your instance.
Mistral AI (PDF OCR):
- Obtain an API key from console.mistral.ai.
- Create a Mistral Cloud API credential.
- Assign this to the Mistral Upload, Mistral Signed URL, and Mistral DOC OCR (HTTP Request) nodes.
Google Drive & Google Docs:
- Ensure the Google Drive API and Google Docs API are enabled in your Google Cloud Console project.
- Set up OAuth 2.0 credentials (Client ID and Client Secret).
- Create Google Drive OAuth2 Api and Google Docs OAuth2 Api credentials.
- Assign these to the respective Google Drive and Google Docs nodes throughout the workflow.
- Important: Configure relevant Folder IDs (for PDF search, invoice template source, invoice output) and the invoice template Document ID in the Google Drive and Google Docs nodes.
Google Calendar:
- Enable the Google Calendar API in your Google Cloud Console project.
- Set up OAuth 2.0 credentials.
- Create a Google Calendar OAuth2 Api credential.
- Assign it to the Google CalendarTool and Google Calendar nodes.
- Specify the correct calendar names or IDs in the nodes (e.g., for birthdays, holidays, new event creation).
Brave Search:
- Get a Brave Search API key from their developer portal.
- Create a Brave Search API credential.
- Assign it to the Brave Search nodes and tools.
(Optional) Airbnb MCP Client:
- The workflow includes example nodes for Airbnb MCP tools. If you intend to use or expand these, set up the corresponding MCP Client API credentials.

💡 Customization & Learning

This "Beginner Bundle" is not just a ready-to-use solution but also a fantastic learning resource:

Explore AI Agent Prompts: Dive into the AI Agent nodes to see how prompts are structured to guide the LLM's behavior, including the "Think Node" process and character guidance for robust messaging.
Modify Toolsets: Easily add or remove tools for the AI agent to expand its capabilities.
Invoice Customization: Adapt the Edit Fields node and the Google Docs template to match your invoicing needs. You can extend it to parse invoice details directly from the Telegram command.
RAG Parameters: Experiment with chunk sizes, overlap, and different embedding models in the RAG pipeline.
Extend Commands: Add new commands and corresponding functionalities by expanding the main Switch node and building out new automation paths.

By setting up and dissecting this workflow, beginners can gain a practical understanding of building complex, AI-driven applications, integrating various services, and managing different data flows.