Multimodal Telegram bot with voice, image & video analysis using Claude & Gemini

Created by

Keith Uy

Last update 2 days ago

What it's for:

This is a base template for anyone developing a Telegram AI Agent. It accepts multiple input types (voice, picture, video, and text) and passes them to an AI model of your choosing to get you started. From here, you can connect whatever tools you see fit to the AI Agent for your n8n workflows.

How it works:

Input: Telegram message to a bot chat

n8n Processing: A Switch node determines the message type:

Voice Message
Picture Message
Video Message
Text Message

(The workflow currently uses OpenAI to analyze voice messages and Gemini to analyze picture and video messages, but feel free to swap these nodes for other models.)
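The routing done by the Switch node can be sketched outside n8n as a small Python function. This is only an illustration, not the workflow's actual node configuration: it assumes the incoming message is a dict shaped like a Telegram Bot API Message object, where the presence of the `voice`, `photo`, `video`, or `text` field indicates the type. The names `classify_message` and `MESSAGE_FIELDS` are hypothetical.

```python
from typing import Any, Dict

# Telegram Bot API Message fields that identify each input type,
# paired with the branch label used in this template's description.
# The fields are checked in this order.
MESSAGE_FIELDS = (
    ("voice", "Voice Message"),
    ("photo", "Picture Message"),
    ("video", "Video Message"),
    ("text", "Text Message"),
)


def classify_message(message: Dict[str, Any]) -> str:
    """Return the branch label for a Telegram message dict (hypothetical helper)."""
    for field, label in MESSAGE_FIELDS:
        if field in message:
            return label
    # Anything else (stickers, documents, locations, ...) has no branch here.
    return "Unsupported"


if __name__ == "__main__":
    print(classify_message({"message_id": 1, "voice": {"file_id": "abc", "duration": 3}}))
    print(classify_message({"message_id": 2, "text": "hello"}))
```

A fallback label such as "Unsupported" is an assumption of this sketch. In the workflow itself, you would decide how the Switch node should treat message types that match none of the four branches.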

AI Agent Processing: The LLM of your choosing examines the message and, based on the system prompt, generates an output

Output: The AI output is sent back as a Telegram message
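For readers curious what the output step amounts to outside n8n, here is a minimal sketch of the underlying Telegram Bot API call (`sendMessage`). It is not the n8n Telegram node's implementation. The helper name `build_send_message` and the placeholder token and chat ID are hypothetical, and the function only builds the request without sending it.

```python
import json
import urllib.request

API_ROOT = "https://api.telegram.org"


def build_send_message(token: str, chat_id: int, text: str) -> urllib.request.Request:
    """Build (but do not send) a Bot API sendMessage request (hypothetical helper)."""
    url = f"{API_ROOT}/bot{token}/sendMessage"
    body = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; substitute your own bot token and chat ID.
    req = build_send_message("123456:EXAMPLE-TOKEN", 42, "Hello from the AI Agent")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=10)
```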

How to use:

Create your chat bot and generate an access token
-> Search for BotFather in Telegram
-> Type "/newbot"
-> Follow the instructions to create an access token
-> Copy the access token
Create credentials in n8n
-> Open the Telegram Trigger node
-> Click create credential
-> Paste the access token
-> Save
Create an LLM access token
(The steps differ per LLM; search for your LLM + "API" on Google)
-> Create an account with the LLM platform
-> Buy credits to use the LLM API
-> Generate an access token
-> Paste the token in the LLM node
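Before pasting the Telegram token into n8n, you may want to confirm that it works. One way, sketched here under the assumption that you have Python and network access, is to call the Bot API's `getMe` method. The helper names `get_me_url` and `check_token` are hypothetical and are not part of this workflow.

```python
import json
import urllib.request


def get_me_url(token: str) -> str:
    """Return the Bot API getMe URL for the given bot token."""
    return f"https://api.telegram.org/bot{token}/getMe"


def check_token(token: str, timeout: float = 10.0) -> dict:
    """Call getMe and return the decoded JSON response (needs network access).

    urllib raises urllib.error.HTTPError if Telegram rejects the token.
    """
    with urllib.request.urlopen(get_me_url(token), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder token; replace it with the one BotFather gave you,
    # then uncomment the last line to make the real request.
    print(get_me_url("123456:EXAMPLE-TOKEN"))
    # print(check_token("123456:EXAMPLE-TOKEN"))
```

A successful response contains `"ok": true` along with the bot's details.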

Requirements:

Telegram Bot Access Token
Google Gemini Access Token (For Picture and Video messages)
OpenAI Access Token (For Voice messages)
LLM Access Token (Your preference for the AI Agent)

Customizing this workflow:

To personalize the AI output, adjust the system prompt (give context or directions on the AI's role).
Add tools to the AI Agent to give it more utility beyond a personalized LLM (for example: calendars, databases, etc.).