
Multimodal Telegram Bot with Voice, Image & Video Analysis using Claude & Gemini

Created by: Keith Uy || keithuy

Last update 6 days ago


What it's for:

This is a base template for anyone developing a Telegram AI agent. It accepts multiple input types (voice, picture, video, and text) and passes them to an AI model of your choosing, which is enough to get started. From here, you can connect whatever tools you need to the AI Agent for your n8n workflows.

How it works:

Input: Telegram message to a bot chat

n8n Processing: A Switch node determines the message type:

  1. Voice Message
  2. Picture Message
  3. Video Message
  4. Text Message

(The template currently uses OpenAI and Gemini to analyze voice, photo, and video content, but feel free to swap these nodes for other models.)
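
To illustrate the routing logic outside n8n, here is a minimal Python sketch of what the Switch node decides. It is not the template's actual node configuration. It assumes the incoming message looks like a Telegram Bot API Message object, where `voice`, `photo`, `video`, and `text` are optional fields. The check order and the "unsupported" fallback are assumptions made for this example.

```python
from typing import Any, Dict

# Checked in order; the field names follow the Telegram Bot API Message object.
MESSAGE_TYPE_FIELDS = ("voice", "photo", "video", "text")


def classify_message(message: Dict[str, Any]) -> str:
    """Return the first message type whose field is present and non-empty."""
    for field in MESSAGE_TYPE_FIELDS:
        if message.get(field):
            return field
    # Assumption for this sketch: anything else (stickers, documents, ...) is not handled.
    return "unsupported"


# A photo message carries a list of sizes and, optionally, a caption.
example = {
    "message_id": 1,
    "photo": [{"file_id": "abc", "width": 90, "height": 90}],
    "caption": "What is this?",
}
print(classify_message(example))  # photo
```

Each returned label corresponds to one branch of the Switch node: voice goes to transcription, photo and video go to visual analysis, and text goes straight to the AI Agent.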

AI Agent Processing: The LLM of your choosing examines the message and, guided by the system prompt, generates an output

Output: The AI output is sent back as a Telegram message
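
For readers who want to see what the output step amounts to at the API level, the sketch below builds request bodies for the Telegram Bot API `sendMessage` method, which takes `chat_id` and `text`. Telegram limits a message's text to 4096 characters, so long AI outputs are split into several payloads. The function name, the fallback text, and the plain character-count splitting are assumptions for this example. Inside n8n the Telegram node sends the message for you.

```python
from typing import Dict, List, Union

TELEGRAM_TEXT_LIMIT = 4096  # maximum text length accepted by sendMessage


def build_replies(
    chat_id: Union[int, str],
    ai_output: str,
    limit: int = TELEGRAM_TEXT_LIMIT,
) -> List[Dict[str, Union[int, str]]]:
    """Split the AI output into chunks and wrap each one as a sendMessage payload."""
    # Hypothetical fallback so that an empty output still produces a valid message.
    text = ai_output.strip() or "(no response)"
    # Simple fixed-width split; it may cut a word in half at a chunk boundary.
    chunks = [text[i:i + limit] for i in range(0, len(text), limit)]
    return [{"chat_id": chat_id, "text": chunk} for chunk in chunks]


print(build_replies(42, "Hello from the agent"))
```

Each payload would be sent as its own `sendMessage` request, in order.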

How to use:

  1. Create your chat bot and generate an access token
    -> Search for BotFather in Telegram
    -> Type "/newbot"
    -> Follow the instructions to create the bot and receive its access token
    -> Copy the access token

  2. Create credentials in n8n
    -> Open the Telegram Trigger node
    -> Click create credential
    -> Paste the access token
    -> Save

  3. Create an LLM access token
    (The steps differ per LLM; search for your LLM + "API" on Google)
    -> Create an account with the LLM platform
    -> Buy credits to use the LLM API
    -> Generate an access token
    -> Paste the token in the LLM node
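
If you want to confirm that the bot token from step 1 works before pasting it into n8n, one option is to call the Telegram Bot API `getMe` method, which returns basic information about the bot. The sketch below uses only the Python standard library. The helper names are hypothetical, and the token in the usage note is a placeholder, not a real credential.

```python
import json
import urllib.request

API_BASE = "https://api.telegram.org"


def get_me_url(token: str) -> str:
    """Build the getMe URL for a bot token (format: /bot<token>/<method>)."""
    return f"{API_BASE}/bot{token}/getMe"


def check_token(token: str, timeout: float = 10.0) -> dict:
    """Call getMe and return the parsed JSON response.

    urllib raises urllib.error.HTTPError when the API answers with a
    non-success status, which is what happens for a rejected token.
    """
    with urllib.request.urlopen(get_me_url(token), timeout=timeout) as response:
        return json.load(response)
```

Calling `check_token("123456:PLACEHOLDER")` with your real token in place of the placeholder should return a JSON object whose `ok` field is true and whose `result` describes your bot.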

Requirements:

  • Telegram Bot Access Token
  • Google Gemini Access Token (For Picture and Video messages)
  • OpenAI Access Token (For Voice messages)
  • LLM Access Token (Your preference for the AI Agent)
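
When testing the same services outside n8n, a quick check that every token is available can save some debugging. The sketch below assumes the tokens are kept in environment variables. The variable names are hypothetical and chosen only for this example, because n8n itself stores tokens in its own credentials rather than reading these names.

```python
import os
from typing import List, Mapping

# Hypothetical variable names, one per requirement in the list above.
REQUIRED_TOKENS = {
    "TELEGRAM_BOT_TOKEN": "Telegram bot",
    "GEMINI_API_KEY": "Google Gemini (picture and video messages)",
    "OPENAI_API_KEY": "OpenAI (voice messages)",
    "AGENT_LLM_API_KEY": "LLM used by the AI Agent",
}


def missing_tokens(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required tokens that are unset or blank."""
    return [name for name in REQUIRED_TOKENS if not env.get(name, "").strip()]


for name in missing_tokens():
    print(f"Missing {name}: needed for {REQUIRED_TOKENS[name]}")
```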

Customizing this workflow:

  • To personalize the AI output, adjust the system prompt (give context or directions on the AI's role).
  • Add tools to the AI Agent to give it more utility beyond a personalized LLM (for example: calendars, databases, etc.).