Quick overview
This is a starting point for building a Telegram AI agent. The base handles four input types: voice, pictures, video, and text, through the AI models of your choice. From here you connect tools to expand what the agent can do inside your n8n workflows.
How it works
- Input: a message sent to the bot chat.
- A Switch node sorts the message by type:
- Voice message
- Picture message
- Video message
- Text message
- It currently uses OpenAI and Gemini to analyze voice, photos, and video, but you can swap in other models. The model reads the message, generates a response from the system prompt, and sends it back as a Telegram message.
Setup
- Create the Telegram bot. In Telegram, search for "BotFather", send
/newbot, follow the prompts, and copy the access token.
- Add the Telegram credential in n8n. Open the Telegram trigger node, create a credential, paste the access token, and save.
- Add the LLM credentials. Add your OpenAI and Gemini keys (and any other model you prefer) to the LLM nodes, then pick your model, and make sure each account has credits. Guides: OpenAI (voice) → https://winflowai.com/blog/get-openai-api-key/ and Google Gemini (images and video) → https://winflowai.com/blog/get-gemini-api-key/
Requirements
- Telegram bot access token
- OpenAI API key (voice)
- Google Gemini API key (pictures and video)
- n8n instance (Cloud or self-hosted)
Customization
- Adjust the system prompt to shape the agent's output, and add tools to take it beyond conversation.