This workflow transforms your Telegram bot into J.A.R.V.I.S., a powerful, multimodal AI assistant. It can understand and process text, voice messages, images, and documents. The assistant can search the web, scrape websites, generate images, perform calculations, and reference uploaded documents to provide comprehensive and context-aware responses in either text or audio format.
This workflow is for developers, AI enthusiasts, and businesses who want to create an advanced, interactive AI assistant on Telegram. It’s perfect for automating customer support, creating a personal AI helper, or exploring the capabilities of multimodal large language models (LLMs) in a practical application.
The workflow begins when a message is received by your Telegram bot. A Switch node then routes the data based on the message type, such as text, voice, image, or document.
The processed input is then passed to the core "J.A.R.V.I.S." Agent node. This agent uses an OpenAI model, conversational memory, and a suite of tools (Google Search, Web Scraper, Image Generator, Calculator, and the document vector store) to formulate a response. Finally, the workflow checks whether the original message was a voice note. If so, it replies with a generated audio response; otherwise, it sends the answer back to the user as a text message.
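The routing step can be sketched outside n8n as a small function. This is only an illustration under stated assumptions, not the workflow's actual Switch configuration: the keys text, voice, photo, and document are fields of the Telegram Bot API Message object, while the function name classify_message and the branch labels are hypothetical.

```python
from typing import Any, Dict


def classify_message(message: Dict[str, Any]) -> str:
    """Return a branch label for a Telegram message dict.

    The branch labels are hypothetical; the real Switch node may use
    different names or a different order of checks.
    """
    if "voice" in message:
        return "voice"      # would go to transcription
    if "photo" in message:
        return "image"      # would go to image analysis
    if "document" in message:
        return "document"   # would go to the document store
    if "text" in message:
        return "text"       # passed to the agent as-is
    return "unsupported"    # anything else (stickers, locations, ...)


if __name__ == "__main__":
    print(classify_message({"text": "What's the weather in Paris?"}))
    print(classify_message({"voice": {"file_id": "abc", "duration": 3}}))
```

Checking voice, photo, and document before text is a design choice in this sketch: it keeps the media branches from being shadowed if a message dict ever carries more than one of these keys.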
Configure the Receive Message Trigger node and all other Telegram nodes. In the Receive Message node, enter the chatId of the user or group authorized to interact with the bot.
Configure the Basic Google Search node to enable web search functionality.
Configure the Webpage Scraper node.
Change the model in the OpenAI Chat Model node (e.g., switch from gpt-4.1 to gpt-4o) or in the Analyze Image and Transcribe nodes.
Edit the J.A.R.V.I.S. Agent node to change its name, tone, instructions, or default language.
Add tools to the J.A.R.V.I.S. Agent node to extend its capabilities, such as connecting to a database or another third-party API.
Edit the If Audio Response node to change the conditions for sending text or audio messages. For example, you could configure it to always respond with text.
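The chat restriction and the text-or-audio condition mentioned above can be expressed as two small checks. This is a rough sketch, assuming the incoming message follows the Telegram Bot API shape (chat.id and voice are real Message fields); the function names, the allowed_chat_ids set, and the always_text flag are hypothetical and do not come from the workflow itself.

```python
from typing import Any, Dict, Set


def is_authorized(message: Dict[str, Any], allowed_chat_ids: Set[int]) -> bool:
    """True if the message comes from a chat that may use the bot."""
    chat = message.get("chat", {})
    return chat.get("id") in allowed_chat_ids


def should_send_audio(message: Dict[str, Any], always_text: bool = False) -> bool:
    """Decide the reply format.

    By default a voice note gets an audio reply and everything else
    gets text. Setting always_text=True (a hypothetical switch) models
    the customization of always responding with text.
    """
    if always_text:
        return False
    return "voice" in message


if __name__ == "__main__":
    msg = {"chat": {"id": 123456}, "voice": {"file_id": "abc"}}
    print(is_authorized(msg, {123456}))
    print(should_send_audio(msg))
    print(should_send_audio(msg, always_text=True))
```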