This workflow is a highly advanced multimodal AI assistant designed to operate through WhatsApp. It can understand and respond to text, images, voice messages, and PDF documents by combining OpenAI models with smart logic to adapt to the content received.
Core Features
1. Automatic Message Type Detection
Using the Input type node, the bot detects whether the user has sent:
- Text
- Voice messages
- Images
- Files (PDF)
- Other unsupported content
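As a rough illustration of the routing decision, here is a minimal TypeScript sketch. The `type` field follows the WhatsApp Cloud API webhook format; the rest of the payload shape is an assumption about this workflow's trigger output, not its actual code.

```typescript
// Minimal sketch of the decision the "Input type" Switch node makes.
type RoutedType = "text" | "audio" | "image" | "document" | "unsupported";

interface IncomingMessage {
  type: string;                      // "text" | "audio" | "image" | "document" | ...
  document?: { mime_type?: string }; // present only for document messages (assumed shape)
}

function routeMessage(msg: IncomingMessage): RoutedType {
  switch (msg.type) {
    case "text":
    case "audio":
    case "image":
      return msg.type;
    case "document":
      // Only PDFs continue downstream (see the "Only PDF File" check).
      return msg.document?.mime_type === "application/pdf" ? "document" : "unsupported";
    default:
      return "unsupported";
  }
}
```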
2. Smart Text Message Handling
- Text messages are processed by an OpenAI GPT-4o-mini agent with a customized system prompt.
- Replies are concise, accurate, and formatted for mobile readability.
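For orientation, a minimal sketch of that text branch using the official openai Node SDK is shown below. The system prompt wording is a hypothetical stand-in for the workflow's customized prompt.

```typescript
import OpenAI from "openai";

// Hypothetical prompt; the actual workflow uses its own customized system prompt.
const SYSTEM_PROMPT =
  "You are a helpful WhatsApp assistant. Keep answers concise, accurate, and easy to read on a phone.";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function answerText(userText: string): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      { role: "user", content: userText },
    ],
  });
  return completion.choices[0].message.content ?? "";
}
```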
3. Image Analysis & Description
- Images are downloaded, converted to base64, and analyzed by an image-aware AI model.
- The output is a rich, structured description, useful for visually impaired users or for general visual content interpretation.
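The image branch roughly corresponds to a vision call like the sketch below; the model name and prompt text are assumptions, not the workflow's exact configuration.

```typescript
import OpenAI from "openai";

const openai = new OpenAI();

// `imageBase64` is the downloaded image already encoded as base64 by an earlier step.
async function describeImage(imageBase64: string, caption?: string): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini", // assumed; any image-aware OpenAI model works the same way
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: caption ?? "Describe this image in detail for a visually impaired user.",
          },
          {
            type: "image_url",
            image_url: { url: `data:image/jpeg;base64,${imageBase64}` },
          },
        ],
      },
    ],
  });
  return completion.choices[0].message.content ?? "";
}
```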
4. Voice Message Transcription & Reply
- Audio messages are downloaded and transcribed using OpenAI Whisper.
- The transcribed text is analyzed and answered by the AI.
- Optionally, the AI reply can be converted back to voice using OpenAI's text-to-speech, and sent as an audio message.
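Conceptually, the audio branch maps to two OpenAI calls, sketched below; the specific voice and file handling are assumptions rather than the workflow's exact settings.

```typescript
import OpenAI from "openai";
import fs from "node:fs";

const openai = new OpenAI();

// Transcribe a downloaded voice note with Whisper.
async function transcribeVoiceNote(filePath: string): Promise<string> {
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream(filePath),
    model: "whisper-1",
  });
  return transcription.text;
}

// Convert the AI's text reply back into speech (voice and output path are assumptions).
async function synthesizeReply(replyText: string, outPath: string): Promise<void> {
  const speech = await openai.audio.speech.create({
    model: "tts-1",
    voice: "alloy",
    input: replyText,
  });
  fs.writeFileSync(outPath, Buffer.from(await speech.arrayBuffer()));
}
```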
5. PDF Document Extraction & Summary
- Only PDFs are allowed (filtered via MIME type).
- The document's content is extracted and combined with the user's message.
- The AI then provides a relevant summary or answer.
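A rough code equivalent of the PDF branch might look like the sketch below; the workflow itself uses n8n's built-in file extraction rather than the pdf-parse library shown here, and the prompt layout is an assumption.

```typescript
import pdf from "pdf-parse"; // stand-in for n8n's built-in extraction step

// Reject anything that is not a PDF, then merge the extracted text with the
// user's own message before handing it to the AI agent.
async function buildPdfPrompt(
  fileBuffer: Buffer,
  mimeType: string,
  userMessage: string
): Promise<string> {
  if (mimeType !== "application/pdf") {
    throw new Error("Unsupported document type - only PDF files are accepted.");
  }
  const { text } = await pdf(fileBuffer);
  return `Document content:\n${text}\n\nUser request:\n${userMessage}`;
}
```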
6. Contextual Memory
- Each user has a personalized session ID with a memory window of 10 interactions.
- This ensures a more natural and contextual conversation flow.
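The effect of the memory node can be pictured as a small per-session sliding window, sketched below; the session key (the sender's WhatsApp number) and the data structure are assumptions, since the workflow delegates this to n8n's Simple Memory node.

```typescript
// Per-user conversation memory with a window of 10 interactions.
interface Turn {
  user: string;
  assistant: string;
}

const WINDOW_SIZE = 10;
const sessions = new Map<string, Turn[]>();

function remember(sessionId: string, turn: Turn): Turn[] {
  const history = sessions.get(sessionId) ?? [];
  history.push(turn);
  // Keep only the most recent interactions so the context stays focused.
  const windowed = history.slice(-WINDOW_SIZE);
  sessions.set(sessionId, windowed);
  return windowed;
}
```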
How It Works
This workflow is designed to handle incoming WhatsApp messages and process different types of inputs (text, audio, images, and PDF documents) using AI-powered analysis. Here's how it functions:
- Trigger: The workflow starts with the WhatsApp Trigger node, which listens for incoming messages (text, audio, images, or documents).
- Input Routing: The Input type (Switch node) checks the message type and routes it to the appropriate processing branch:
- Text: Directly forwards the message to the AI agent for response generation.
- Audio: Downloads the audio file, transcribes it using OpenAI, and sends the transcription to the AI agent.
- Image: Downloads the image, analyzes it with OpenAI's GPT-4 model, and generates a detailed description.
- PDF Document: Downloads the file, extracts text, and processes it with the AI agent.
- Unsupported Formats: Sends an error message if the input is not supported.
- AI Processing: The AI Agent1 node, powered by OpenAI, processes the input (text, transcribed audio, image description, or PDF content) and generates a response.
- Response Handling:
- For audio inputs, the AI's response is converted back into speech (using OpenAI's TTS) and sent as a voice message.
- For other inputs, the response is sent as a text message via WhatsApp (see the sketch after this list).
- Memory: The Simple Memory node maintains conversation context for follow-up interactions.
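For reference, sending the final reply corresponds to a WhatsApp Cloud API call like the sketch below; in the workflow this is handled by the Send Message node, and the Graph API version shown is an assumption.

```typescript
// Rough sketch of the "send back to WhatsApp" step using the Cloud API.
const GRAPH_API = "https://graph.facebook.com/v19.0"; // version is an assumption

async function sendTextReply(
  phoneNumberId: string,
  accessToken: string,
  to: string,
  body: string
): Promise<void> {
  const res = await fetch(`${GRAPH_API}/${phoneNumberId}/messages`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${accessToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      messaging_product: "whatsapp",
      to,
      type: "text",
      text: { body },
    }),
  });
  if (!res.ok) {
    throw new Error(`WhatsApp API error: ${res.status} ${await res.text()}`);
  }
}
```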
Setup Steps
To deploy this workflow in n8n, follow these steps:
- Configure WhatsApp API Credentials:
- Set up WhatsApp Business API credentials (Meta Developer Account).
- Add the credentials in the WhatsApp Trigger, Get Image/Audio/File URL, and Send Message nodes.
- Set Up OpenAI Integration:
- Provide an OpenAI API key in the Analyze Image, Transcribe Audio, Generate Audio Response, and AI Agent1 nodes.
- Adjust Input Handling (Optional):
- Modify the Switch node ("Input type") to handle additional message types if needed.
- Update the "Only PDF File" IF node to support other document formats (see the whitelist sketch after these steps).
- Test & Deploy:
- Activate the workflow and test with different message types (text, audio, image, PDF).
- Ensure responses are correctly generated and sent back via WhatsApp.
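If you do extend the "Only PDF File" check, the condition effectively becomes a MIME-type whitelist like the sketch below; the extra types listed are examples, not part of the original workflow.

```typescript
// Example whitelist if the document filter is broadened beyond PDFs.
const ALLOWED_DOCUMENT_TYPES = new Set([
  "application/pdf",
  "text/plain",
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document", // .docx
]);

function isSupportedDocument(mimeType: string): boolean {
  return ALLOWED_DOCUMENT_TYPES.has(mimeType);
}
```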
Need help customizing?
Contact me for consulting and support or add me on LinkedIn.