WhatsApp Audio Transcriber Bot
Overview
Automatically transcribe WhatsApp audio messages to text using AI-powered speech recognition. This workflow receives audio messages via webhook, processes them through Groq's Whisper API, and replies with the transcribed text in the same conversation.
Use Cases
- Accessibility: Help users with hearing impairments access audio content
- Workplace Communication: Quickly scan audio messages in professional settings
- Language Learning: Get text versions of audio for better comprehension
- Meeting Notes: Convert voice messages to searchable text format
- Multilingual Support: Transcribe audio in Portuguese (configurable for other languages)
How it Works
- Message Reception: A webhook receives WhatsApp messages in real time
- Audio Detection: A Switch node passes only audio messages through
- Format Conversion: Decodes the base64 audio into an MP3 binary file
- AI Transcription: Sends the audio to the Groq API, which transcribes it with the Whisper Large V3 model
- Response Delivery: Sends transcribed text back to the original conversation
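The format-conversion step can be sketched outside n8n as follows. This is a minimal illustration, assuming the webhook delivers the audio as a base64 string; the function name base64_audio_to_file and the default file name audio.mp3 are hypothetical. Decoding base64 only recovers the original bytes and does not transcode the audio.

```python
import base64
import binascii
import tempfile
from pathlib import Path


def base64_audio_to_file(audio_b64: str, directory: str,
                         filename: str = "audio.mp3") -> Path:
    """Decode a base64 audio string and write the raw bytes to a file."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        audio_bytes = base64.b64decode(audio_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError("audio payload is not valid base64") from exc
    target = Path(directory) / filename
    target.write_bytes(audio_bytes)
    return target


if __name__ == "__main__":
    # Placeholder bytes stand in for a real WhatsApp audio payload.
    sample = base64.b64encode(b"fake audio bytes").decode("ascii")
    with tempfile.TemporaryDirectory() as tmp:
        path = base64_audio_to_file(sample, tmp)
        print(path.name, path.stat().st_size)
```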
Key Features
- ✅ Real-time Processing: Transcribes incoming audio messages as they arrive, typically within a few seconds
- ✅ High Accuracy: Uses Whisper Large V3 model for reliable transcription
- ✅ Auto-Reply: Automatically responds in the same WhatsApp conversation
- ✅ Message Quoting: References the original audio message in the reply
- ✅ Portuguese Optimized: Configured for Brazilian Portuguese transcription
- ✅ Self-Message Filtering: Ignores messages sent by the bot itself
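The audio detection and self-message filtering features amount to a simple predicate on each incoming event. The sketch below is illustrative only: the audioMessage type comes from this workflow, but the payload field names (data, key, fromMe, messageType) are assumptions about the Evolution API webhook body and should be checked against your instance.

```python
from typing import Any, Mapping


def should_transcribe(event: Mapping[str, Any]) -> bool:
    """Return True only for audio messages that the bot did not send itself."""
    data = event.get("data", {})
    # Assumed location of the "sent by this account" flag.
    sent_by_bot = bool(data.get("key", {}).get("fromMe", False))
    # Assumed location of the message type; "audioMessage" is from the workflow.
    is_audio = data.get("messageType") == "audioMessage"
    return is_audio and not sent_by_bot


if __name__ == "__main__":
    incoming = {"data": {"key": {"fromMe": False}, "messageType": "audioMessage"}}
    print(should_transcribe(incoming))
```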
Prerequisites
Required Services
- Evolution API: WhatsApp integration service
- Groq API: AI transcription service (Whisper model)
- n8n Instance: Workflow automation platform
API Keys & Configuration
- Groq API key (set as environment variable: GROQ_API_KEY)
- Evolution API instance properly configured
- Webhook URL configured in Evolution API
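When reproducing the transcription call in a script, it helps to fail early if the key is missing. A minimal sketch, assuming the key lives in the GROQ_API_KEY environment variable as described above; the helper name load_groq_api_key is hypothetical.

```python
import os
from typing import Mapping


def load_groq_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Read GROQ_API_KEY from the environment, raising if it is absent or blank."""
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GROQ_API_KEY is not set; export it before running")
    return key


if __name__ == "__main__":
    try:
        load_groq_api_key()
        print("GROQ_API_KEY found")
    except RuntimeError as exc:
        print(exc)
```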
Setup Instructions
- Import Workflow: Import the JSON workflow into your n8n instance
- Configure Environment: Set the GROQ_API_KEY environment variable
- Setup Webhook: Configure Evolution API to send messages to the webhook endpoint
- Test Connection: Send a test audio message to verify the workflow
Workflow Nodes
- Webhook: Receives WhatsApp messages from Evolution API
- Edit Fields: Extracts relevant data (number, name, message, audio)
- Switch: Filters only audio messages (audioMessage type)
- Convert to File: Decodes the base64 audio into an MP3 binary file
- HTTP Request: Sends audio to Groq API for transcription
- Evolution API: Sends transcribed text back to WhatsApp
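The Edit Fields step can be pictured as flattening the webhook payload into the few values the later nodes need. The sketch below is an assumption-heavy illustration: the payload paths (data.key.remoteJid, data.key.id, data.pushName, data.messageType, data.message.base64) are guesses at the Evolution API event shape, and reading the "message" field as the message ID used for quoting is also an assumption.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class IncomingMessage:
    number: str                  # sender's phone number
    name: str                    # sender's display name
    message_id: str              # ID of the original message, used for quoting
    message_type: str            # e.g. "audioMessage"
    audio_base64: Optional[str]  # base64 audio, if present


def extract_fields(event: Mapping[str, Any]) -> IncomingMessage:
    """Pull the relevant fields out of an (assumed) Evolution API event."""
    data = event.get("data", {})
    key = data.get("key", {})
    remote_jid = key.get("remoteJid", "")
    return IncomingMessage(
        number=remote_jid.split("@", 1)[0],  # drop the "@..." suffix
        name=data.get("pushName", ""),
        message_id=key.get("id", ""),
        message_type=data.get("messageType", ""),
        audio_base64=data.get("message", {}).get("base64"),
    )
```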
Configuration Options
Groq API Settings
- Model: whisper-large-v3
- Language: pt (Portuguese)
- Temperature: 0 (deterministic output, intended for maximum accuracy)
- Response Format: json
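For reference, the settings above map onto a multipart form request to Groq's OpenAI-compatible transcription endpoint. The sketch below only builds the request with the standard library and does not send it; the function name, the audio.mp3 file name, and the audio/mpeg content type are assumptions, and the n8n HTTP Request node may assemble the request differently.

```python
import io
import uuid
import urllib.request

GROQ_TRANSCRIPTION_URL = "https://api.groq.com/openai/v1/audio/transcriptions"


def build_transcription_request(audio_bytes: bytes, api_key: str, *,
                                model: str = "whisper-large-v3",
                                language: str = "pt",
                                temperature: float = 0,
                                response_format: str = "json",
                                filename: str = "audio.mp3") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data transcription request."""
    boundary = uuid.uuid4().hex
    fields = {
        "model": model,
        "language": language,
        "temperature": str(temperature),
        "response_format": response_format,
    }
    body = io.BytesIO()
    # Plain text form fields
    for name, value in fields.items():
        body.write(f"--{boundary}\r\n".encode())
        body.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        body.write(f"{value}\r\n".encode())
    # The audio file part
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    body.write(b"Content-Type: audio/mpeg\r\n\r\n")
    body.write(audio_bytes)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return urllib.request.Request(
        GROQ_TRANSCRIPTION_URL,
        data=body.getvalue(),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Sending the request (for example with urllib.request.urlopen) should return a JSON object whose text field holds the transcription when the response format is json.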
Customization Options
- Change the language by modifying the language parameter
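- Adjust the temperature: 0 gives the most deterministic transcription, while higher values allow more varied output
- Change the response format to get a different output structure (for example, plain text instead of JSON)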
Response Format
*Mensagem transcrita automaticamente.*
[Transcribed text content]
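The reply shown above can be assembled as in the sketch below. The header string is taken from the workflow (it is Portuguese for "Message transcribed automatically."); everything else is an assumption. In particular, the single line break between header and text, and the number, text, and quoted field names in build_reply_payload, are hypothetical and should be checked against the send-text request of your Evolution API version.

```python
from typing import Any, Dict

REPLY_HEADER = "*Mensagem transcrita automaticamente.*"


def format_reply(transcription: str) -> str:
    """Prefix the transcription with the bold WhatsApp header line."""
    return f"{REPLY_HEADER}\n{transcription.strip()}"


def build_reply_payload(number: str, message_id: str,
                        transcription: str) -> Dict[str, Any]:
    """Build a hypothetical send-text body that quotes the original audio."""
    return {
        "number": number,
        "text": format_reply(transcription),
        # Assumed shape for quoting the original message by its ID.
        "quoted": {"key": {"id": message_id}},
    }


if __name__ == "__main__":
    print(build_reply_payload("5511999999999", "ABC123", "Olá, tudo bem?"))
```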
Technical Specifications
- Input: Base64 encoded audio from WhatsApp
- Output: Plain text transcription
- Processing Time: Typically 2-5 seconds per audio message
- Supported Audio: MP3 format (converted from WhatsApp audio)
- Language: Portuguese (configurable)
Troubleshooting
- No Response: Check Groq API key and webhook configuration
- Poor Transcription: Ensure audio quality and check language settings
- Error Messages: Monitor n8n execution logs for detailed error information
Version History
- v0.0.1: Initial release with basic transcription functionality