Transcribe WhatsApp Audio Messages with Whisper AI via Groq

Created by Noriwal AlMa Jr

Last update 24 days ago

Overview

Automatically transcribe WhatsApp audio messages to text using AI-powered speech recognition. This workflow receives audio messages via webhook, processes them through Groq's Whisper API, and replies with the transcribed text in the same conversation.

Use Cases

Accessibility: Help users with hearing impairments access audio content
Workplace Communication: Quickly scan audio messages in professional settings
Language Learning: Get text versions of audio for better comprehension
Meeting Notes: Convert voice messages to searchable text format
Multilingual Support: Transcribes Portuguese by default; other languages can be selected through the language parameter

How it Works

Message Reception: Webhook receives WhatsApp messages in real-time
Audio Detection: A Switch node lets only audio messages through
Format Conversion: Decodes the base64 audio into a binary file (MP3) for upload
AI Transcription: Processes audio through Groq API with Whisper Large V3 model
Response Delivery: Sends transcribed text back to the original conversation
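The five steps above can be sketched as a single Python function. This is a minimal illustration under stated assumptions, not the workflow itself: it takes a hypothetical, already-flattened message record with `type`, `from_me` and `audio_b64` keys (not Evolution API's raw webhook schema), and it receives the transcription call as a parameter so the flow can be exercised without network access.

```python
import base64
from typing import Callable, Optional

# Reply header taken from the Response Format section of this template.
REPLY_HEADER = "*Mensagem transcrita automaticamente.*"


def handle_message(msg: dict, transcribe: Callable[[bytes], str]) -> Optional[str]:
    """Return the reply text for one incoming message, or None to skip it.

    `msg` is a hypothetical flattened record (keys: type, from_me, audio_b64).
    """
    # Skip messages sent by the bot itself so it never transcribes its own replies.
    if msg.get("from_me"):
        return None
    # Audio detection: only audioMessage items continue, like the Switch node.
    if msg.get("type") != "audioMessage":
        return None
    # Format conversion: decode the base64 payload back into raw audio bytes.
    audio = base64.b64decode(msg["audio_b64"])
    # AI transcription: delegated to the injected function (e.g. a Groq request).
    text = transcribe(audio)
    # Response delivery: the text that would be sent back to the conversation.
    return f"{REPLY_HEADER}\n{text}"
```

Injecting `transcribe` keeps the routing logic testable on its own: passing a stub such as `lambda audio: "teste"` returns the header line followed by the stub's text, while messages from the bot or of other types return `None`.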

Key Features

✅ Real-time Processing: Transcribes incoming audio messages as soon as they arrive
✅ High Accuracy: Uses Whisper Large V3 model for reliable transcription
✅ Auto-Reply: Automatically responds in the same WhatsApp conversation
✅ Message Quoting: References the original audio message in the reply
✅ Portuguese Optimized: Language set to Portuguese (pt), aimed at Brazilian Portuguese speakers
✅ Self-Message Filtering: Ignores messages sent by the bot itself

Prerequisites

Required Services

Evolution API: WhatsApp integration service
Groq API: AI transcription service (Whisper model)
n8n Instance: Workflow automation platform

API Keys & Configuration

Groq API key (set as environment variable: GROQ_API_KEY)
Evolution API instance connected to the WhatsApp number that will receive the audio messages
Webhook URL configured in Evolution API
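A missing key otherwise surfaces only later, as a failed Groq request. The Python sketch below reads GROQ_API_KEY from the environment and fails early with a clear message; the function name is illustrative. Inside n8n itself, the HTTP Request node would typically reference the variable with an expression such as `{{ $env.GROQ_API_KEY }}`, provided the instance allows environment access from nodes.

```python
import os


def get_groq_api_key(env=None) -> str:
    """Return the Groq API key from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    # Strip whitespace so a key pasted with a trailing newline still works.
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GROQ_API_KEY is not set; define it in the environment of the n8n instance."
        )
    return key
```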

Setup Instructions

Import Workflow: Import the JSON workflow into your n8n instance
Configure Environment: Set the GROQ_API_KEY environment variable on the n8n host
Set Up Webhook: Configure Evolution API to send message events to the n8n webhook URL
Test Connection: Send a test audio message to verify the workflow

Workflow Nodes

Webhook: Receives WhatsApp messages from Evolution API
Edit Fields: Extracts relevant data (number, name, message, audio)
Switch: Lets only audio messages (audioMessage type) through
Convert to File: Decodes the base64 audio into a binary MP3 file
HTTP Request: Sends audio to Groq API for transcription
Evolution API: Sends transcribed text back to WhatsApp
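The Edit Fields step can be approximated in Python as below. The field paths (`data.key.remoteJid`, `data.key.fromMe`, `data.key.id`, `data.pushName`, `data.messageType`, `data.message.base64`) are assumptions based on Evolution API v2 `messages.upsert` webhook events with base64 media enabled; compare them with a payload captured from your own instance before relying on them. The output keys are hypothetical names chosen for this example.

```python
from typing import Any, Dict


def extract_fields(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten an Evolution API webhook payload into the fields the workflow uses.

    Field paths are assumed from Evolution API v2 `messages.upsert` events.
    """
    data = payload.get("data", {})
    key = data.get("key", {})
    message = data.get("message", {}) or {}
    remote_jid = key.get("remoteJid", "")
    return {
        # "5511999999999@s.whatsapp.net" -> "5511999999999"
        "number": remote_jid.split("@", 1)[0],
        "name": data.get("pushName", ""),
        "type": data.get("messageType", ""),
        "from_me": bool(key.get("fromMe", False)),
        # Kept so the reply can quote the original audio message.
        "message_id": key.get("id", ""),
        # Present only when the instance is set to include media as base64.
        "audio_b64": message.get("base64"),
    }
```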

Configuration Options

Groq API Settings

Model: whisper-large-v3
Language: pt (Portuguese)
Temperature: 0 (deterministic, least random decoding)
Response Format: json
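For reference, the HTTP Request node's call with these settings can be reproduced using only Python's standard library. This sketch assumes Groq's OpenAI-compatible endpoint `https://api.groq.com/openai/v1/audio/transcriptions`, Bearer-token authentication, and a multipart form with `file`, `model`, `language`, `temperature` and `response_format` fields; the helper names, the `audio.mp3` file name and the User-Agent value are illustrative. `build_transcription_request` only builds the request; `transcribe` actually sends it.

```python
import json
import urllib.request
import uuid

GROQ_URL = "https://api.groq.com/openai/v1/audio/transcriptions"


def build_transcription_request(audio: bytes, api_key: str,
                                filename: str = "audio.mp3",
                                language: str = "pt") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data request with the settings above."""
    boundary = uuid.uuid4().hex
    fields = {
        "model": "whisper-large-v3",
        "language": language,
        "temperature": "0",
        "response_format": "json",
    }
    parts = []
    # One form part per text setting.
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    # The audio bytes go in the "file" part.
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'.encode()
        + audio
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        GROQ_URL,
        data=b"".join(parts),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            # Explicit User-Agent; some API gateways reject urllib's default one.
            "User-Agent": "whisper-transcription-sketch/0.1",
        },
        method="POST",
    )


def transcribe(audio: bytes, api_key: str) -> str:
    """Send the request and return the "text" field of the JSON response."""
    req = build_transcription_request(audio, api_key)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["text"]
```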

Customization Options

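Change the language by setting the language parameter to another ISO-639-1 code (for example en or es)
Adjust temperature: higher values add randomness to decoding, while 0 keeps the output most consistent
Change the response format (for example text or verbose_json) to alter what the API returns; the step that reads the transcription must match the chosen format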
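Switching the response format changes how the transcription must be read. A minimal Python sketch, assuming the OpenAI-compatible behavior in which `json` and `verbose_json` return an object with a top-level `text` field (`verbose_json` adding timestamped `segments`) and `text` returns the transcript as a plain body; both function names are illustrative.

```python
import json


def extract_transcript(body: str, response_format: str = "json") -> str:
    """Return the transcript from a raw response body for the given format."""
    if response_format == "text":
        # Plain-text responses are the transcript itself.
        return body.strip()
    if response_format in ("json", "verbose_json"):
        # Both JSON variants carry the full transcript in a top-level "text" field.
        return json.loads(body)["text"].strip()
    raise ValueError(f"unsupported response_format: {response_format!r}")


def segment_lines(body: str) -> list:
    """Format verbose_json segments as "[start-end s] text" lines."""
    segments = json.loads(body).get("segments", [])
    return [f"[{s['start']:.1f}-{s['end']:.1f}s] {s['text'].strip()}" for s in segments]
```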

Response Format

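*Mensagem transcrita automaticamente.*
[Transcribed text content]

The header line is Portuguese for "Automatically transcribed message."; the surrounding asterisks make WhatsApp display it in bold.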
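Delivering this reply could look like the Python sketch below. It assumes Evolution API v2's `POST /message/sendText/{instance}` endpoint with an `apikey` header and a JSON body containing `number`, `text` and a `quoted.key.id` pointing at the original audio message. The server URL, instance name and helper name are placeholders; older Evolution API versions use a different body shape, and some versions may expect more of the quoted message than its id, so confirm against the documentation for your version. The function only builds the request.

```python
import json
import urllib.request

REPLY_HEADER = "*Mensagem transcrita automaticamente.*"


def build_reply_request(server_url: str, instance: str, api_key: str,
                        number: str, transcript: str,
                        quoted_id: str) -> urllib.request.Request:
    """Build (but do not send) a request posting the transcription as a quoted reply."""
    body = {
        "number": number,
        # WhatsApp renders text between single asterisks in bold.
        "text": f"{REPLY_HEADER}\n{transcript}",
        # Assumed shape: quoting the original audio message by its id.
        "quoted": {"key": {"id": quoted_id}},
    }
    return urllib.request.Request(
        f"{server_url.rstrip('/')}/message/sendText/{instance}",
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"apikey": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```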

Technical Specifications

Input: Base64-encoded audio from WhatsApp
Output: Plain text transcription
Processing Time: Typically 2-5 seconds per audio message
Supported Audio: MP3 format (converted from WhatsApp audio)
Language: Portuguese (configurable)
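Decoding base64 only restores the original bytes; it does not re-encode the audio. WhatsApp voice notes are typically OGG/Opus, so checking the first bytes of the decoded data shows which container is actually being sent to Groq, whatever the file is named. A Python sketch (the function name is illustrative):

```python
import base64


def sniff_audio_container(audio_b64: str) -> str:
    """Guess the container of base64-encoded audio from its leading bytes."""
    head = base64.b64decode(audio_b64)[:4]
    if head.startswith(b"OggS"):
        return "ogg"  # WhatsApp voice notes are usually OGG/Opus
    if head.startswith(b"ID3") or (len(head) >= 2 and head[0] == 0xFF and head[1] & 0xE0 == 0xE0):
        return "mp3"  # ID3 tag or MPEG frame sync (ADTS AAC shares this sync prefix)
    if head.startswith(b"RIFF"):
        return "wav"  # RIFF container, most likely WAV for audio
    return "unknown"
```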

Troubleshooting

No Response: Check Groq API key and webhook configuration
Poor Transcription: Check audio quality and confirm that the language parameter matches the spoken language
Error Messages: Monitor n8n execution logs for detailed error information
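When reading the execution logs, the HTTP status of the Groq request usually narrows down the cause. The helper below is a Python sketch whose mapping follows general HTTP semantics (401 for a rejected key, 413 for an oversized upload, 429 for rate limiting, 5xx for server-side failures); it is not an exhaustive list of Groq's error codes, and the hint texts are illustrative.

```python
from typing import Optional

HINTS = {
    400: "Bad request: check the model name, language code and form fields.",
    401: "Unauthorized: GROQ_API_KEY is missing, mistyped or revoked.",
    413: "Payload too large: the audio file exceeds the upload limit.",
    429: "Rate limited: too many requests in a short period; retry later.",
}


def diagnose(status: int) -> Optional[str]:
    """Return a troubleshooting hint for a Groq HTTP status, or None on success."""
    if 200 <= status < 300:
        return None
    if status in HINTS:
        return HINTS[status]
    if status >= 500:
        return "Server error on Groq's side; retry after a short delay."
    return f"Unexpected HTTP status {status}; inspect the response body in the n8n execution log."
```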

Version History

v0.0.1: Initial release with basic transcription functionality