How It Works
This workflow transforms any webpage into an AI-narrated audio summary delivered via WhatsApp:
- Receive URL - WhatsApp Trigger captures incoming messages and passes them to URL extraction
- Extract & validate - Code node extracts URLs using regex and validates format; IF node checks for errors
- User feedback - Sends either error message ("Please send valid URL") or processing status ("Fetching and analyzing... 10-30 seconds")
- Fetch webpage - Sub-workflow calls Jina AI Reader (https://r.jina.ai/) to fetch JavaScript-rendered content, bypassing bot blocks
- Summarize content - GPT-4o-mini processes webpage text in 6000-character chunks, extracts title and generates concise summary
- Generate audio - OpenAI TTS-1 converts summary text to natural-sounding audio (Opus format for WhatsApp compatibility)
- Deliver result - WhatsApp node sends audio message back to user with summary
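The "Extract & validate" step can be sketched as follows. This is a minimal illustration in Python rather than the workflow's actual Code node: the regex, the function name `extract_url`, and the rule that a host must contain a dot are all assumptions made for the example. Only the error text comes from the workflow.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern: an http(s) URL running up to whitespace, a quote, or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)

ERROR_TEXT = "Please send valid URL"  # error message quoted in the workflow


def extract_url(message_text):
    """Return (url, None) on success or (None, error_text) on failure."""
    match = URL_PATTERN.search(message_text or "")
    if match is None:
        return None, ERROR_TEXT
    # Trim punctuation that often trails a link pasted into a sentence.
    url = match.group(0).rstrip(".,;:!?)")
    parsed = urlparse(url)
    # Assumed rule: require a host with at least one dot, so "http://foo" is rejected.
    if not parsed.netloc or "." not in parsed.netloc:
        return None, ERROR_TEXT
    return url, None
```

Returning the error alongside the URL mirrors the IF node's role: a non-empty error routes to the error message, anything else continues to the fetch step.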
Why Jina AI? Many modern websites (like digibyte.io) require JavaScript to load content. Standard HTTP requests only fetch the initial HTML skeleton ("JavaScript must be enabled"). Jina AI executes JavaScript and returns clean, readable text.
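A rough sketch of the fetch, assuming the Reader endpoint is used by prefixing the target URL with https://r.jina.ai/ and that an API key, when present, is sent as a Bearer token. The function names and the 30-second timeout are illustrative, not taken from the sub-workflow.

```python
from urllib.request import Request, urlopen

JINA_READER_PREFIX = "https://r.jina.ai/"


def build_reader_request(target_url, api_key=None):
    """Build the Reader request; the Authorization header is only added when a key is given."""
    headers = {}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return Request(JINA_READER_PREFIX + target_url, headers=headers)


def fetch_page_text(target_url, api_key=None, timeout=30):
    """Fetch the rendered page as text (performs a real network call)."""
    request = build_reader_request(target_url, api_key)
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Splitting request construction from the network call keeps the URL and header logic easy to check without hitting the service.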
Setup Steps
Time estimate: ~22-27 minutes (longer if this is your first Meta business verification)
1. WhatsApp Business API Setup (10-15 minutes)
- Create Meta Developer App - Go to https://developers.facebook.com/, create Business app, add WhatsApp product
- Get Phone Number ID - Use Meta's test number or register your own business phone
- Generate System User Token - Create at https://business.facebook.com/latest/settings/system_users (permanent token that does not expire like the temporary test token)
- Configure Webhook - Point to your n8n instance URL, subscribe to "messages" events
- Verify business - Meta requires 3-5 verification steps (business, app, phone, system user)
2. Configure n8n Credentials (5 minutes)
- OpenAI - Add API key in Credentials → OpenAI (used in 2 places: "Convert Summary to Audio" and "OpenAI Chat Model" in sub-workflow)
- WhatsApp OAuth - Add in WhatsApp Trigger node using your Meta app's App ID (Client ID) and App Secret (Client Secret)
- WhatsApp API - Add in all WhatsApp action nodes (Send Error, Send Processing, Send Audio) using the System User token from step 1 and your WhatsApp Business Account ID
3. Link Sub-Workflow (3 minutes)
- Ensure "[SUB] Get Webpage Summary" workflow is activated
- In "Get Webpage Summary" node, select the sub-workflow from dropdown
- Verify workflow ID matches: QglZjvjdZ16BisPN
4. Update Phone Number IDs (2 minutes)
- Copy your Phone Number ID from Meta console
- Update in all WhatsApp nodes: Send Error Message, Send Processing Message, Send Audio Summary
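The Phone Number ID matters because it is part of the request path when a message is sent through the WhatsApp Cloud API. The sketch below shows the shape of a plain text message request. The n8n nodes build this for you, so it is only meant to show where the ID ends up. The API version `v19.0`, the function name, and the sample IDs are placeholders.

```python
import json


def build_text_message(phone_number_id, recipient, body, api_version="v19.0"):
    """Return (url, json_body) for a Cloud API text message.

    api_version is a placeholder; use whichever Graph API version your app targets.
    """
    url = f"https://graph.facebook.com/{api_version}/{phone_number_id}/messages"
    payload = {
        "messaging_product": "whatsapp",
        "to": recipient,
        "type": "text",
        "text": {"body": body},
    }
    return url, json.dumps(payload)
```

A wrong or stale Phone Number ID produces a request to a sender the token cannot use, which is why every WhatsApp action node needs the same, correct value.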
5. Test the Flow (2 minutes)
- Activate both workflows (sub-workflow first, then main)
- Send test message to WhatsApp: https://example.com
- Verify: Processing message arrives → Audio summary delivered within 30 seconds
Important Notes
WhatsApp Caveats:
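- 24-hour window - Free-form messages are only allowed within 24 hours of the user's last message; after that the user must message first or you must use an approved template message (while testing, send "Hi" each morning to reopen the window)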
- Verification fatigue - Meta requires multiple business verifications; budget 30-60 minutes if first time
- Test vs Production - Test numbers work for single users; production requires business verification
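The 24-hour rule can be checked before sending, for example if you later store the time of each user's last message. A minimal sketch, assuming timestamps are timezone-aware datetimes; the function name is hypothetical:

```python
from datetime import datetime, timedelta, timezone

SERVICE_WINDOW = timedelta(hours=24)


def can_send_freeform(last_user_message_at, now=None):
    """True while the last inbound message is less than 24 hours old."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_user_message_at < SERVICE_WINDOW
```

Passing `now` explicitly makes the check easy to test and avoids surprises from clock differences between systems.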
Audio Format:
- Using Opus codec (optimal for WhatsApp, smaller file size than MP3)
- Speed set to 1.0 (normal pace) - adjust in "Convert Summary to Audio" node if needed
- Cost: TTS-1 is billed per input character, about $0.015 per 1,000 characters (roughly one minute of audio)
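For reference, the settings above map onto the body of an OpenAI speech request (POST https://api.openai.com/v1/audio/speech) roughly as sketched here. The n8n node assembles this itself, and the function name and default voice are assumptions for illustration.

```python
def build_tts_payload(summary_text, voice="alloy", speed=1.0):
    """Build the JSON body for a TTS-1 request that returns Opus audio."""
    # The speech endpoint accepts speeds from 0.25 to 4.0; 1.0 is normal pace.
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {
        "model": "tts-1",
        "input": summary_text,
        "voice": voice,
        "response_format": "opus",  # Opus for WhatsApp compatibility
        "speed": speed,
    }
```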
Jina AI Integration:
- Free tier works for basic use (no API key required)
- Handles JavaScript-heavy sites automatically
- Add Authorization: Bearer YOUR_KEY header for higher limits
- Alternative: Replace with Playwright/Puppeteer for self-hosted rendering
Cost Breakdown (per summary):
- GPT-4o-mini summarization: ~$0.005-0.015
- OpenAI TTS audio: ~$0.005-0.015
- WhatsApp messages: Free (up to 1,000/month)
- Total: ~$0.01-0.03 per summary
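The per-summary total can be approximated with simple arithmetic. This sketch assumes TTS is billed at about $0.015 per 1,000 input characters (an assumption; check current pricing) and takes the summarization cost as an input rather than deriving it from token counts. The function name is hypothetical.

```python
def estimate_summary_cost(summary_chars, llm_cost_usd, tts_usd_per_1k_chars=0.015):
    """Estimate the cost in USD of one summary: summarization plus text-to-speech."""
    # TTS cost scales with the length of the summary text being spoken.
    tts_cost = summary_chars / 1000 * tts_usd_per_1k_chars
    return round(llm_cost_usd + tts_cost, 4)
```

For example, a 600-character summary with $0.005 of summarization cost comes to about $0.014 under these assumptions, inside the range listed above.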
Troubleshooting:
- "Cannot read properties of undefined" → The webhook delivered a status update (sent, delivered, read) rather than a message; the code node correctly returns null for these, so the error can be ignored
- "JavaScript must be enabled" → Website needs Jina AI (already implemented in Fetch site texts node)
- Audio not sending → Check binary data field is named data in TTS node
- No webhook received → Verify n8n URL is publicly accessible and webhook subscription includes "messages"
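The first item above comes from the two kinds of webhook payload WhatsApp sends: one with a `messages` array and one with a `statuses` array. A defensive parser might look like the sketch below. It assumes the Code node receives the webhook's `value` object as a dict, and the function name is hypothetical.

```python
def extract_message_text(value):
    """Return the text body of an incoming message, or None for anything else."""
    messages = value.get("messages")
    if not messages:
        # Status updates carry "statuses" and have no "messages" key.
        return None
    first = messages[0]
    if first.get("type") != "text":
        # Images, audio, and other message types have no text body.
        return None
    return first.get("text", {}).get("body")
```

Returning None instead of indexing blindly into the payload is what prevents the "Cannot read properties of undefined" style of failure.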
Pro Tips:
- Change voice in TTS node: alloy (neutral), echo (male), nova (female), shimmer (soft)
- Adjust summary detail: Modify chunkSize: 6000 in sub-workflow's Text Splitter node (smaller chunks split a page into more pieces, which means more model calls per summary)
- Add rate limiting: Insert Code node after trigger to track user requests per hour
- Store summaries: Add database node after "Clean up" to archive for later retrieval
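The rate-limiting tip could be implemented as a sliding window per sender. This is an illustrative sketch with hypothetical names and limits. It keeps state in memory, so inside n8n the timestamps would need to live somewhere that survives between executions, such as workflow static data or a database.

```python
import time
from collections import defaultdict, deque


class HourlyRateLimiter:
    """Allow at most max_requests per user within a sliding time window."""

    def __init__(self, max_requests=10, window_seconds=3600):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # user_id -> timestamps of accepted requests

    def allow(self, user_id, now=None):
        now = time.time() if now is None else now
        hits = self._hits[user_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

A rejected request is not recorded, so a user who keeps sending links while blocked does not extend their own lockout.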
Use Cases:
- Commuting executives - Consume industry news hands-free
- Research students - Cover more sources while multitasking
- Visually impaired users - Access any webpage via natural audio
- Sales teams - Research prospects on the go
- Content creators - Monitor competitors while exercising