Quick overview
This workflow exposes a webhook that generates a punchline-first YouTube Shorts script with OpenAI and turns it into an MP3 voiceover using OpenAI Text-to-Speech. It then transcribes that audio with Whisper and returns word-timed SRT captions along with the audio as base64.
How it works
- Receives a POST webhook request with a topic plus optional word count and OpenAI TTS voice.
- Calls the OpenAI Chat Completions API (gpt-4o-mini) to generate a short CLAIM → PROOF → ESCALATION script.
- Extracts the script text from the OpenAI response and validates that it is not empty.
- Sends the script to the OpenAI Text-to-Speech API (tts-1) to generate an MP3 narration.
- Uploads the MP3 to the OpenAI Whisper transcription API (whisper-1) to get word-level timestamps.
- Builds an SRT file by chunking timed words into short caption cues and responds to the webhook with {script, srt, audioBase64, durationSeconds}.
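
The caption-building step above can be sketched roughly as follows. This is a minimal illustration in Python, not the workflow's actual node code: it assumes each timed word is a dictionary with `word`, `start`, and `end` keys (times in seconds), as returned by the transcription API when word-level timestamps are requested. The `max_words` chunk size and the function names are hypothetical, since the workflow's real chunking rule is not stated here.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = max(0, round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(words: List[Dict], max_words: int = 3) -> str:
    """Group timed words into short cues and render them as SRT text.

    max_words is an assumed chunk size, chosen here for illustration only.
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w["word"].strip() for w in chunk)
        # A cue spans from its first word's start to its last word's end.
        start = format_timestamp(chunk[0]["start"])
        end = format_timestamp(chunk[-1]["end"])
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # Each cue already ends with a newline, so joining with "\n"
    # leaves the blank line that SRT expects between cues.
    return "\n".join(cues)
```

Fixed-size chunks keep the sketch short. A real implementation might also break cues at punctuation or cap each cue's duration.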
Setup
- Create an OpenAI credential in n8n with your OpenAI API key and select it for the three OpenAI HTTP requests (Chat Completions, Text-to-Speech, and Transcriptions).
- Activate the workflow and copy the webhook URL, then POST JSON containing at least a "topic" (optionally "words" and "voice") from your client or source app.
- If you want to store the MP3 instead of returning it inline, add a file or cloud storage node after audio generation or after transcription and map the binary audio accordingly.
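
As a rough client-side sketch of the setup steps above, the following Python builds the request body, POSTs it to the webhook, and decodes the returned audio. The request fields (`topic`, `words`, `voice`) and the response field `audioBase64` come from the description above. The function names, the timeout, and the assumption that `words` is a number are illustrative only.

```python
import base64
import json
import urllib.request
from typing import Optional


def build_payload(topic: str, words: Optional[int] = None,
                  voice: Optional[str] = None) -> dict:
    """Build the JSON body; only "topic" is required."""
    payload = {"topic": topic}
    if words is not None:
        payload["words"] = words  # assumed to be a target word count
    if voice is not None:
        payload["voice"] = voice  # an OpenAI TTS voice name, e.g. "alloy"
    return payload


def call_webhook(url: str, payload: dict, timeout: float = 120.0) -> dict:
    """POST the payload as JSON and return the parsed JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def decode_audio(result: dict) -> bytes:
    """Decode the base64 MP3 returned in the "audioBase64" field."""
    return base64.b64decode(result["audioBase64"])
```

To use it, pass the webhook URL copied from n8n to `call_webhook` together with `build_payload("your topic")`, then write the bytes from `decode_audio` to an `.mp3` file and the `srt` field to an `.srt` file.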