Generate short scripts, TTS audio, and SRT captions with OpenAI GPT-4o and Whisper

Last update 2 months ago

Quick overview

This workflow exposes a webhook that generates a punchline-first YouTube Shorts script with OpenAI, turns it into an MP3 voiceover using OpenAI Text-to-Speech, then transcribes that audio with Whisper to return word-timed SRT captions plus the audio as base64.

How it works

Receives a POST webhook request containing a topic, plus an optional word count and OpenAI TTS voice.
Calls the OpenAI Chat Completions API (gpt-4o-mini) to generate a short CLAIM → PROOF → ESCALATION script.
Extracts the script text from the OpenAI response and validates that it is not empty.
Sends the script to the OpenAI Text-to-Speech API (tts-1) to generate an MP3 narration.
Uploads the MP3 to the OpenAI Whisper transcription API (whisper-1) to get word-level timestamps.
Builds an SRT file by chunking timed words into short caption cues and responds to the webhook with {script, srt, audioBase64, durationSeconds}.
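
The caption-building step above can be sketched in Python as follows. This is a minimal illustration, not the workflow's actual node code: it assumes the Whisper response carries a list of words shaped like {"word", "start", "end"} with times in seconds, and the function names and the max_words chunk size are hypothetical choices made for this example.

```python
from typing import Dict, List


def format_srt_time(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = max(0, int(round(seconds * 1000)))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(words: List[Dict], max_words: int = 4) -> str:
    """Group timed words into short cues and render them as SRT text.

    Assumption: each item looks like {"word": str, "start": float, "end": float}.
    max_words is an illustrative chunk size, not a value taken from the workflow.
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = format_srt_time(chunk[0]["start"])  # cue starts with its first word
        end = format_srt_time(chunk[-1]["end"])     # cue ends with its last word
        text = " ".join(w["word"].strip() for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

Chunking by a fixed word count keeps each cue short enough for vertical video; a real implementation might also split on pauses or punctuation.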

Setup

Create an OpenAI credential in n8n with your OpenAI API key and select it for the three OpenAI HTTP requests (Chat Completions, Text-to-Speech, and Transcriptions).
Activate the workflow, copy the webhook URL, and POST JSON containing at least a "topic" (optionally "words" and "voice") from your client or source app.
If you want to store the MP3 instead of returning it inline, add a file or cloud storage node after audio generation or after transcription and map the binary audio accordingly.
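
As a rough sketch of the client side, the Python below builds the JSON body, posts it to the webhook, and decodes the returned audio. The field names "topic", "words", "voice", and "audioBase64" come from the description above; the helper names, the timeout, and the placeholder URL are assumptions for illustration only.

```python
import base64
import json
import urllib.request
from typing import Optional


def build_payload(topic: str, words: Optional[int] = None,
                  voice: Optional[str] = None) -> dict:
    """Build the webhook request body; only "topic" is required."""
    if not topic or not topic.strip():
        raise ValueError("topic must be a non-empty string")
    payload = {"topic": topic.strip()}
    if words is not None:
        payload["words"] = int(words)
    if voice:
        payload["voice"] = voice
    return payload


def decode_audio(response: dict) -> bytes:
    """Decode the base64 MP3 returned in the "audioBase64" field."""
    return base64.b64decode(response["audioBase64"])


def request_short(webhook_url: str, topic: str, words: Optional[int] = None,
                  voice: Optional[str] = None, timeout: float = 120.0) -> dict:
    """POST the payload to the webhook and return the parsed JSON response."""
    body = json.dumps(build_payload(topic, words, voice)).encode("utf-8")
    req = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage (the URL is a placeholder; use the one copied from n8n):
#   result = request_short("https://<your-n8n-host>/webhook/<path>",
#                          "why octopuses have three hearts",
#                          words=60, voice="alloy")
#   with open("short.mp3", "wb") as f:
#       f.write(decode_audio(result))
#   with open("short.srt", "w", encoding="utf-8") as f:
#       f.write(result["srt"])
```

The generous timeout reflects that one request waits on three sequential OpenAI calls; adjust it to what you observe in practice.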