End-to-End Video Creation from user idea or transcript
AI-Powered Scriptwriting using LLMs (e.g., DeepSeek via OpenRouter)
Voiceover Generation with customizable TTS voices
Image Scene Generation using generative image models served through providers like together.ai
Clip Creation & Concatenation into a full video
Dynamic Caption Generation with styling options
Google Drive & Sheets Integration for asset storage and progress tracking
User Submits Form with: a video idea or transcript, a preferred TTS voice, and an image style.
AI Script Generation: A language model writes a catchy title, description, hook, full script, and CTA (see the script-generation sketch after this list).
Text-to-Speech (TTS): The script is turned into audio using the selected voice, with timestamped captions generated.
Scene Segmentation: The script is split into 5–6 second segments for visual storyboarding (a segmentation sketch follows this list).
Image Prompt Creation: Each scene is converted into an image prompt in the selected style (e.g., "anime close-up of a racing car").
Image Generation: Prompts are sent to together.ai or fal.ai to generate scenes.
Clip Creation: Each image is turned into a short video clip (Ken Burns-style zoom) based on script timing.
Video Assembly: The clips are concatenated, merged with the voiceover audio, and overlaid with styled captions to produce the final video.
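As a rough illustration of the script-generation step, here is a minimal sketch that calls OpenRouter's OpenAI-compatible chat completions endpoint with a DeepSeek model; the prompt wording and the JSON response handling are assumptions, not the workflow's exact implementation:

```python
# Minimal sketch: generate title/description/hook/script/CTA via OpenRouter.
# The prompt wording and expected JSON fields are illustrative assumptions.
import json
import os

import requests

OPENROUTER_API_KEY = os.environ["OPENROUTER_API_KEY"]

def generate_script(idea: str) -> dict:
    """Ask a DeepSeek model (via OpenRouter) for the script components as JSON."""
    response = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {OPENROUTER_API_KEY}"},
        json={
            "model": "deepseek/deepseek-chat",
            "messages": [
                {
                    "role": "user",
                    "content": (
                        "Write a short-form video script for the idea below. "
                        "Reply with JSON containing: title, description, hook, script, cta.\n\n"
                        f"Idea: {idea}"
                    ),
                }
            ],
        },
        timeout=120,
    )
    response.raise_for_status()
    content = response.json()["choices"][0]["message"]["content"]
    return json.loads(content)  # assumes the model returned valid JSON
```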
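And a minimal way to split the script into roughly 5–6 second scenes, assuming the TTS step produced word-level timestamps as (word, start, end) tuples (that input shape is an assumption):

```python
# Minimal sketch: group timestamped words into scenes of at most ~6 seconds.
# Assumes word-level timings from the TTS/caption step as (word, start_sec, end_sec).
from typing import List, Tuple

def split_into_scenes(
    words: List[Tuple[str, float, float]], max_len: float = 6.0
) -> List[dict]:
    """Greedily pack words into scenes no longer than max_len seconds."""
    scenes, current = [], []
    scene_start = words[0][1] if words else 0.0
    for word, start, end in words:
        # If adding this word would push the scene past max_len, close the scene first.
        if current and end - scene_start > max_len:
            scenes.append({"text": " ".join(current), "start": scene_start, "end": start})
            current, scene_start = [], start
        current.append(word)
    if current:
        scenes.append({"text": " ".join(current), "start": scene_start, "end": words[-1][2]})
    return scenes
```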
Run your TTS server locally using Docker.
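Which API the container exposes depends on the TTS image you choose; many local servers mimic OpenAI's /v1/audio/speech route. The sketch below assumes such a server listening on port 8880 and uses a hypothetical voice name; adjust both to match your container:

```python
# Minimal sketch, assuming a local OpenAI-compatible TTS server.
# The port (8880), model name, and voice ("af_bella") are hypothetical placeholders.
import requests

def synthesize(script_text: str, voice: str = "af_bella") -> bytes:
    """Request an MP3 voiceover for the script from the local TTS container."""
    response = requests.post(
        "http://localhost:8880/v1/audio/speech",
        json={"model": "tts-1", "input": script_text, "voice": voice,
              "response_format": "mp3"},
        timeout=300,
    )
    response.raise_for_status()
    return response.content

with open("voiceover.mp3", "wb") as f:
    f.write(synthesize("Your generated script goes here."))
```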
The nca-toolkit is a self-hosted video/image-processing backend driven over HTTP; the workflow calls endpoints such as:
http://host.docker.internal:9090/v1/image/transform/video
http://host.docker.internal:9090/v1/video/concatenate
http://host.docker.internal:9090/v1/ffmpeg/compose
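The exact request payloads depend on your nca-toolkit build. As a sketch, the image-to-video (zoom) endpoint listed above might be called like this; the field names (image_url, length, zoom_speed) are assumptions to verify against the toolkit's documentation:

```python
# Minimal sketch: ask the nca-toolkit to turn an image into a short zooming clip.
# Field names and response shape are assumptions; check the toolkit's own docs.
import requests

NCA_BASE = "http://localhost:9090"  # use host.docker.internal:9090 from inside another container

def image_to_clip(image_url: str, length_sec: float) -> str:
    """Render a Ken Burns-style clip of the given length from a single image."""
    response = requests.post(
        f"{NCA_BASE}/v1/image/transform/video",
        # Add an API-key header here if your deployment requires authentication.
        json={"image_url": image_url, "length": length_sec, "zoom_speed": 3},
        timeout=600,
    )
    response.raise_for_status()
    return response.json()["response"]  # URL/path of the rendered clip (assumed shape)
```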
Clone or build the nca-toolkit container (if it's a private tool):
Ensure it exposes port 9090.
It should support endpoints for:
Image to video (zoom effect)
Video concatenation
Audio + video merging
Caption overlay via FFmpeg
Run it locally with Docker:
docker run -d -p 9090:80 your-nca-toolkit-image
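Once the container is up, video assembly amounts to posting the per-scene clip URLs to the concatenation endpoint. The payload shape below is an assumption to check against the toolkit's API reference:

```python
# Minimal sketch: concatenate the per-scene clips into one video via the nca-toolkit.
# The payload and response shapes are assumptions; verify against the toolkit's docs.
import requests

def concatenate_clips(clip_urls: list[str]) -> str:
    """Join the scene clips, in order, into a single video."""
    response = requests.post(
        "http://localhost:9090/v1/video/concatenate",
        json={"video_urls": [{"video_url": url} for url in clip_urls]},
        timeout=600,
    )
    response.raise_for_status()
    return response.json()["response"]  # URL/path of the combined video (assumed shape)
```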
together.ai handles the image-generation step, using models like FLUX.1-schnell.
🔧 Steps:
Create an account at: https://www.together.ai
Generate your API key
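With the key available (the sketch below reads it from a TOGETHER_API_KEY environment variable), a scene prompt can be rendered roughly as follows; the endpoint and model name come from together.ai's public API, while the image size and response handling are assumptions:

```python
# Minimal sketch: render one scene prompt with FLUX.1-schnell on together.ai.
# Image dimensions are an assumption (portrait, for short-form video); adjust as needed.
import base64
import os

import requests

def generate_scene_image(prompt: str, out_path: str) -> None:
    """Generate a single scene image and save it to out_path."""
    response = requests.post(
        "https://api.together.xyz/v1/images/generations",
        headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
        json={
            "model": "black-forest-labs/FLUX.1-schnell",
            "prompt": prompt,
            "width": 768,
            "height": 1344,
            "steps": 4,
            "n": 1,
        },
        timeout=120,
    )
    response.raise_for_status()
    item = response.json()["data"][0]
    # The API may return a hosted URL or base64 data depending on settings.
    if "url" in item:
        image_bytes = requests.get(item["url"], timeout=60).content
    else:
        image_bytes = base64.b64decode(item["b64_json"])
    with open(out_path, "wb") as f:
        f.write(image_bytes)

generate_scene_image("anime close-up of a racing car", "scene_01.png")
```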