Create voice assistant interface with OpenAI GPT-4o-mini and text-to-speech

Created by

Anderson Adelino

Last update

Last update 5 months ago

Voice Assistant Interface with n8n and OpenAI

This workflow creates a voice-activated AI assistant interface that runs directly in your browser. Users can click on a glowing orb to speak with the AI, which responds with voice using OpenAI's text-to-speech capabilities.

Who is it for?

This template is perfect for:

Developers looking to add voice interfaces to their applications
Customer service teams wanting to create voice-enabled support systems
Content creators building interactive voice experiences
Anyone interested in creating their own "Alexa-like" assistant

How it works

The workflow consists of two main parts:

Frontend Interface: A beautiful animated orb that users click to activate voice recording
Backend Processing: Receives the audio transcription, processes it through an AI agent with memory, and returns voice responses

The system uses:

Web Speech API for voice recognition (browser-based)
OpenAI GPT-4o-mini for intelligent responses
OpenAI Text-to-Speech for voice synthesis
Session memory to maintain conversation context

Setup requirements

n8n instance (self-hosted or cloud)
OpenAI API key with access to:
- GPT-4o-mini model
- Text-to-Speech API
Modern web browser with Web Speech API support (Chrome, Edge, Safari)

How to set up

Import the workflow into your n8n instance
Add your OpenAI credentials to both OpenAI nodes
Copy the webhook URL from the "Audio Processing Endpoint" node
Edit the "Voice Assistant UI" node and replace YOUR_WEBHOOK_URL_HERE with your webhook URL
Access the "Voice Interface Endpoint" webhook URL in your browser
Click the orb and start talking!

How to customize the workflow

Change the AI personality: Edit the system message in the "Process User Query" node
Modify the visual style: Customize the CSS in the "Voice Assistant UI" node
Add more capabilities: Connect additional tools to the AI Agent
Change the voice: Select a different voice in the "Generate Voice Response" node
Adjust memory: Modify the context window length in the "Conversation Memory" node

Demo

Watch the template in action: https://youtu.be/0bMdJcRMnZY