Complete AI support system using website data (RAG pipeline)
This template provides a full end-to-end Retrieval-Augmented Generation (RAG) system using n8n. It includes two connected workflows:
- A data ingestion pipeline that crawls a website and stores its content in a vector database.
- A customer support chatbot that retrieves this knowledge and answers user queries in real time.
Together, these workflows allow you to turn any public website into an intelligent AI-powered support assistant grounded in real business data.
Use cases
- AI customer support chatbot for your website
- Internal company knowledge assistant
- Product FAQ automation
- Helpdesk or IT support bot
- AI receptionist for services
- Semantic search over company content
How it works
Ingestion workflow
- Discover all URLs from a website sitemap.
- Filter and normalize the URLs.
- Fetch each page and extract readable text.
- Clean HTML into plain text.
- Split text into overlapping chunks.
- Generate embeddings using OpenAI.
- Store vectors in Pinecone with metadata.
Chatbot workflow
- A user sends a message via chat webhook.
- The agent queries Pinecone for relevant knowledge.
- Retrieved content is passed to OpenAI.
- OpenAI generates a grounded response.
- Short-term memory maintains conversation context.
How to use
Step 1 – Run ingestion
- Set your target website URL.
- Add Firecrawl, OpenAI, and Pinecone credentials.
- Create a Pinecone index.
- Execute the ingestion workflow.
- Wait until all pages are indexed.
Step 2 – Run chatbot
- Deploy the chatbot workflow.
- Set the same Pinecone index and namespace.
- Copy the chat webhook URL.
- Connect it to a website, chat widget, or WhatsApp bot.
- Start chatting with your AI assistant.
Requirements
- Firecrawl account
- OpenAI API key
- Pinecone account and index
- Public website to crawl
- Optional: frontend chat interface
Good to know
- The chatbot never answers from memory for business data.
- All company knowledge comes from Pinecone.
- If Pinecone returns nothing, the bot fails safely.
- HTML cleaning is basic and can be replaced with:
- Mozilla Readability
- Jina Reader
- Unstructured
- Chunk size and overlap affect retrieval quality.
- Pinecone can be replaced with:
- Qdrant
- Weaviate
- Supabase Vector
- Chroma
Customising this workflow
You can extend this system by:
- Adding PDF or document loaders
- Scheduling ingestion daily or weekly
- Connecting CRM or ticketing systems
- Adding appointment booking tools
- Switching to local or open-source models
- Adding multilingual support
- Storing raw content in a database
- Adding feedback or logging
What this n8n template demonstrates
- Real-world RAG architecture
- Web crawling pipelines
- Text chunking strategies
- Vector database integration
- AI agent orchestration
- Memory-controlled conversations
- Production-grade AI support systems
- End-to-end AI infrastructure with n8n
Architecture overview
This template follows a modern AI system design:
Website → Ingestion → Embeddings → Pinecone → Retrieval → OpenAI → User
It separates:
- Data preparation (offline)
- Knowledge storage
- Runtime inference
This makes the system scalable, maintainable, and safe for production use.
Need a custom setup?
If you want a similar AI system built for your business (custom data sources, CRM integration, WhatsApp bots, booking systems, dashboards, or private deployments), feel free to reach out at [email protected].
I help companies design and deploy production-ready AI workflows.