Overview
This workflow implements a complete Retrieval-Augmented Generation (RAG) knowledge assistant with built-in document ingestion, conversational AI, and automated analytics using n8n, OpenAI, and Pinecone.
The system allows users to upload documents, automatically convert them into embeddings, query the knowledge base through a chat interface, and receive daily reports about chatbot performance and document usage.
Instead of manually searching through documentation, users can ask questions in natural language and receive answers grounded in the uploaded files. The workflow retrieves the most relevant document chunks from a vector database and provides them to the language model as context, ensuring accurate and source-based responses.
In addition to answering questions, the workflow records all chat interactions and generates daily usage analytics. These reports summarize chatbot activity, highlight the most referenced documents, and identify failed lookups where information could not be found.
This architecture is useful for teams building internal knowledge assistants, documentation chatbots, AI support tools, or searchable company knowledge bases powered by Retrieval-Augmented Generation.
How It Works
Document Upload Interface
- Users upload PDF, CSV, or JSON files through a form trigger.
- These documents become part of the knowledge base used by the chatbot.
Document Processing
- Uploaded files are loaded and converted into text.
- The text is split into smaller chunks to improve embedding quality and retrieval accuracy.
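As a rough sketch of what the splitting step does, here is a character-based splitter with overlap (the chunk size and overlap values are illustrative defaults, not necessarily what the workflow's configuration node uses):

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks so context isn't lost at chunk boundaries."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap  # each chunk starts `overlap` chars before the previous one ends
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

chunks = split_text("a" * 1200, chunk_size=500, overlap=50)
print(len(chunks))  # 3 chunks: chars 0-500, 450-950, 900-1200
```

The overlap means a sentence straddling a chunk boundary still appears whole in at least one chunk, which tends to improve retrieval hits.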
Embedding Generation
- Each text chunk is converted into vector embeddings using the OpenAI Embeddings node.
Vector Database Storage
- The embeddings are stored in a Pinecone vector database.
- This creates a searchable semantic index of the uploaded documents.
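Conceptually, the vector store maps each chunk id to its embedding plus source metadata. A minimal in-memory stand-in for the upsert step (the real workflow does this through the Pinecone node; the id scheme and metadata fields here are assumptions):

```python
# Toy in-memory stand-in for a Pinecone index: id -> {vector, metadata}.
index: dict[str, dict] = {}

def upsert(chunk_id: str, vector: list[float], metadata: dict) -> None:
    """Store an embedding with its source metadata, keyed by chunk id."""
    index[chunk_id] = {"vector": vector, "metadata": metadata}

upsert("handbook.pdf#chunk0", [0.1, 0.9], {"source": "handbook.pdf"})
upsert("handbook.pdf#chunk1", [0.8, 0.2], {"source": "handbook.pdf"})
print(len(index))  # 2
```

Keeping the source filename in metadata is what later lets the analytics step report which documents were referenced.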
Chat Interface
- Users interact with the knowledge base through a chat interface.
- Each message becomes a query sent to the RAG system.
RAG Retrieval
- The workflow retrieves the most relevant document chunks from Pinecone.
- These chunks are provided to the language model as context.
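Retrieval ranks stored vectors by similarity to the embedded query. A self-contained sketch using cosine similarity (Pinecone performs this search server-side; the toy two-dimensional vectors below are purely illustrative):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], vectors: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return ids of the k stored vectors most similar to the query."""
    ranked = sorted(vectors, key=lambda i: cosine(query, vectors[i]), reverse=True)
    return ranked[:k]

vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
print(top_k([1.0, 0.1], vectors, k=2))  # ['a', 'c']
```

The `k` here corresponds to the retrieval depth (top-K) set in the Workflow Configuration node.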
AI Response Generation
- The chatbot generates an answer using only the retrieved document information.
- This ensures responses remain grounded in the knowledge base.
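Grounding comes down to prompt construction: the retrieved chunks become the only context the model is instructed to use. A hedged sketch of such a prompt (the actual system prompt lives in the workflow's AI Agent node and may be worded differently):

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a grounded prompt: retrieved chunks as context, then the question."""
    context = "\n---\n".join(chunks)
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

prompt = build_prompt("What is the refund policy?", ["Refunds within 30 days."])
print("ONLY the context" in prompt)  # True
```

The explicit "say you don't know" instruction is what produces the failed-lookup signal that the daily report later counts.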
Chat Logging
- User questions, AI responses, timestamps, and referenced documents are logged.
- This enables monitoring and analytics of chatbot usage.
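One possible shape for a log row (the field names are assumptions for illustration; the workflow's Data Table schema may differ):

```python
from datetime import datetime, timezone

def log_interaction(question: str, answer: str, sources: list[str], found: bool) -> dict:
    """Build the row the workflow would append to its chat-log Data Table."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "answer": answer,
        "sources": sources,   # documents referenced in the answer
        "found": found,       # False marks a failed lookup for the daily report
    }

row = log_interaction("What is X?", "X is ...", ["handbook.pdf"], True)
print(sorted(row))  # the logged field names
```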
Daily Analytics Workflow
- A scheduled trigger runs every morning.
- The workflow retrieves chat logs from the previous 24 hours.
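The 24-hour window can be sketched as a simple timestamp cutoff (assuming ISO-8601 timestamps, as in the logging step above):

```python
from datetime import datetime, timedelta, timezone

def last_24h(logs: list[dict], now: datetime) -> list[dict]:
    """Keep only log rows whose timestamp falls in the preceding 24 hours."""
    cutoff = now - timedelta(hours=24)
    return [r for r in logs if datetime.fromisoformat(r["timestamp"]) >= cutoff]

now = datetime(2024, 6, 2, 8, 0, tzinfo=timezone.utc)
logs = [
    {"timestamp": "2024-06-01T09:00:00+00:00"},  # inside the window
    {"timestamp": "2024-05-31T09:00:00+00:00"},  # too old
]
print(len(last_24h(logs, now)))  # 1
```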
Report Generation
- Usage statistics are calculated, including:
  - total questions asked
  - failed document lookups
  - most referenced documents
  - overall success rate
Email Summary
- A formatted HTML report is generated and sent via email to provide a daily overview of chatbot activity and knowledge base performance.
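The headline numbers can be computed from the day's log rows like this (field names follow the illustrative log schema above and are assumptions, not the workflow's exact implementation):

```python
from collections import Counter

def daily_stats(logs: list[dict]) -> dict:
    """Aggregate the day's chat logs into the report's headline numbers."""
    total = len(logs)
    failed = sum(1 for r in logs if not r["found"])
    docs = Counter(s for r in logs for s in r["sources"])
    return {
        "total_questions": total,
        "failed_lookups": failed,
        "success_rate": round(100 * (total - failed) / total, 1) if total else 0.0,
        "top_documents": docs.most_common(3),
    }

logs = [
    {"found": True, "sources": ["handbook.pdf"]},
    {"found": True, "sources": ["handbook.pdf", "faq.pdf"]},
    {"found": False, "sources": []},
]
stats = daily_stats(logs)
print(stats["total_questions"], stats["failed_lookups"], stats["success_rate"])
# 3 1 66.7
```

Failed lookups are the most actionable number: a recurring unanswered question points at a gap in the uploaded documentation.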
Setup Instructions
Configure Pinecone
- Create a Pinecone index for storing document embeddings.
- Enter the index name in the Workflow Configuration node.
Add OpenAI Credentials
- Configure credentials for:
  - OpenAI Chat Model
  - OpenAI Embeddings node
Configure Data Tables
- Create the following n8n Data Tables:
Set Workflow Parameters
- In the Workflow Configuration node, configure:
  - Pinecone namespace
  - chunk size
  - chunk overlap
  - retrieval depth (top-K)
Configure Email Notifications
- Add Gmail credentials to send daily summary reports.
Deploy the Workflow
- Share the document upload form with users.
- Enable the chat interface for question answering.
Use Cases
Internal Knowledge Assistant
Allow employees to search internal documentation using natural language questions.
Customer Support Knowledge Base
Provide instant answers from support manuals, product documentation, or help center articles.
Documentation Search Engine
Turn large document collections into an AI-powered searchable knowledge system.
AI Helpdesk Assistant
Enable support teams to quickly retrieve answers from company knowledge repositories.
Knowledge Base Analytics
Monitor chatbot usage, identify missing documentation, and understand which files are most valuable to users.
Requirements
- n8n with LangChain nodes enabled
- OpenAI API credentials
- Pinecone account and index
- Gmail credentials for sending reports
- n8n Data Tables: