Build a Self-Updating RAG System with OpenAI, Google Gemini, Qdrant and Google Drive

Created by Davide

Last update 3 months ago

Key Advantages

Automated Knowledge Base Updates
No manual intervention is required: documents added to or modified in the Google Drive folder are automatically synchronized with Qdrant.
Efficient Search and Retrieval
Vector embeddings enable semantic search, so relevant passages are retrieved by meaning rather than by exact keyword match.
Scalable and Flexible
Works with multiple documents and supports continuous growth of your dataset.
Seamless AI Integration
Combines OpenAI embeddings for vectorization and Google Gemini for high-quality natural language answers.
Metadata-Enhanced Storage
Each chunk stores metadata (file ID and file name), making it easy to trace answers back to their source file and to replace a file's chunks when it changes.
End-to-End RAG Pipeline
From document ingestion to AI-powered Q&A, everything is handled inside one n8n workflow.

How It Works

This workflow implements a Retrieval-Augmented Generation (RAG) system that automatically processes, stores, and retrieves document information for AI-powered question answering. Here’s how it functions:

Document Processing & Vectorization:
- The system monitors a specified Google Drive folder for new or updated files.
- When a file is added or modified, it is downloaded and split into manageable chunks using a Recursive Character Text Splitter.
- Each chunk is converted into vector embeddings using OpenAI's embedding model.
- These vectors, along with metadata (file ID, file name), are stored in a Qdrant vector database.
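The splitting step can be sketched in plain Python. This is a simplified re-implementation of the recursive idea, not the code of the n8n/LangChain node. It tries paragraph breaks first, then line breaks, then spaces, and hard-cuts only as a last resort. The `chunk_size` default and the absence of chunk overlap are assumptions made for illustration.

```python
def recursive_split(text: str, chunk_size: int = 1000,
                    separators: tuple[str, ...] = ("\n\n", "\n", " ", "")) -> list[str]:
    """Split text into chunks of at most chunk_size characters (no overlap)."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []

    # Use the coarsest separator that occurs in the text; "" means "cut anywhere".
    sep, finer = "", ()
    for i, candidate in enumerate(separators):
        if candidate == "" or candidate in text:
            sep, finer = candidate, separators[i + 1:]
            break
    if sep == "":
        # Last resort: fixed-width cuts.
        return [text[j:j + chunk_size] for j in range(0, len(text), chunk_size)]

    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        if len(piece) > chunk_size:
            # Flush the pending chunk, then split the oversized piece with finer separators.
            if current.strip():
                chunks.append(current)
            current = ""
            chunks.extend(recursive_split(piece, chunk_size, finer))
            continue
        merged = piece if not current else current + sep + piece
        if len(merged) <= chunk_size:
            current = merged  # keep packing pieces into the same chunk
        else:
            if current.strip():
                chunks.append(current)
            current = piece
    if current.strip():
        chunks.append(current)
    return chunks
```

For example, `recursive_split("aaaa\n\nbbbb\n\ncccc", chunk_size=10)` keeps the first two paragraphs together and puts the third in its own chunk. Preferring natural boundaries keeps each embedded chunk semantically coherent.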
Automatic Updates:
- When a file is updated, the workflow first deletes the vectors previously stored for that file (matched by its file ID) and then inserts the new ones. This keeps the knowledge base current, with no stale or duplicate chunks.
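The delete-then-insert idea can be sketched as follows.

- The filter body targets Qdrant's REST endpoint `POST /collections/{collection}/points/delete`.
- The `metadata.file_id` key and the `page_content` field assume a LangChain-style payload, with metadata nested under a `metadata` field. Confirm this against a stored point in your collection.
- `InMemoryStore` and `reindex_file` are hypothetical stand-ins, so the logic can run without a server.

```python
import uuid


def delete_by_file_filter(file_id: str) -> dict:
    """Request body for POST /collections/{collection}/points/delete.

    Assumption: chunk metadata is nested under the "metadata" payload key."""
    return {"filter": {"must": [{"key": "metadata.file_id", "match": {"value": file_id}}]}}


class InMemoryStore:
    """Hypothetical stand-in for a Qdrant collection: point ID -> payload."""

    def __init__(self) -> None:
        self.points: dict[str, dict] = {}

    def delete_file(self, file_id: str) -> int:
        """Remove every point whose metadata.file_id matches; return how many."""
        stale = [pid for pid, p in self.points.items()
                 if p["metadata"]["file_id"] == file_id]
        for pid in stale:
            del self.points[pid]
        return len(stale)

    def insert(self, file_id: str, file_name: str, chunks: list[str]) -> None:
        """Store each chunk with the file's ID and name as metadata."""
        for chunk in chunks:
            self.points[str(uuid.uuid4())] = {
                "page_content": chunk,
                "metadata": {"file_id": file_id, "file_name": file_name},
            }


def reindex_file(store: InMemoryStore, file_id: str, file_name: str,
                 chunks: list[str]) -> int:
    """Delete a file's old chunks, then insert the new ones; return the number removed."""
    removed = store.delete_file(file_id)
    store.insert(file_id, file_name, chunks)
    return removed
```

Deleting before inserting matters when a file shrinks. If an edit reduces a document from ten chunks to three, the seven obsolete chunks are removed rather than left behind to pollute search results.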
Query Handling & Response Generation:
- When a user sends a chat message (via a chat trigger), the system:
  - Retrieves the most relevant document chunks from Qdrant based on the query's semantic similarity.
  - Uses a Google Gemini language model to generate a context-aware answer grounded in the retrieved documents.
- This provides accurate, source-based responses instead of relying solely on the AI's internal knowledge.
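Here is a minimal sketch of the query side. It assumes each stored point is a `(vector, payload)` pair with the payload shape from the update sketch. Qdrant runs this similarity search server-side with an approximate index, so the brute-force cosine ranking below only illustrates the idea. `top_k`, `build_prompt`, and the prompt wording are hypothetical, not necessarily what the n8n nodes send to Gemini.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec: list[float], points: list[tuple[list[float], dict]], k: int = 4) -> list[dict]:
    """Return the payloads of the k points most similar to the query vector."""
    ranked = sorted(points, key=lambda p: cosine(query_vec, p[0]), reverse=True)
    return [payload for _, payload in ranked[:k]]


def build_prompt(question: str, payloads: list[dict]) -> str:
    """Ground the model in the retrieved chunks, labelling each with its source file."""
    context = "\n\n".join(
        f"[{p['metadata']['file_name']}]\n{p['page_content']}" for p in payloads
    )
    return (
        "Answer using only the context below. If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Labelling each chunk with its `file_name` lets the model cite where an answer came from. This is one practical use of the stored metadata.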
Initial Setup & Maintenance:
- The workflow can be triggered manually to create the Qdrant collection or clear all existing data.
- It processes all existing files in the Drive folder during initial setup, populating the vector store.

Set Up Steps

To configure this workflow, follow these steps:

STEP 1: Create Qdrant Collection

Replace QDRANTURL in the "Create collection" and "Clear collection" nodes with your Qdrant instance URL (e.g., http://your-qdrant-host:6333).
Replace COLLECTION with your desired collection name.
Ensure the Qdrant API credentials are correctly set in the respective HTTP Request nodes.

STEP 2: Configure Google Drive Access

Set up OAuth credentials for Google Drive to allow the workflow to:
- Read files from a specific folder.
- Download files for processing.
Update the Folder ID in the "Search files" and "Update?" trigger nodes to point to your target Google Drive folder.
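At the API level, restricting a Google Drive listing to one folder uses the Drive API v3 search syntax `'<folderId>' in parents`. The sketch below does two things:

- It extracts the folder ID from a folder URL (the part after `/folders/`).
- It builds the matching `files.list` URL.

The helper names and the example folder ID are made up. n8n's Google Drive nodes assemble this query for you, so the sketch is mainly useful for checking that you copied the right ID.

```python
import re
from urllib.parse import urlencode


def folder_id_from_url(url: str) -> str:
    """Extract the folder ID from a URL like https://drive.google.com/drive/folders/<ID>."""
    match = re.search(r"/folders/([A-Za-z0-9_-]+)", url)
    if match is None:
        raise ValueError(f"no folder ID found in {url!r}")
    return match.group(1)


def list_folder_url(folder_id: str) -> str:
    """Drive API v3 files.list URL for non-trashed files directly inside the folder."""
    query = f"'{folder_id}' in parents and trashed = false"
    params = {"q": query, "fields": "files(id,name,modifiedTime)"}
    return "https://www.googleapis.com/drive/v3/files?" + urlencode(params)
```

For instance, a shared link ending in `/folders/1AbC_dEf-123?usp=sharing` yields the ID `1AbC_dEf-123`. That ID is the value to paste into the "Search files" and "Update?" nodes.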

STEP 3: Set Up AI Models

Configure the OpenAI API credentials in the Embeddings nodes for generating text embeddings.
Configure the Google Gemini (PaLM) API credentials in the Google Gemini Chat Model node for generating answers.
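A common setup error is a mismatch between the embedding model's output size and the Qdrant collection's vector size. Qdrant rejects vectors whose length differs from the configured size. The sketch below lists the default dimensions of OpenAI's embedding models and checks them against the collection size. The text-embedding-3 models can be shortened with the API's `dimensions` parameter.

The source does not say which model the Embeddings nodes use. The `text-embedding-3-small` default is therefore an assumption, and the helper names are hypothetical.

```python
import json

# Default output dimensions of OpenAI embedding models.
EMBEDDING_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def embeddings_request_body(texts: list[str], model: str = "text-embedding-3-small") -> str:
    """JSON body for POST https://api.openai.com/v1/embeddings."""
    return json.dumps({"model": model, "input": texts})


def check_dimensions(model: str, collection_vector_size: int) -> None:
    """Raise if the model's default output size differs from the Qdrant collection's."""
    expected = EMBEDDING_DIMS[model]
    if expected != collection_vector_size:
        raise ValueError(
            f"{model} produces {expected}-dim vectors but the collection expects "
            f"{collection_vector_size}; recreate the collection or switch models"
        )
```

Running `check_dimensions` before the first ingestion catches the mismatch early. Otherwise it would surface only as failed inserts.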

STEP 4: Configure Metadata

The system automatically attaches metadata (file_id, file_name) to each document chunk. This is set in the Default Data Loader nodes.
This metadata is crucial for identifying the source of information and for the update mechanism.

STEP 5: Test the RAG System

The workflow includes a chat trigger ("When chat message received") for testing.
Send a query to test the retrieval and answer generation process.

Need help customizing?

Contact me for consulting and support or add me on LinkedIn.
