Build a cost engineering RAG with Google Drive, OpenAI, and Pinecone

Last update a month ago

Quick Overview

This workflow ingests PDF cost-engineering manuals from Google Drive into a Pinecone vector index using OpenAI embeddings. It then answers user questions received through an n8n Chat webhook, using a retrieval-augmented OpenAI agent that responds only with evidence from the indexed documents.

How it works

Runs the ingestion steps on a schedule, every 2 minutes.
Lists files in a configured Google Drive “incoming” folder and downloads each document.
Extracts text from each PDF, splits it into overlapping chunks, and attaches document metadata.
Generates OpenAI embeddings for the chunks and inserts them into the Pinecone index named rag.
Moves each successfully processed Google Drive file into a configured “ingested/archive” folder, so it is not picked up again on the next scheduled run.
Receives user questions through an n8n Chat webhook. A LangChain agent combines a Pinecone retrieval tool with an OpenAI chat model and answers strictly from the retrieved passages, returning the defined fallback message when no supporting evidence is found.
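The chunking and embedding steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not how the n8n text-splitter or Pinecone nodes are implemented: it uses a simple fixed-size character window rather than a recursive splitter, the chunk size of 1000 characters and overlap of 200 are illustrative values, and the names `split_with_overlap`, `chunk_document`, `to_records`, the metadata keys, and the `embed` callable (a stand-in for the OpenAI embeddings call) are all hypothetical. Each record follows the `id`/`values`/`metadata` shape of a Pinecone vector and keeps the chunk text in metadata so the passage can be returned at query time.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def split_with_overlap(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into fixed-size character windows; consecutive windows share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - overlap
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return pieces


def chunk_document(text: str, file_id: str, file_name: str,
                   chunk_size: int = 1000, overlap: int = 200) -> List[Chunk]:
    """Split one extracted PDF text and attach (hypothetical) document metadata to each chunk."""
    pieces = split_with_overlap(text, chunk_size, overlap)
    return [
        Chunk(piece, {"file_id": file_id, "file_name": file_name,
                      "chunk_index": i, "chunk_count": len(pieces)})
        for i, piece in enumerate(pieces)
    ]


def to_records(chunks: List[Chunk], embed: Callable[[str], List[float]]) -> List[dict]:
    """Build {id, values, metadata} records; `embed` stands in for the OpenAI embeddings call."""
    return [
        {
            # Deterministic ID: re-upserting the same file overwrites instead of duplicating.
            "id": f"{c.metadata['file_id']}#{c.metadata['chunk_index']}",
            "values": embed(c.text),
            # Keep the chunk text so retrieval can return the passage itself.
            "metadata": {**c.metadata, "text": c.text},
        }
        for c in chunks
    ]
```

For example, `chunk_document("abcdefghij", "file-1", "manual.pdf", chunk_size=4, overlap=2)` yields the chunks `abcd`, `cdef`, `efgh`, `ghij`, each overlapping the previous one by two characters. With deterministic IDs such as `file-1#0`, re-ingesting a file overwrites its earlier vectors on upsert; whether the workflow's Pinecone node behaves this way depends on how it assigns IDs.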
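Archiving matters because each scheduled run lists the incoming folder again. The control flow can be sketched as follows, assuming hypothetical `ingest` and `archive` callables that stand in for the download/extract/embed/insert nodes and the Google Drive move; `ingest_incoming` is also a hypothetical name. A file is archived only after its ingestion succeeds, so a failed file stays in the incoming folder and is retried on the next run.

```python
from typing import Callable, List, Tuple


def ingest_incoming(files: List[str],
                    ingest: Callable[[str], None],
                    archive: Callable[[str], None]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Ingest each file and archive it only if ingestion succeeded.

    `ingest` and `archive` are hypothetical stand-ins for the ingestion nodes and
    the Google Drive move. A file whose ingestion raises is left in place, so the
    next scheduled run picks it up again.
    """
    archived: List[str] = []
    failed: List[Tuple[str, str]] = []
    for name in files:
        try:
            ingest(name)
        except Exception as exc:  # record the failure and leave the file in the incoming folder
            failed.append((name, str(exc)))
            continue
        archive(name)
        archived.append(name)
    return archived, failed
```

One consequence of this design is that a file that can never be ingested (for example, a scanned PDF with no extractable text) is retried on every run; moving repeated failures to a separate folder is one option, though the workflow as described does not do this.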
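In the workflow, the rule to answer only from retrieved passages is enforced by the agent and its instructions. The sketch below makes the same rule explicit in code instead, as an illustration under assumptions rather than the agent's actual logic. It ranks stored records by cosine similarity, discards passages below a hypothetical `min_score` threshold, and returns a fallback message without calling the model when nothing qualifies. The `FALLBACK` wording, the threshold and `top_k` values, the function names, and the `generate` callable (a stand-in for the OpenAI chat model) are all hypothetical.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical fallback wording; the workflow defines its own message.
FALLBACK = "I could not find supporting evidence in the indexed documents."


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec: List[float], records: List[dict],
             top_k: int = 3, min_score: float = 0.75) -> List[Tuple[float, dict]]:
    """Return up to top_k (score, record) pairs scoring at least min_score, best first."""
    scored = [(cosine(query_vec, r["values"]), r) for r in records]
    relevant = [pair for pair in scored if pair[0] >= min_score]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return relevant[:top_k]


def respond(question: str, query_vec: List[float], records: List[dict],
            generate: Callable[[str, str], str],
            top_k: int = 3, min_score: float = 0.75) -> str:
    """Answer from retrieved passages only; skip the model and return FALLBACK if none qualify."""
    hits = retrieve(query_vec, records, top_k, min_score)
    if not hits:
        return FALLBACK
    context = "\n\n".join(r["metadata"]["text"] for _, r in hits)
    return generate(question, context)
```

Checking for evidence before calling the model, rather than relying only on the prompt, guarantees the fallback path when retrieval finds nothing relevant. Any such threshold would need tuning for the embeddings model and documents in use.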

Setup

Connect Google Drive credentials and replace RAG_INCOMING_FOLDER_ID and RAG_INGESTED_FOLDER_ID with your actual folder IDs.
Add an OpenAI API key for both embedding generation and chat responses, and confirm the chat model selection (e.g., gpt-4.1-mini).
Connect Pinecone credentials, make sure an index named rag exists, and set its dimension to match the output dimension of the OpenAI embeddings model you use (for example, 1536 for text-embedding-3-small).
Upload your Technical Composition Manuals as PDFs to the Google Drive incoming folder.
Enable the Chat trigger and copy its webhook URL into the client or app you use to send questions, or use n8n’s chat UI.
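Pinecone rejects vectors whose length differs from the index dimension, so the dimension rule above is worth checking before the first run. The sketch below encodes it. The sizes listed are the default output dimensions of these OpenAI embedding models, `requested_dimensions` covers the optional shortened output that the text-embedding-3 models support, and the function name `check_index_dimension` is hypothetical. Compare the result with the dimension shown for the rag index in the Pinecone console.

```python
from typing import Optional

# Default output dimensions of OpenAI embedding models.
EMBEDDING_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def check_index_dimension(model: str, index_dimension: int,
                          requested_dimensions: Optional[int] = None) -> int:
    """Return the expected vector size, or raise if the index cannot store it."""
    if requested_dimensions is not None:
        expected = requested_dimensions  # text-embedding-3 models can return shortened vectors
    elif model in EMBEDDING_DIMENSIONS:
        expected = EMBEDDING_DIMENSIONS[model]
    else:
        raise KeyError(f"unknown embedding model: {model}")
    if expected != index_dimension:
        raise ValueError(
            f"index dimension {index_dimension} does not match {model} output size {expected}"
        )
    return expected
```

For example, `check_index_dimension("text-embedding-3-small", 1536)` returns 1536, while pairing text-embedding-3-large at its default size with a 1536-dimension index raises a `ValueError`.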