Quick Overview
This workflow ingests PDF cost-engineering manuals from Google Drive into a Pinecone vector index using OpenAI embeddings. It then answers user questions through an n8n chat webhook, using a retrieval-augmented OpenAI agent that responds only with evidence from the indexed documents.
How it works
- Runs the ingestion steps on a schedule, every 2 minutes.
- Lists files in a configured Google Drive “incoming” folder and downloads each document.
- Extracts text from each PDF, splits it into overlapping chunks, and attaches document metadata.
- Generates OpenAI embeddings for the chunks and inserts them into the Pinecone rag index.
- Moves each successfully processed Google Drive file into a configured “ingested/archive” folder.
- Receives user questions through an n8n Chat webhook. A LangChain agent with a Pinecone retrieval tool and an OpenAI chat model answers strictly from retrieved passages, or returns the defined fallback message when evidence is missing.
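
The chunking step above can be illustrated with a minimal Python sketch. It is not the workflow's own splitter node, which may split on separators rather than fixed character windows. The function name, the chunk_size and overlap defaults, and the metadata keys (source, chunk_index, start_char) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    metadata: dict


def split_with_overlap(text, source_file, chunk_size=1000, overlap=200):
    """Split text into character windows; neighbours share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        # Hypothetical metadata keys; the real workflow attaches its own fields.
        metadata = {
            "source": source_file,
            "chunk_index": len(chunks),
            "start_char": start,
        }
        chunks.append(Chunk(piece, metadata))
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in split_with_overlap("abcdefghij", "manual.pdf", chunk_size=4, overlap=1):
        print(chunk.metadata["chunk_index"], repr(chunk.text))
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk, which tends to help retrieval. The metadata lets an answer be traced back to its source file.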
Setup
- Connect Google Drive credentials and replace RAG_INCOMING_FOLDER_ID and RAG_INGESTED_FOLDER_ID with your actual folder IDs.
- Add an OpenAI API key for both embedding generation and chat responses, and confirm the chat model selection (e.g., gpt-4.1-mini).
- Connect Pinecone credentials, ensure an index named rag exists, and match its embedding dimension to the OpenAI embeddings model you use.
- Upload your Technical Composition Manuals as PDFs to the Google Drive incoming folder.
- Enable the Chat trigger and copy its webhook URL into the client/app you use to send questions (or use n8n’s chat UI).
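
A dimension mismatch between the embeddings model and the Pinecone index is an easy setup mistake. The Python sketch below shows one way to check it before creating the index. The helper name check_index_dimension is hypothetical. The table lists the default output sizes OpenAI publishes for these models; the text-embedding-3 models can also be asked for shorter vectors, in which case the index should match the shortened size instead.

```python
# Default vector sizes published by OpenAI for its embedding models.
OPENAI_EMBEDDING_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def check_index_dimension(model, index_dimension):
    """Return the model's default dimension, or raise if the index differs."""
    expected = OPENAI_EMBEDDING_DIMENSIONS.get(model)
    if expected is None:
        raise ValueError(f"unknown embedding model: {model}")
    if expected != index_dimension:
        raise ValueError(
            f"{model} produces {expected}-dimensional vectors, "
            f"but the index is configured for {index_dimension}"
        )
    return expected


if __name__ == "__main__":
    # Example: an index created for text-embedding-3-small needs dimension 1536.
    print(check_index_dimension("text-embedding-3-small", 1536))
```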