Document Q&A with RAG: Query PDF Content using Weaviate and OpenAI

Created by Mary Newhauser

Last update 4 months ago

RAG over a PDF with Weaviate

This workflow allows you to upload a PDF file and ask questions about it using the Question and Answer Chain and the Weaviate Vector Store nodes.

Who it's for

This workflow is the simplest possible implementation of RAG with Weaviate in n8n. It is meant as an extensible template for RAG over your own documents.

Prerequisites

An existing Weaviate cluster. You can view instructions for setting up a local cluster with Docker here or a Weaviate Cloud cluster here.
API keys to generate embeddings and power chat models. We use OpenAI, but feel free to switch out the models as you like.
A self-hosted n8n instance. See this video for how to get set up in just three minutes.
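Before running the workflow, you may want to confirm that a local Weaviate cluster is reachable. The sketch below is not part of the n8n workflow. It assumes your Docker setup exposes Weaviate's HTTP API on the default http://localhost:8080 and polls Weaviate's /v1/.well-known/ready endpoint using only the Python standard library. The function name weaviate_is_ready is hypothetical.

```python
import urllib.error
import urllib.request


def weaviate_is_ready(base_url: str = "http://localhost:8080", timeout: float = 2.0) -> bool:
    """Return True if Weaviate's readiness endpoint answers with HTTP 200.

    Assumes a local cluster on the default port; pass another base_url otherwise.
    """
    url = base_url.rstrip("/") + "/v1/.well-known/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection refused, DNS failures, timeouts, and non-2xx
        # responses (urlopen raises HTTPError, a URLError subclass, for those).
        return False
```

For example, weaviate_is_ready() returns False while the container is still starting and True once Weaviate can serve requests. For a Weaviate Cloud cluster you would pass its URL instead; that endpoint may require authentication, which this sketch does not handle.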

How it works

Part 1: Manually upload data

In this example, we manually upload "A Survey of Large Language Models", a 100+ page article from arXiv. You can replace this step with a more advanced data pipeline of your own.

Part 2: Embed and load data into Weaviate collection

Here, we generate embeddings for the full text of the article and store them in a Weaviate collection.
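A common way to implement this step is to split the document into overlapping chunks, turn each chunk into a vector, and store each chunk's text alongside its vector. The sketch below mimics that pipeline in plain Python so it runs offline. A toy hashed bag-of-words function stands in for the OpenAI embedding model, and a Python list stands in for the Weaviate collection. The function names, the 1,000-character chunk size, and the 200-character overlap are illustrative assumptions, not the settings used by the n8n nodes.

```python
import hashlib
import math
import re


def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


def embed(text: str, dim: int = 256) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        index = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[index] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_collection(text: str) -> list[dict]:
    """Chunk the document and keep each chunk with its vector (in-memory stand-in for Weaviate)."""
    return [{"text": chunk, "vector": embed(chunk)} for chunk in split_text(text)]
```

In the real workflow, the embedding call goes to the model you selected, and each chunk is written as an object in the Weaviate collection rather than appended to a list. The overlap keeps sentences that straddle a chunk boundary retrievable from either side.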

Part 3: Perform RAG over PDF file with Weaviate

In this part of the workflow, you enter a query through the Chat node, and the Question and Answer Chain node returns a RAG response grounded in context retrieved from Weaviate.
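At query time, RAG embeds the question, retrieves the most similar chunks, and passes them to the chat model as context. The sketch below illustrates that retrieve-then-prompt loop with brute-force cosine similarity over a toy hashed bag-of-words embedding. It is not the Question and Answer Chain node's actual implementation; the prompt wording, the k=2 default, and the function names are hypothetical.

```python
import hashlib
import math
import re


def embed(text: str, dim: int = 256) -> list[float]:
    """Toy stand-in for the embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[int(hashlib.md5(token.encode()).hexdigest(), 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks whose vectors have the highest cosine similarity to the query."""
    q = embed(query)
    # Vectors are unit length (or all zeros), so the dot product equals cosine similarity.
    scored = [(sum(a * b for a, b in zip(q, embed(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble a grounded prompt that restricts the chat model to the retrieved context."""
    joined = "\n---\n".join(context)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"
    )
```

For example, calling retrieve("How does loss change with model size?", chunks, k=1) on a list of short passages returns the passage whose wording overlaps most with the question, and build_prompt then produces the text you would send to the chat model. Weaviate performs the same nearest-neighbor search server-side, using an approximate (HNSW) index instead of this linear scan.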

How to run the workflow

Complete the prerequisites: create a Weaviate cluster (local or cloud), set up a self-hosted n8n instance, and add your API keys and other credentials.
Select the embedding and chat models you'd like to use.
Upload a PDF file you want to ask questions about.
Execute the rest of the workflow.