This workflow automates the process of building a structured Knowledge Base (KB) from your existing Gmail inbox by processing historical email threads, extracting customer-support conversation pairs, classifying them with AI, generating vector embeddings, and storing everything in a PostgreSQL database — ready to power AI-assisted draft generation in downstream workflows.
The workflow is triggered manually and fetches the last N emails from your connected Gmail account. Each email is parsed, filtered, and deduplicated before being processed. A thread-level fetch retrieves the full conversation context, separating customer messages from support replies. An AI classification step categorises each thread into a defined scenario type. Vector embeddings are generated for semantic similarity search. Qualified threads are inserted into three database tables — kb_data for clean Q&A pairs, scenario_patterns for reusable handling logic, and corrections for diverse real-world examples — with strict deduplication to keep the KB clean and non-redundant.
📥 Manual Trigger — Run on demand whenever you want to import a batch of historical emails into the KB. No scheduling required.
📬 Fetch Emails from Gmail — Pulls the last N emails (configurable, default 100) from your Gmail account using OAuth2. Returns full email metadata including thread IDs, sender, subject, labels, and body.
🔍 Parse & Filter — Extracts structured fields from each email: sender address, subject, body (cleaned of links and quoted history), date, and thread ID. Filters out emails from your own domain (outbound support replies) and auto-generated emails (no-reply, notifications, newsletters). Only genuine inbound customer emails proceed.
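The filtering logic above can be sketched in a few lines. This is an illustrative sketch only, not the actual n8n Code-node contents; `SUPPORT_DOMAIN`, `AUTO_SENDERS`, and the regexes are assumptions you would adapt to your inbox.

```python
import re

SUPPORT_DOMAIN = "yourcompany.com"   # replace with your own domain
AUTO_SENDERS = ("no-reply", "noreply", "notifications", "newsletter")

def clean_body(body: str) -> str:
    """Strip URLs and quoted reply history from an email body."""
    body = re.sub(r"https?://\S+", "", body)       # remove links
    body = re.split(r"\nOn .+ wrote:\n", body)[0]  # drop quoted history
    return body.strip()

def is_customer_email(sender: str) -> bool:
    """Keep only genuine inbound customer emails."""
    local, _, domain = sender.rpartition("@")
    if domain == SUPPORT_DOMAIN:                   # outbound support reply
        return False
    if any(tag in local.lower() for tag in AUTO_SENDERS):
        return False                               # auto-generated mail
    return True
```

Real quoted-history formats vary by mail client, so the split pattern usually needs a few variants in practice.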
🧵 Fetch Full Thread — For each qualified email, fetches the entire Gmail thread using the thread ID. Retrieves all messages in the conversation to identify the original customer message and the latest support reply.
🗂️ Assemble Thread Context — Reconstructs the conversation by separating customer messages from support replies. Builds a clean conversationText block used as AI input. Flags threads with no support reply for conditional handling.
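A minimal sketch of the reconstruction step, assuming each message arrives as a dict with `from` and `body` fields in oldest-to-newest order (field names are assumptions based on the description, not the exact node output):

```python
SUPPORT_DOMAIN = "yourcompany.com"

def assemble_thread(messages):
    """Split a Gmail thread into customer vs support messages and build
    the conversationText block used as AI input."""
    customer, support = [], []
    for msg in messages:
        if msg["from"].endswith("@" + SUPPORT_DOMAIN):
            support.append(msg)
        else:
            customer.append(msg)
    lines = [
        f"{'SUPPORT' if m in support else 'CUSTOMER'}: {m['body']}"
        for m in messages
    ]
    return {
        "conversationText": "\n\n".join(lines),
        "original_customer_message": customer[0]["body"] if customer else "",
        "latest_support_reply": support[-1]["body"] if support else "",
        "has_support_reply": bool(support),  # flag for conditional handling
    }
```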
🤖 AI Classification — Calls GPT-4o-mini (or Claude) with the assembled conversation. Returns a structured JSON output containing: scenario category, a concise Q&A pair, a handling pattern description, key entities, sentiment, resolution status, and a summary.
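A hypothetical example of the structured JSON the prompt might request, with a small validator that fails fast on missing fields. The exact key names depend on your prompt; these are assumptions matching the fields listed above.

```python
import json

EXAMPLE_OUTPUT = """{
  "category": "Refund Request",
  "question": "Customer wants a refund for a damaged item.",
  "answer": "Apologised, confirmed the damage, issued refund within 3-5 days.",
  "pattern": "Verify the order, apologise, refund without requiring a return.",
  "entities": ["order #12345", "damaged item"],
  "sentiment": "frustrated",
  "resolution_status": "Resolved",
  "summary": "Refund issued for a damaged delivery."
}"""

REQUIRED_KEYS = {"category", "question", "answer", "pattern",
                 "entities", "sentiment", "resolution_status", "summary"}

def parse_classification(raw: str) -> dict:
    """Parse model output and raise if any expected field is missing."""
    data = json.loads(raw)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"model output missing keys: {missing}")
    return data
```

Validating the model output before the DB insert keeps malformed completions from silently producing incomplete KB rows.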
🔢 Generate Embeddings — Calls the OpenAI Embeddings API to generate a 1536-dimension vector for both the KB entry and the correction record. Used for semantic similarity search in the downstream draft-generation workflow.
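The HTTP Request node's payload and response handling can be sketched as below. The request body shape and the `data[].embedding` response path follow OpenAI's `/v1/embeddings` API; the helper names are illustrative, not part of the workflow.

```python
EMBEDDING_MODEL = "text-embedding-3-small"   # returns 1536-dim vectors

def build_embedding_request(texts):
    """Body for POST https://api.openai.com/v1/embeddings."""
    return {"model": EMBEDDING_MODEL, "input": texts}

def extract_vectors(response: dict):
    """Pull the embedding vectors out of the API response, in input order."""
    return [item["embedding"] for item in response["data"]]
```

Batching both texts (the KB entry and the correction record) into one `input` list halves the number of API round-trips per thread.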
🔁 Duplicate Detection & DB Insert — Performs cosine similarity checks against existing records before inserting. KB entries and scenario patterns are blocked when a match exceeds 92% similarity. Corrections are deduplicated at the same 92% threshold but otherwise inserted liberally, since diverse examples improve AI draft quality. New records are written to three tables: kb_data, scenario_patterns, and corrections.
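A minimal sketch of the 92% dedup check. In production this comparison would typically run in SQL via pgvector's distance operators; it is shown here in pure Python for illustration.

```python
import math

SIMILARITY_THRESHOLD = 0.92

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def is_duplicate(new_vec, existing_vecs):
    """Block an insert when any existing record exceeds the threshold."""
    return any(cosine_similarity(new_vec, v) > SIMILARITY_THRESHOLD
               for v in existing_vecs)
```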
The pgvector extension must be enabled, and the three tables must be set up per the schemas below.

**kb_data**

| Field | Type | Notes |
|---|---|---|
| id | SERIAL PRIMARY KEY | Auto-increment |
| thread_id | TEXT | Gmail thread ID |
| subject | TEXT | Email subject |
| category | TEXT | AI-assigned scenario type |
| question | TEXT | Customer issue / question |
| answer | TEXT | Support resolution |
| entities | TEXT | Key entities extracted by AI |
| sentiment | TEXT | Customer sentiment |
| resolution_status | TEXT | Resolved / Unresolved |
| embedding | vector(1536) | OpenAI embedding for similarity search |
| source | TEXT | historical_import |
| created_at | TIMESTAMP | Insert timestamp |
**scenario_patterns**

| Field | Type | Notes |
|---|---|---|
| id | SERIAL PRIMARY KEY | Auto-increment |
| category | TEXT | Scenario type |
| pattern_description | TEXT | Handling logic summary |
| example_subject | TEXT | Representative subject line |
| embedding | vector(1536) | OpenAI embedding |
| source | TEXT | historical_import |
| created_at | TIMESTAMP | Insert timestamp |
**corrections**

| Field | Type | Notes |
|---|---|---|
| id | SERIAL PRIMARY KEY | Auto-increment |
| gmail_thread_id | TEXT | Gmail thread ID |
| original_email_body | TEXT | Customer's original message |
| human_sent_text | TEXT | Actual support reply sent |
| ai_draft_text | TEXT | NULL for historical imports |
| diff_summary | TEXT | Import note or live diff |
| classification | TEXT | Scenario category |
| embedding | vector(1536) | OpenAI embedding |
| source | TEXT | historical_import |
| created_at | TIMESTAMP | Insert timestamp |
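The kb_data schema above translates to DDL along these lines (scenario_patterns and corrections follow the same pattern); the `DEFAULT` clauses are assumptions, shown here as a SQL string you would run with psql or any PostgreSQL client after enabling pgvector:

```python
KB_DATA_DDL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS kb_data (
    id                SERIAL PRIMARY KEY,
    thread_id         TEXT,
    subject           TEXT,
    category          TEXT,
    question          TEXT,
    answer            TEXT,
    entities          TEXT,
    sentiment         TEXT,
    resolution_status TEXT,
    embedding         vector(1536),
    source            TEXT DEFAULT 'historical_import',
    created_at        TIMESTAMP DEFAULT now()
);
"""
```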
**Estimated costs**

| Item | Estimated Cost |
|---|---|
| Gmail OAuth2 | Free |
| GPT-4o-mini (100 threads classification) | ~$0.20–0.50 |
| text-embedding-3-small (100 records × 2 embeddings) | ~$0.02 |
| PostgreSQL + pgvector (self-hosted) | ~$5–15/mo |
| n8n self-hosted (AWS t3.small) | ~$10–15/mo |
| Total per 100-email import run (API costs only) | ~$0.22–0.52 |
Gmail — Connect your Gmail account in n8n using OAuth2. Ensure the account is the support inbox. Grant read permissions for messages and threads.
PostgreSQL — Create a new database, enable the pgvector extension (CREATE EXTENSION IF NOT EXISTS vector), then create the three tables using the schema above.
OpenAI — Add your OpenAI API key as an n8n credential. Used for both the chat completion node (classification) and the HTTP Request node (embeddings).
Customise the AI prompt — Open the AI Classification node and update the system prompt to reflect your business type, support tone, and the scenario categories relevant to your domain (e.g. Refund Request, Order Status, Technical Issue, Billing Query).
Set your domain filter — In the Parse & Filter node, update the YOUR_DOMAIN variable to your support team's email domain so outbound replies are correctly excluded from customer email processing.
Set fetch limit — In the Fetch Emails node, set the limit parameter to the number of historical emails you want to import per run. Start with 5–10 to validate the pipeline before running at scale.
Run manually — Click Execute Workflow. Monitor the output of each node to verify parsing, classification, and DB inserts are working correctly.
✔ Converts your inbox into a structured KB — no manual tagging or categorisation required
✔ AI classification assigns scenario types and extracts Q&A pairs automatically
✔ Vector embeddings enable semantic similarity search in downstream draft workflows
✔ Smart deduplication keeps KB and scenario tables clean — no near-duplicate entries
✔ Corrections table accumulates diverse examples — improves AI draft quality over time
✔ Thread-aware — reconstructs full conversations, not just individual emails
✔ Modular AI node — swap GPT-4o-mini for Claude or any other model with minimal changes
✔ One-time historical import feeds directly into live draft-generation workflows
This workflow is Step 1 in a two-workflow system: