This workflow auto-ingests Google Drive documents, parses them with LlamaIndex, and stores Azure OpenAI embeddings in an in-memory vector store, cutting manual update time from ~30 minutes to under 2 minutes per doc.
Why Use This Workflow?
Cost Reduction: Avoids paying a monthly cloud fee just to store your knowledge base.
Ideal For
- Knowledge Managers / Documentation Teams: Automatically keep product docs and SOPs in sync when source files change on Google Drive.
- Support Teams: Ensure the searchable KB is always up-to-date after doc edits, speeding agent onboarding and resolution time.
- Developer / AI Teams: Populate an in-memory vector store for experiments, rapid prototyping, or local RAG demos.
How It Works
- Trigger: Google Drive Trigger watches a specific document or folder for updates.
- Data Collection: The updated file is downloaded from Google Drive.
- Processing: The file is uploaded to LlamaIndex cloud via an HTTP Request to create a parsing job.
- Intelligence Layer: The workflow polls the LlamaIndex job status (Wait + Monitor loop). When the parsing status equals SUCCESS, the result is retrieved as markdown.
- Output & Delivery: Parsed markdown is loaded into LangChain's Default Data Loader, passed to Azure OpenAI embeddings (deployment "3small"), then inserted into an in-memory vector store.
- Storage & Logging: Vector store holds embeddings in memory (good for prototyping). Optionally persist to an external vector DB for production.
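To make the "in memory" trade-off concrete, here is a minimal sketch of what such a store does conceptually. It is not the n8n or LangChain implementation; the class name, the `cosine` helper, and the two-dimensional vectors are all made up for illustration. The point is that everything lives in a Python list, so it is fast to set up but gone when the process restarts.

```python
import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class ToyInMemoryStore:
    """Hypothetical stand-in for an in-memory vector store: data is lost on restart."""
    items: list = field(default_factory=list)

    def add(self, text: str, vector: list[float]) -> None:
        # Keep the text next to its embedding.
        self.items.append((text, vector))

    def search(self, query: list[float], k: int = 1) -> list[str]:
        # Rank stored texts by similarity to the query embedding.
        ranked = sorted(self.items, key=lambda it: cosine(query, it[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyInMemoryStore()
store.add("refund policy", [1.0, 0.0])      # made-up embeddings
store.add("onboarding guide", [0.0, 1.0])
print(store.search([0.9, 0.1]))
```

A persistent vector DB replaces the list with durable storage and an index, which is why it is the suggested route for production.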
Setup Guide
Prerequisites
| Requirement | Type | Purpose |
| --- | --- | --- |
| n8n instance | Essential | Import and run the workflow |
| Google Drive OAuth2 | Essential | Watch and download documents from Google Drive |
| LlamaIndex Cloud API | Essential | Parse and convert documents to structured markdown |
| Azure OpenAI Account | Essential | Generate embeddings (deployment configured to model name "3small") |
| Persistent Vector DB (e.g., Pinecone) | Optional | Persist embeddings for production-scale search |
Installation Steps
- Import the workflow JSON file into your n8n instance.
- Configure credentials:
  - Azure OpenAI: Provide the Endpoint and API Key, and set the deployment name.
  - LlamaIndex API: Create an HTTP Header Auth credential in n8n. Header Name: Authorization. Header Value: Bearer YOUR_API_KEY.
  - Google Drive OAuth2: Create OAuth 2.0 credentials in Google Cloud Console, enable the Drive API, and configure the Google Drive OAuth2 credential in n8n.
- Update environment-specific values:
  - Replace the workflow's Google Drive fileId with the ID of the file or folder you want to watch (do not commit real IDs to public repositories).
- Customize settings:
  - Polling interval (Wait node): adjust for faster or slower job status checks.
  - Target file or folder: set on the Google Drive Trigger node.
  - Embedding model: change the Azure OpenAI deployment if needed.
- Test execution:
  - Save changes and trigger a sample file update on Drive. Verify that each node runs and that the vector store receives embeddings.
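The LlamaIndex credential above boils down to a single request header. The sketch below shows the shape n8n ends up sending, and rejects the unreplaced placeholder; the function name is hypothetical and only illustrates the header format described in the steps.

```python
def llamaindex_auth_header(api_key: str) -> dict[str, str]:
    """Build the Authorization header used by the HTTP Header Auth credential."""
    key = api_key.strip()
    # Catch the most common setup mistake: leaving the placeholder in place.
    if not key or key == "YOUR_API_KEY":
        raise ValueError("replace YOUR_API_KEY with a real LlamaIndex API key")
    return {"Authorization": f"Bearer {key}"}


print(llamaindex_auth_header("example-key"))
```

Keeping the key in an n8n credential (rather than in the node itself) means the header is built at request time and the key never appears in the exported workflow JSON.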
Technical Details
Core Nodes
| Node | Purpose | Key Configuration |
| --- | --- | --- |
| Knowledge Base Updated Trigger (Google Drive Trigger) | Triggers on file/folder changes | Set trigger type to specific file or folder; configure OAuth2 credential |
| Download Knowledge Document (Google Drive) | Downloads file binary | Operation: download; ensure OAuth2 credential is selected |
| Parse Document via LlamaIndex (HTTP Request) | Uploads file to LlamaIndex parsing endpoint | POST multipart/form-data to /parsing/upload; use HTTP Header Auth credential |
| Monitor Document Processing (HTTP Request) | Polls parsing job status | GET /parsing/job/{{jobId}}; check status field |
| Check Parsing Completion (If) | Branches on job status | Condition: {{$json.status}} equals SUCCESS |
| Retrieve Parsed Content (HTTP Request) | Fetches parsed markdown result | GET /parsing/job/{{jobId}}/result/markdown |
| Default Data Loader (LangChain) | Loads parsed markdown into document format | Use as document source for embeddings |
| Embeddings Azure OpenAI | Generates embeddings for documents | Credentials: Azure OpenAI; Model/Deployment: 3small |
| Insert Data to Store (vectorStoreInMemory) | Stores documents + embeddings | Use memory store for prototyping; switch to DB for persistence |
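The three LlamaIndex requests in the table differ only in their path. As a rough sketch, the helpers below assemble those paths from a job ID. `BASE_URL` is a hypothetical placeholder (the table lists only paths, not a host), and the function names are invented for this example.

```python
from urllib.parse import quote

# Hypothetical placeholder: substitute the base URL configured in your HTTP Request nodes.
BASE_URL = "https://llamaindex.example/api"


def upload_url(base: str = BASE_URL) -> str:
    """POST target for creating a parsing job."""
    return f"{base}/parsing/upload"


def job_status_url(job_id: str, base: str = BASE_URL) -> str:
    """GET target for polling the job's status field."""
    return f"{base}/parsing/job/{quote(job_id, safe='')}"


def job_result_url(job_id: str, base: str = BASE_URL) -> str:
    """GET target for the parsed markdown once the job has succeeded."""
    return f"{job_status_url(job_id, base)}/result/markdown"


print(job_result_url("abc123"))
```

In n8n the same substitution is done with an expression such as `{{jobId}}` inside the URL field; the job ID comes from the upload response.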
Workflow Logic
- On Drive change, the file binary is downloaded and sent to LlamaIndex.
- The workflow enters a monitor loop: Monitor Document Processing fetches the job status and the If node checks it. If the status is not SUCCESS, the Wait node delays before the next check.
- When parsing completes, the workflow retrieves markdown, loads documents, creates embeddings via Azure OpenAI, and inserts data into an in-memory vector store.
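The monitor loop described above can be sketched in a few lines. This is an illustration of the control flow only, not an export of the workflow: `get_status` stands in for the Monitor Document Processing request, and `max_checks` is an added safeguard that the workflow as described does not have.

```python
import time
from typing import Callable


def wait_for_parse(
    get_status: Callable[[], str],
    delay_s: float = 60.0,
    max_checks: int = 30,           # assumption: a cap so the loop cannot run forever
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the job reports SUCCESS; return False if the cap is reached."""
    for attempt in range(max_checks):
        if get_status() == "SUCCESS":   # the If node's condition
            return True
        if attempt < max_checks - 1:
            sleep(delay_s)              # the Wait node's delay
    return False


# Simulated job that succeeds on the third check; sleep is stubbed out.
fake_statuses = iter(["PENDING", "PENDING", "SUCCESS"])
print(wait_for_parse(lambda: next(fake_statuses), sleep=lambda s: None))
```

Without a cap, a job that never leaves PENDING keeps the execution alive indefinitely, which is the situation the Troubleshooting table addresses with retry limits.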
Customization Options
Basic Adjustments:
- Poll Delay: Set the Wait node interval (default: one minute) to balance speed vs. API quota.
- Target Scope: Switch the trigger from a single file to a folder to auto-handle many docs.
- Embedding Model: Swap Azure deployment for a different model name as needed.
Advanced Enhancements:
- Persistent Vector DB Integration: Replace vectorStoreInMemory with Pinecone or Milvus for production search.
- Notification: Add Slack or email nodes to notify when parsing completes or fails.
- Summarization: Add an LLM summarization step to generate chunk-level summaries.
Scaling option:
- Batch uploads and chunk documents so that embeddings can be requested in fewer calls; use a queue (Redis or n8n queue patterns) and horizontal workers for high throughput.
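As a rough illustration of the chunk-then-batch idea, the sketch below splits text into overlapping chunks and groups them so that each group could go out in one embedding request. The chunk size, overlap, and batch size are arbitrary example values, and it assumes the embedding endpoint accepts several inputs per request.

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into chunks of up to `size` characters, overlapping by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break                    # the last chunk already reaches the end
        start += size - overlap
    return chunks


def batched(items: list, batch_size: int) -> list[list]:
    """Group items into lists of at most `batch_size` elements."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


chunks = chunk_text("a" * 2500)      # placeholder document text
batches = batched(chunks, 2)
print(len(chunks), "chunks in", len(batches), "embedding requests")
```

Here three chunks need two requests instead of three; the saving grows with document size and batch size, within whatever per-request limits your deployment enforces.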
Performance & Optimization
| Metric | Expected Performance | Optimization Tips |
| --- | --- | --- |
| Execution time (per doc) | ~10s–2min (depends on file size & LlamaIndex processing) | Chunk large docs; run embeddings in batches |
| API calls (per doc) | 3–8 (upload, poll(s), retrieve, embedding calls) | Increase poll interval; consolidate requests |
| Error handling | Retries via Wait loop and If checks | Add exponential backoff, failure notifications, and retry limits |
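The exponential backoff suggested in the table can be expressed as a capped delay schedule. The sketch below uses example numbers (5 s base, doubling, 120 s cap, 6 retries) that are assumptions for illustration, not values taken from the workflow.

```python
def backoff_delays(
    base_s: float = 5.0,
    factor: float = 2.0,
    cap_s: float = 120.0,
    max_retries: int = 6,
) -> list[float]:
    """Delay before each retry: base * factor**n, never above cap_s."""
    return [min(cap_s, base_s * factor ** n) for n in range(max_retries)]


delays = backoff_delays()
print(delays, "total wait:", sum(delays), "seconds")
```

In n8n, one way to apply such a schedule is to compute the next delay in a Code node and feed it to the Wait node, stopping (and notifying) once the retry limit is reached.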
Troubleshooting
| Problem | Cause | Solution |
| --- | --- | --- |
| Authentication errors | Invalid/missing credentials | Reconfigure n8n credentials; do not paste API keys directly into nodes |
| File not found | Incorrect fileId or permissions | Verify Drive fileId and OAuth scopes; share file with the service account if needed |
| Parsing stuck in PENDING | LlamaIndex processing delay or rate limit | Increase Wait node interval, monitor LlamaIndex dashboard, add retry limits |
| Embedding failures | Model/deployment mismatch or quota limits | Confirm Azure deployment name (3small) and subscription quotas |
Created by: khmuhtadin
Category: Knowledge Management
Tags: google-drive, llamaindex, azure-openai, embeddings, knowledge-base, vector-store