Transform raw investment memorandums and financial decks into comprehensive, professional Due Diligence (DD) PDF reports. This workflow automates document parsing via LlamaParse, enriches internal data with real-time web intelligence using Decodo, and utilizes an AI Agent to synthesize structured financial analysis, risk assessments, and investment theses.

Why Use This Workflow?
- Time Savings: Reduces initial deal screening and report generation from 6–8 hours of manual analysis to under 5 minutes.
- Accuracy & Depth: Employs a multi-query RAG (Retrieval-Augmented Generation) strategy that cross-references internal deal documents with external evidence scraped from the web.
- Cost Reduction: Eliminates the need for expensive junior analyst hours for preliminary data gathering and document summarization.
- Scalability: Processes multiple deals in parallel while keeping a consistent reporting standard across your entire pipeline.
Ideal For
- Venture Capital & Private Equity: Rapidly assessing incoming pitch decks and CIMs (Confidential Information Memorandums).
- M&A Advisory Teams: Automating the creation of standardized target company profiles and risk summaries.
- Investment Analysts: Generating structured data from unstructured PDFs to feed into internal valuation models.
How It Works
- Trigger: A webhook receives document uploads (PDF, DOCX, PPTX) via a custom portal or API.
- Data Collection: LlamaParse converts complex document layouts into clean Markdown, preserving tables and financial structures.
- Processing: The workflow generates a unique "Deal ID" from the filenames to keep each deal's data isolated, and checks Pinecone first so documents that were already parsed are not parsed again.
- Intelligence Layer:
  - Web Enrichment: The workflow derives the target company name and uses Decodo to scrape official websites for "About" and "Commercial Risk" data.
  - Multi-Query RAG: An OpenAI-powered agent executes six specific retrieval queries (Financials, Risks, Business Model, etc.) to gather evidence from both the parsed documents and the scraped web content.
- Output & Delivery: Analysis is mapped to a structured template, rendered into a professional HTML report, and converted to a high-quality PDF using Puppeteer.
- Storage & Logging: The final report is uploaded to Cloudflare R2, and a public URL is returned to the user immediately.
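The Deal ID and caching steps can be sketched as follows. This is a minimal illustration, not the workflow's actual code: the normalisation rules, the 12-character hash, and the function names make_deal_id and needs_parsing are hypothetical. The idea is that the same set of filenames always yields the same ID, that ID becomes the Pinecone namespace, and a namespace that already holds vectors means parsing can be skipped. The vector counts per namespace are assumed to come from something like Pinecone's index statistics.

```python
import hashlib
import re


def make_deal_id(filenames: list[str]) -> str:
    """Derive a stable deal ID from uploaded filenames (hypothetical scheme)."""
    stems = []
    for name in filenames:
        # Drop the extension, lowercase, and collapse non-alphanumerics to "-"
        stem = name.rsplit(".", 1)[0].lower()
        stems.append(re.sub(r"[^a-z0-9]+", "-", stem).strip("-"))
    stems.sort()  # upload order must not change the ID
    digest = hashlib.sha256("|".join(stems).encode("utf-8")).hexdigest()[:12]
    # Readable prefix from the first sorted stem, plus a short hash for uniqueness
    prefix = stems[0][:24].strip("-") or "deal"
    return f"{prefix}-{digest}"


def needs_parsing(deal_id: str, namespace_vector_counts: dict[str, int]) -> bool:
    """Cache check: parse only if the deal's namespace holds no vectors yet."""
    return namespace_vector_counts.get(deal_id, 0) == 0


if __name__ == "__main__":
    deal_id = make_deal_id(["Acme Pitch Deck.pdf", "Acme Financials.xlsx"])
    print(deal_id, needs_parsing(deal_id, {}))
```

Hashing the sorted stems rather than the raw upload order is what makes the ID deterministic; the readable prefix only helps humans browse namespaces.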
Setup Guide
Prerequisites
| Requirement | Type | Purpose |
|---|---|---|
| n8n instance | Essential | Core automation and workflow orchestration |
| LlamaIndex Cloud | Essential | High-accuracy document parsing (LlamaParse) |
| Pinecone | Essential | Vector database for document and web evidence storage |
| OpenAI API | Essential | LLM for embeddings and expert analysis (Embedding Small & GPT-5.2) |
| Decodo API | Essential | Real-time web searching and markdown scraping |
| Cloudflare R2 bucket | Essential | Secure storage for the generated PDF reports |
Installation Steps
- Import the workflow JSON file into your n8n instance.
- Configure credentials:
  - OpenAI: Add your API key for embeddings and the Chat Model.
  - Pinecone: Enter your API key and index name (default: poc).
  - LlamaIndex: Add your API key as Header Auth (Authorization: Bearer YOUR_KEY).
  - Decodo: Set up your Decodo API credentials for web search and scraping.
  - S3 (for R2): Configure your bucket name and access keys. R2 exposes an S3-compatible API, so point the S3 credential at your R2 endpoint.
- Update environment-specific values:
  - In the "Build Public Report URL" node, update baseUrl to match your R2 bucket's public endpoint or CDN.
- Test execution:
  - Send a POST request to the webhook URL with a binary file (e.g., a pitch deck) to verify end-to-end report generation.
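For the test execution step, a request like the one below can be built with Python's standard library. It is a sketch under assumptions: the webhook URL is a placeholder, and the multipart field name "data" is a guess; use whatever binary property name your Webhook node expects. The helper only builds the request, so it can be inspected before anything is sent.

```python
import urllib.request
import uuid


def build_upload_request(webhook_url: str, filename: str, payload: bytes,
                         field: str = "data",  # assumed field name
                         content_type: str = "application/pdf") -> urllib.request.Request:
    """Build a multipart/form-data POST carrying a single file part."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=head + payload + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


if __name__ == "__main__":
    # Placeholder URL; replace with your n8n webhook's production or test URL
    req = build_upload_request("https://n8n.example.com/webhook/dd-report",
                               "deck.pdf", b"%PDF-1.4 sample")
    print(req.get_method(), req.get_header("Content-type"))
```

To actually send it, pass the request to urllib.request.urlopen with a generous timeout, since parsing and PDF rendering happen before the webhook responds.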
Technical Details
Core Nodes
| Node | Purpose | Key Configuration |
|---|---|---|
| LlamaParse (HTTP) | Document Conversion | Uses the /parsing/upload and /job/result endpoints for high-fidelity markdown |
| Pinecone Vector Store | Context Storage | Implements namespace-based isolation using the unique dealId |
| Decodo Search/Scrape | Web Intelligence | Dynamically identifies the official domain and extracts corporate metadata |
| AI Agent | Strategic Analysis | Configured with a "Senior Investment Analyst" system prompt and 6-step retrieval logic |
| Puppeteer | PDF Generation | Renders the styled HTML report into a print-ready A4 PDF |
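The LlamaParse step is an upload followed by a wait-and-check loop until the job result is ready. The sketch below shows that polling logic only. The status and result calls are injected as functions so it runs offline; in the workflow they would be HTTP requests with the Bearer header. The status strings "SUCCESS", "ERROR", and "CANCELED" are assumptions about LlamaParse's job API, and interval_s and timeout_s play the role of the Wait node duration.

```python
import time
from typing import Callable


def wait_for_parse(job_id: str,
                   get_status: Callable[[str], str],
                   get_markdown: Callable[[str], str],
                   interval_s: float = 5.0,
                   timeout_s: float = 300.0,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll a parsing job until it finishes, then return its markdown."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status(job_id)
        if status == "SUCCESS":
            return get_markdown(job_id)
        if status in ("ERROR", "CANCELED"):
            raise RuntimeError(f"parse job {job_id} ended with status {status}")
        if time.monotonic() >= deadline:
            # Equivalent of the "Parsing Timeout" case in Troubleshooting
            raise TimeoutError(f"parse job {job_id} still {status} after {timeout_s}s")
        sleep(interval_s)


if __name__ == "__main__":
    statuses = iter(["PENDING", "SUCCESS"])
    print(wait_for_parse("job-1", lambda j: next(statuses),
                         lambda j: "# Parsed deck", sleep=lambda s: None))
```

Failing loudly on a timeout, instead of fetching a half-finished result, keeps an empty document from silently reaching the analysis step.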
Workflow Logic
The workflow uses a Multi-Query Retrieval strategy. Instead of asking one generic question, the AI Agent is instructed to perform six distinct searches against the vector database (Revenue History, Key Risks, etc.). Because each topic gets its own search, critical financial tables or risk disclosures buried deep in a 100-page document are far less likely to be missed.
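The multi-query idea can be illustrated with a toy in-memory retriever. This is a sketch, not the agent's configuration: the bag-of-words embed function stands in for OpenAI embeddings, and only "revenue history", "key risks", and "business model" in the query list come from this description; the other three queries are placeholders. Each query retrieves its own top-k chunks, and the results are merged without duplicates.

```python
import math
import re
from collections import Counter

# Placeholder queries; only the first three topics are named in the description
QUERIES = [
    "revenue history and growth",
    "key risks and risk disclosures",
    "business model and pricing",
    "margins and profitability",
    "customer concentration",
    "market and competition",
]
STOPWORDS = {"a", "and", "in", "is", "of", "the", "to"}


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real setup would call an embedding model."""
    return Counter(t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def multi_query_retrieve(chunks: list[str], queries: list[str], k: int = 2) -> list[str]:
    """Run each query separately, keep its top-k matches, merge without duplicates."""
    vectors = [embed(c) for c in chunks]
    evidence: list[str] = []
    for q in queries:
        qv = embed(q)
        scored = sorted(((cosine(qv, v), i) for i, v in enumerate(vectors)), reverse=True)
        for score, i in scored[:k]:
            if score > 0 and chunks[i] not in evidence:
                evidence.append(chunks[i])
    return evidence
```

With a single generic query, the top-k slots tend to fill with whichever chunks match best overall; separate queries give each topic its own slots.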
Customization Options
Basic Adjustments
- Report Styling: Edit the "Render DD Report HTML" node to match your firm's branding (logo, colors, fonts).
- Analysis Scope: Modify the AI Agent's prompt to include specific metrics (e.g., "ESG Score" or "Technical Debt Assessment").
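As an illustration of the styling adjustment, the sketch below renders a structured analysis into branded HTML ready for an A4 print. The BRAND values and the render_report function are hypothetical and do not reproduce the "Render DD Report HTML" node's template. All analysis text is HTML-escaped, because model output can contain characters such as "<" or "&" that would otherwise break the layout.

```python
from html import escape

# Hypothetical branding values; adjust to your firm's logo, colors, and fonts
BRAND = {"firm": "Example Capital", "primary": "#0b3d91", "font": "Georgia, serif"}


def render_report(analysis: dict[str, list[str]], company: str,
                  brand: dict[str, str] = BRAND) -> str:
    """Render section titles and bullet points into a print-ready HTML page."""
    sections = []
    for title, points in analysis.items():
        items = "".join(f"<li>{escape(p)}</li>" for p in points)
        sections.append(f"<section><h2>{escape(title)}</h2><ul>{items}</ul></section>")
    # @page sets A4 size and margins for the PDF renderer
    return f"""<!DOCTYPE html>
<html><head><meta charset="utf-8"><style>
@page {{ size: A4; margin: 18mm; }}
body {{ font-family: {brand['font']}; }}
h1, h2 {{ color: {brand['primary']}; }}
</style></head><body>
<h1>{escape(company)}: Due Diligence Report</h1>
<p>Prepared by {escape(brand['firm'])}</p>
{''.join(sections)}
</body></html>"""
```

Keeping the styling in simple block-level CSS also sidesteps the Flexbox/Grid issues mentioned under Troubleshooting.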
Advanced Enhancements
- Slack/Email Integration: Instead of only returning an R2 link, have n8n send the PDF directly to a #new-deals Slack channel.
- CRM Sync: Automatically create a new record in HubSpot or Salesforce with the structured JSON output attached.
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| Parsing Timeout | File is too large to finish parsing within the configured wait time | Increase the "Wait" node duration or check LlamaParse job limits |
| Low Analysis Quality | Insufficient context in documents | Ensure documents are text-based PDFs (not scans) or enable OCR in LlamaParse |
| PDF Layout Broken | CSS incompatibility in Puppeteer | Simplify CSS in the HTML node; avoid complex Flexbox/Grid if you run an older Puppeteer version |
Use Case Examples
Scenario 1: Venture Capital Deal Screening
Challenge: A VC associate receives 20 pitch decks a day and spends hours manually summarizing company profiles.
Solution: This workflow parses the deck and web-scrapes the startup's site to verify claims.
Result: The associate receives a 3-page PDF summary for every deck, allowing them to reject or move forward in seconds.
Scenario 2: Private Equity Due Diligence
Challenge: Analyzing a 150-page CIM (Confidential Information Memorandum) for specific financial "red flags."
Solution: The AI Agent's prompt directs it to look specifically for customer concentration and margin fluctuations.
Result: Consistent risk identification across all deals, regardless of which analyst is assigned to the project.
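The two red flags in this scenario can also be expressed as simple checks on figures extracted from the CIM. The sketch below is a deterministic pre-check you might run alongside the agent, not a description of how the agent decides. The 30% top-customer threshold, the 5-point margin swing threshold, and all function names are hypothetical.

```python
def top_customer_share(revenue_by_customer: dict[str, float], top_n: int = 1) -> float:
    """Share of total revenue coming from the top_n largest customers."""
    total = sum(revenue_by_customer.values())
    if total <= 0:
        raise ValueError("total revenue must be positive")
    top = sorted(revenue_by_customer.values(), reverse=True)[:top_n]
    return sum(top) / total


def margin_swing(gross_margins: list[float]) -> float:
    """Largest period-over-period change in margin, in percentage points."""
    return max(abs(b - a) for a, b in zip(gross_margins, gross_margins[1:]))


def red_flags(revenue_by_customer: dict[str, float], gross_margins: list[float],
              share_limit: float = 0.30, swing_limit: float = 5.0) -> list[str]:
    """Return human-readable flags for concentration and margin volatility."""
    flags = []
    share = top_customer_share(revenue_by_customer)
    if share > share_limit:
        flags.append(f"customer concentration: top customer is {share:.0%} of revenue")
    if len(gross_margins) >= 2:
        swing = margin_swing(gross_margins)
        if swing > swing_limit:
            flags.append(f"margin fluctuation: swing of {swing:.1f} pts")
    return flags


if __name__ == "__main__":
    print(red_flags({"A": 45.0, "B": 30.0, "C": 25.0}, [62.0, 55.0, 58.0]))
```

Fixed thresholds like these are what make the flags consistent across analysts; the agent can then explain the context behind each flag.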
Created by: Khmuhtadin
Category: Business Intelligence | Tags: Decodo, AI, RAG, Due Diligence, LlamaIndex, Pinecone