Generate AI-powered investment due diligence PDF reports with OpenAI, LlamaParse and Decodo

Created by

Khairul Muhtadin

Last update 21 days ago

Why Use This Workflow?

Time Savings: Reduces initial deal screening and report generation from 6–8 hours of manual analysis to under 5 minutes.
Accuracy & Depth: Employs a multi-query RAG (Retrieval-Augmented Generation) strategy that cross-references internal deal documents with verified external web evidence.
Cost Reduction: Eliminates the need for expensive junior analyst hours for preliminary data gathering and document summarization.
Scalability: Effortlessly processes multiple deals simultaneously, maintaining a consistent reporting standard across your entire pipeline.

Ideal For

Venture Capital & Private Equity: Rapidly assessing incoming pitch decks and CIMs (Confidential Information Memorandums).
M&A Advisory Teams: Automating the creation of standardized target company profiles and risk summaries.
Investment Analysts: Generating structured data from unstructured PDFs to feed into internal valuation models.

How It Works

Trigger: A webhook receives document uploads (PDF, DOCX, PPTX) via a custom portal or API.
Data Collection: LlamaParse converts complex document layouts into clean Markdown, preserving tables and financial structures.
Processing: The workflow generates a unique "Deal ID" based on filenames to ensure data isolation and implements a caching layer via Pinecone to avoid redundant parsing.
Intelligence Layer:
- Web Enrichment: The workflow derives the target company name and uses Decodo to scrape official websites for "About" and "Commercial Risk" data.
- Multi-Query RAG: An OpenAI-powered agent executes six specific retrieval queries (Financials, Risks, Business Model, etc.) to gather evidence from all sources.
Output & Delivery: Analysis is mapped to a structured template, rendered into a professional HTML report, and converted to a high-quality PDF using Puppeteer.
Storage & Logging: The final report is uploaded to Cloudflare R2, and a public URL to the PDF is returned to the user.
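
The Processing step can be pictured with a short Python sketch. The exact Deal ID scheme used by the workflow is not described here, so the hashing approach, the `deal-` prefix, and the function names `make_deal_id` and `needs_parsing` are all assumptions made for illustration. The idea is only that the same set of filenames always maps to the same ID, which can then double as the Pinecone namespace and as a cache key.

```python
import hashlib
import re
from pathlib import PurePath


def make_deal_id(filenames):
    """Derive a stable deal ID from uploaded filenames (hypothetical scheme)."""
    stems = []
    for name in filenames:
        # Drop directories and the extension, then lowercase the rest.
        stem = PurePath(name).stem.lower()
        # Collapse anything that is not a letter or digit into a hyphen.
        stems.append(re.sub(r"[^a-z0-9]+", "-", stem).strip("-"))
    # Sort so the same set of files gives the same ID in any upload order.
    stems.sort()
    digest = hashlib.sha256("|".join(stems).encode("utf-8")).hexdigest()[:12]
    return f"deal-{digest}"


def needs_parsing(deal_id, existing_namespaces):
    """Return True when nothing is stored under this deal's namespace yet."""
    return deal_id not in existing_namespaces
```

With a scheme like this, re-uploading the same files produces the same namespace, so the workflow can skip the parsing call when that namespace already holds vectors.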

Setup Guide

Prerequisites

Requirement	Type	Purpose
n8n instance	Essential	Core automation and workflow orchestration
LlamaIndex Cloud	Essential	High-accuracy document parsing (LlamaParse)
Pinecone	Essential	Vector database for document and web evidence storage
OpenAI API	Essential	LLM for embeddings and expert analysis (Embedding Small & GPT-5.2)
Decodo API	Essential	Real-time web searching and markdown scraping
R2 Bucket	Essential	Secure storage for the generated PDF reports

Installation Steps

Import the JSON file to your n8n instance.
Configure credentials:
- OpenAI: Add your API key for embeddings and the Chat Model.
- Pinecone: Enter your API Key and Index name (default: poc).
- LlamaIndex: Add your API key under Header Auth (Authorization: Bearer YOUR_KEY).
- Decodo: Set up your Decodo API credentials for web search and scraping.
- AWS S3 (used for Cloudflare R2): Configure the bucket name and access keys for your R2 bucket. R2 exposes an S3-compatible API.
Update environment-specific values:
- In the "Build Public Report URL" node, update the baseUrl to match your R2 bucket's public endpoint or CDN.
Test execution:
- Send a POST request to the webhook URL with a binary file (e.g., a Pitch Deck) to verify the end-to-end generation.
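
For the test execution step, a request can be sent from any HTTP client. The sketch below uses only the Python standard library. The form field name `file` and the 600-second timeout are assumptions, not values taken from the workflow; check the Webhook node for the field name it expects.

```python
import os
import urllib.request
import uuid


def build_multipart(field_name, filename, payload, content_type="application/pdf"):
    """Encode one file as a multipart/form-data body; returns (body, headers)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return head + payload + tail, headers


def send_document(webhook_url, path, field_name="file"):
    """POST a local file to the webhook and return the response body as text."""
    with open(path, "rb") as handle:
        body, headers = build_multipart(field_name, os.path.basename(path), handle.read())
    request = urllib.request.Request(webhook_url, data=body, headers=headers, method="POST")
    # Report generation can take minutes, so allow a long timeout (assumed value).
    with urllib.request.urlopen(request, timeout=600) as response:
        return response.read().decode("utf-8")
```

Calling `send_document("https://your-n8n-host/webhook/your-path", "pitch_deck.pdf")` with your own webhook URL (the one shown is a placeholder) should return whatever the workflow responds with, which per the description above includes the report URL.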

Technical Details

Core Nodes

Node	Purpose	Key Configuration
LlamaParse (HTTP)	Document Conversion	Uses the `/parsing/upload` and `/job/result` endpoints for high-fidelity markdown
Pinecone Vector Store	Context Storage	Implements namespace-based isolation using the unique `dealId`
Decodo Search/Scrape	Web Intelligence	Dynamically identifies the official domain and extracts corporate metadata
AI Agent	Strategic Analysis	Configured with a "Senior Investment Analyst" system prompt and 6-step retrieval logic
Puppeteer	PDF Generation	Renders the styled HTML report into a print-ready A4 PDF

Workflow Logic

The workflow uses a Multi-Query Retrieval strategy. Instead of asking one generic question, the AI Agent is required to run six distinct searches against the vector database (Revenue History, Key Risks, etc.). Because each search targets one topic, the agent is much less likely to miss critical financial tables or risk disclosures buried in a 100-page document.
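
The retrieval pattern described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the workflow's actual agent logic: `multi_query_retrieve` is a hypothetical helper, and `retrieve` stands in for whatever function queries the vector store within a deal's namespace. Only some of the six topics are named in this description, so the function simply accepts whatever list of topics it is given.

```python
from typing import Callable, Dict, List

# A retriever takes (query, namespace, top_k) and returns matching text chunks.
Retriever = Callable[[str, str, int], List[str]]


def multi_query_retrieve(
    topics: List[str], retrieve: Retriever, deal_id: str, top_k: int = 5
) -> Dict[str, List[str]]:
    """Run one retrieval per topic and return de-duplicated evidence per topic."""
    seen = set()
    evidence: Dict[str, List[str]] = {}
    for topic in topics:
        chunks = []
        for chunk in retrieve(topic, deal_id, top_k):
            # Skip chunks that an earlier topic already returned.
            if chunk not in seen:
                seen.add(chunk)
                chunks.append(chunk)
        evidence[topic] = chunks
    return evidence
```

De-duplicating across topics keeps the combined context small, at the cost of attributing a shared chunk only to the first topic that surfaced it. If each report section needs its full evidence list, drop the `seen` check.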

Customization Options

Basic Adjustments

Report Styling: Edit the "Render DD Report HTML" node to match your firm's branding (logo, colors, fonts).
Analysis Scope: Modify the AI Agent's prompt to include specific metrics (e.g., "ESG Score" or "Technical Debt Assessment").

Advanced Enhancements

Slack/Email Integration: Instead of returning only the report link, have n8n send the PDF directly to a #new-deals Slack channel.
CRM Sync: Automatically create a new record in HubSpot or Salesforce with the structured JSON output attached.

Troubleshooting

Problem	Cause	Solution
Parsing Timeout	File is too large for synchronous processing	Increase the "Wait" node duration or check LlamaParse job limits
Low Analysis Quality	Insufficient context in documents	Ensure documents are text-based PDFs (not scans) or enable OCR in LlamaParse
PDF Layout Broken	CSS incompatibility in Puppeteer	Simplify CSS in the HTML node; avoid complex Flexbox/Grid if Puppeteer version is older
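
For the Parsing Timeout case, an alternative to one long fixed wait is to poll the parsing job with a growing delay until an overall time budget runs out. The sketch below shows that pattern in Python. The status strings `"SUCCESS"` and `"ERROR"`, the helper name `wait_for_job`, and all delay values are assumptions; `check_status` stands in for whatever call reads the job status from LlamaParse.

```python
import time


def wait_for_job(check_status, timeout_s=300.0, first_delay_s=2.0,
                 max_delay_s=30.0, sleep=time.sleep, clock=time.monotonic):
    """Poll check_status() until success, failure, or the time budget is spent."""
    deadline = clock() + timeout_s
    delay = first_delay_s
    while True:
        status = check_status()
        if status == "SUCCESS":
            return True
        if status == "ERROR":
            raise RuntimeError("parsing job failed")
        # Give up if the next wait would run past the overall budget.
        if clock() + delay > deadline:
            raise TimeoutError(f"job still {status!r} within the {timeout_s}s budget")
        sleep(delay)
        # Double the delay, up to a cap, so large files are polled less often.
        delay = min(delay * 2, max_delay_s)
```

In n8n the same idea maps to a loop of a status request, an IF node, and a Wait node whose duration grows on each pass. The `sleep` and `clock` parameters exist only so the function can be exercised without real waiting.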

Use Case Examples

Scenario 1: Venture Capital Deal Screening

Challenge: A VC associate receives 20 pitch decks a day and spends hours manually summarizing company profiles.

Solution: This workflow parses the deck and web-scrapes the startup's site to verify claims.

Result: The associate receives a 3-page PDF summary for every deck, allowing them to reject or move forward in seconds.

Scenario 2: Private Equity Due Diligence

Challenge: Analyzing a 150-page CIM (Confidential Information Memorandum) for specific financial "red flags."

Solution: The AI Agent is prompted to look specifically for customer concentration and margin fluctuations.

Result: Consistent risk identification across all deals, regardless of which analyst is assigned to the project.

Created by: Khmuhtadin
Category: Business Intelligence | Tags: Decodo, AI, RAG, Due Diligence, LlamaIndex, Pinecone
