Generate AI-powered investment due diligence PDF reports with OpenAI, LlamaParse and Decodo

Last update 6 hours ago


Transform raw investment memorandums and financial decks into comprehensive, professional Due Diligence (DD) PDF reports. This workflow automates document parsing via LlamaParse, enriches the internal deal data with real-time web intelligence from Decodo, and uses an AI Agent to synthesize structured financial analysis, risk assessments, and investment theses.

Why Use This Workflow?

  • Time Savings: Reduces initial deal screening and report generation from 6–8 hours of manual analysis to under 5 minutes.
  • Accuracy & Depth: Employs a multi-query RAG (Retrieval-Augmented Generation) strategy that cross-references internal deal documents with verified external web evidence.
  • Cost Reduction: Eliminates the need for expensive junior analyst hours for preliminary data gathering and document summarization.
  • Scalability: Effortlessly processes multiple deals simultaneously, maintaining a consistent reporting standard across your entire pipeline.

Ideal For

  • Venture Capital & Private Equity: Rapidly assessing incoming pitch decks and CIMs (Confidential Information Memorandums).
  • M&A Advisory Teams: Automating the creation of standardized target company profiles and risk summaries.
  • Investment Analysts: Generating structured data from unstructured PDFs to feed into internal valuation models.

How It Works

  1. Trigger: A webhook receives document uploads (PDF, DOCX, PPTX) via a custom portal or API.
  2. Data Collection: LlamaParse converts complex document layouts into clean Markdown, preserving tables and financial structures.
  3. Processing: The workflow derives a unique "Deal ID" from the uploaded filenames to keep each deal's data isolated, and uses Pinecone as a caching layer so that documents already parsed are not parsed again.
  4. Intelligence Layer:
    • Web Enrichment: The workflow derives the target company name and uses Decodo to scrape official websites for "About" and "Commercial Risk" data.
    • Multi-Query RAG: An OpenAI-powered agent executes six specific retrieval queries (Financials, Risks, Business Model, etc.) to gather evidence from all sources.
  5. Output & Delivery: Analysis is mapped to a structured template, rendered into a professional HTML report, and converted to a high-quality PDF using Puppeteer.
  6. Storage & Logging: The final report is uploaded to a Cloudflare R2 bucket, and a public URL to the PDF is returned to the caller.
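
The Deal ID derivation in step 3 can be sketched as follows. The workflow's exact scheme is not documented here, so this is an assumption: the hypothetical helper `make_deal_id` normalizes the filenames, sorts them so upload order does not matter, and hashes the result.

```python
import hashlib
import re


def make_deal_id(filenames: list[str]) -> str:
    """Derive a stable identifier from a set of uploaded filenames.

    Assumption: the real workflow may use a different scheme. This sketch
    only shows why a filename-based ID is repeatable across uploads.
    """
    if not filenames:
        raise ValueError("at least one filename is required")
    # Lowercase, trim, and collapse whitespace so trivial variations match.
    normalized = sorted(
        re.sub(r"\s+", "_", name.strip().lower()) for name in filenames
    )
    digest = hashlib.sha256("|".join(normalized).encode("utf-8")).hexdigest()
    return f"deal_{digest[:12]}"
```

Because the same set of files always maps to the same ID, a later upload of those files can be matched against data that is already stored, which is what makes the caching step possible.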

Setup Guide

Prerequisites

| Requirement | Type | Purpose |
| --- | --- | --- |
| n8n instance | Essential | Core automation and workflow orchestration |
| LlamaIndex Cloud | Essential | High-accuracy document parsing (LlamaParse) |
| Pinecone | Essential | Vector database for document and web evidence storage |
| OpenAI API | Essential | LLM for embeddings and expert analysis (Embedding Small & GPT-5.2) |
| Decodo API | Essential | Real-time web searching and markdown scraping |
| R2 Bucket | Essential | Secure storage for the generated PDF reports |

Installation Steps

  1. Import the JSON file to your n8n instance.
  2. Configure credentials:
    • OpenAI: Add your API key for embeddings and the Chat Model.
    • Pinecone: Enter your API Key and Index name (default: poc).
    • LlamaIndex: Add your API key under Header Auth (Authorization: Bearer YOUR_KEY).
    • Decodo: Set up your Decodo API credentials for web search and scraping.
    • AWS S3 (for the R2 bucket): Configure your bucket name and access keys. Cloudflare R2 exposes an S3-compatible API, so S3-style credentials apply.
  3. Update environment-specific values:
    • In the "Build Public Report URL" node, update the baseUrl to match your R2 bucket's public endpoint or CDN.
  4. Test execution:
    • Send a POST request to the webhook URL with a binary file (e.g., a Pitch Deck) to verify the end-to-end generation.
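
A test request for step 4 could look like the sketch below, which uses only the Python standard library. The form field name `file` and the helper names are assumptions; check the Webhook node's binary property setting in your instance and use your own webhook URL.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post_document(webhook_url: str, path: str, field: str = "file") -> bytes:
    """POST a local document to the webhook and return the raw response body.

    Assumption: the webhook accepts a single multipart upload under `field`.
    """
    file_path = Path(path)
    body, content_type = build_multipart(
        field, file_path.name, file_path.read_bytes()
    )
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # Generous timeout: the workflow parses, retrieves, and renders a PDF.
    with urllib.request.urlopen(request, timeout=600) as response:
        return response.read()
```

Calling `post_document(your_webhook_url, "deck.pdf")` should return the workflow's response, which is expected to contain the report URL if the run succeeds.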

Technical Details

Core Nodes

| Node | Purpose | Key Configuration |
| --- | --- | --- |
| LlamaParse (HTTP) | Document Conversion | Uses the /parsing/upload and /job/result endpoints for high-fidelity markdown |
| Pinecone Vector Store | Context Storage | Implements namespace-based isolation using the unique dealId |
| Decodo Search/Scrape | Web Intelligence | Dynamically identifies the official domain and extracts corporate metadata |
| AI Agent | Strategic Analysis | Configured with a "Senior Investment Analyst" system prompt and 6-step retrieval logic |
| Puppeteer | PDF Generation | Renders the styled HTML report into a print-ready A4 PDF |
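
The namespace-based isolation and the parse-once caching can be illustrated with a small in-memory stand-in. This sketch does not call the Pinecone API; `NamespacedStore` and `ingest_once` are hypothetical names, and the real workflow performs the equivalent check through its Pinecone nodes.

```python
from collections import defaultdict
from typing import Callable


class NamespacedStore:
    """In-memory stand-in for a vector index partitioned by namespace."""

    def __init__(self) -> None:
        self._data: dict[str, list[str]] = defaultdict(list)

    def has_namespace(self, namespace: str) -> bool:
        # .get() avoids creating an empty entry as a side effect.
        return len(self._data.get(namespace, [])) > 0

    def upsert(self, namespace: str, chunks: list[str]) -> None:
        self._data[namespace].extend(chunks)

    def fetch(self, namespace: str) -> list[str]:
        return list(self._data.get(namespace, []))


def ingest_once(
    store: NamespacedStore, deal_id: str, parse: Callable[[], list[str]]
) -> bool:
    """Parse and store only if the deal's namespace is empty.

    Returns True when parsing ran, False when cached data was reused.
    """
    if store.has_namespace(deal_id):
        return False
    store.upsert(deal_id, parse())
    return True
```

Keying every read and write by the deal ID means one deal's chunks are never returned for another deal, and a repeated upload skips the expensive parsing call.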

Workflow Logic

The workflow uses a Multi-Query Retrieval strategy. Instead of asking one generic question, the AI Agent is instructed to run six distinct searches against the vector database (Revenue History, Key Risks, etc.). Because each search targets a single topic, critical financial tables or risk disclosures buried in a 100-page document are much less likely to be missed.
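
A minimal sketch of this retrieval pattern is shown below. It is not the workflow's implementation: `keyword_search` is a toy word-overlap retriever standing in for the vector search, and the query list is passed in by the caller because the full set of six queries is not listed here.

```python
from typing import Callable

Search = Callable[[str, int], list[str]]


def multi_query_retrieve(
    queries: list[str], search: Search, top_k: int = 3
) -> dict[str, list[str]]:
    """Run each query separately and merge results, dropping duplicates."""
    seen: set[str] = set()
    evidence: dict[str, list[str]] = {}
    for query in queries:
        hits = []
        for chunk in search(query, top_k):
            if chunk not in seen:  # keep a chunk under the first query that found it
                seen.add(chunk)
                hits.append(chunk)
        evidence[query] = hits
    return evidence


def keyword_search(corpus: list[str]) -> Search:
    """Toy retriever: rank chunks by words shared with the query.

    Stand-in for an embedding-based similarity search.
    """

    def search(query: str, top_k: int) -> list[str]:
        terms = set(query.lower().split())
        scored = [
            (len(terms & set(chunk.lower().split())), chunk) for chunk in corpus
        ]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [chunk for _, chunk in ranked[:top_k]]

    return search
```

Each topic gets its own top-k budget, so a dense section on one subject cannot crowd out the evidence for another.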

Customization Options

Basic Adjustments

  • Report Styling: Edit the "Render DD Report HTML" node to match your firm's branding (logo, colors, fonts).
  • Analysis Scope: Modify the AI Agent's prompt to include specific metrics (e.g., "ESG Score" or "Technical Debt Assessment").

Advanced Enhancements

  • Slack/Email Integration: Instead of returning only the report link, have n8n send the PDF directly to a #new-deals Slack channel.
  • CRM Sync: Automatically create a new record in HubSpot or Salesforce with the structured JSON output attached.

Troubleshooting

| Problem | Cause | Solution |
| --- | --- | --- |
| Parsing Timeout | File is too large for synchronous processing | Increase the "Wait" node duration or check LlamaParse job limits |
| Low Analysis Quality | Insufficient context in documents | Ensure documents are text-based PDFs (not scans) or enable OCR in LlamaParse |
| PDF Layout Broken | CSS incompatibility in Puppeteer | Simplify CSS in the HTML node; avoid complex Flexbox/Grid if Puppeteer version is older |
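
For the parsing timeout case, an alternative to a single fixed wait is to poll the job status with a growing delay. The sketch below is a generic pattern, not the workflow's own logic: `wait_for_job` is a hypothetical helper, `get_status` is whatever function you write to query the parsing job, and the status strings "SUCCESS" and "ERROR" are assumptions to adapt to the actual API response.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], str],
    timeout_s: float = 300.0,
    initial_delay_s: float = 2.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the job succeeds; returns the number of status checks made.

    Raises RuntimeError if the job reports failure and TimeoutError if the
    next wait would exceed `timeout_s`.
    """
    waited = 0.0
    delay = initial_delay_s
    attempts = 0
    while True:
        attempts += 1
        status = get_status()
        if status == "SUCCESS":
            return attempts
        if status == "ERROR":
            raise RuntimeError("parsing job failed")
        if waited + delay > timeout_s:
            raise TimeoutError(f"job not finished after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay_s)  # exponential backoff with a cap
```

Small files finish after one or two quick checks, while large files get progressively longer waits without hammering the parsing service.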

Use Case Examples

Scenario 1: Venture Capital Deal Screening

Challenge: A VC associate receives 20 pitch decks a day and spends hours manually summarizing company profiles.

Solution: This workflow parses the deck and web-scrapes the startup's site to verify claims.

Result: The associate receives a 3-page PDF summary for every deck, allowing them to reject or move forward in seconds.

Scenario 2: Private Equity Due Diligence

Challenge: Analyzing a 150-page CIM (Confidential Information Memorandum) for specific financial "red flags."

Solution: The AI Agent is prompted to look specifically for customer concentration and margin fluctuations.

Result: Consistent risk identification across all deals, regardless of which analyst is assigned to the project.


Created by: Khmuhtadin
Category: Business Intelligence | Tags: Decodo, AI, RAG, Due Diligence, LlamaIndex, Pinecone
