Analyze legal contract risk with Google Gemini hybrid RAG and Supabase

Created by: Divyanshu Gupta (divyanshugupta)

Last update 6 hours ago


🚀 What This Workflow Does

This workflow transforms any PDF legal contract into a detailed AI-powered risk report — in under 5 minutes. Upload a contract, and the system automatically splits it into clauses, analyses each one using Hybrid RAG (semantic + keyword search), scores risk as HIGH / MEDIUM / LOW, and delivers plain-English explanations with safer alternative wording.


🔥 Why Hybrid RAG?

Many dangerous clauses don't use obvious legal keywords.
For example, "The Client accepts full responsibility for all third-party claims" is effectively an indemnification clause, but a keyword search for "indemnify" misses it because the word never appears.
Hybrid RAG combines:

  • Vector Search (pgvector) — finds semantically similar risky patterns
  • BM25 Keyword Search — catches explicit legal red flags
  • RRF Reranking — merges both results with clause-type boosting
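
To make the merge step concrete, here is a minimal sketch of Reciprocal Rank Fusion (RRF) with a clause-type boost. It is an illustration under assumptions, not the workflow's actual node code: the function name `rrf_merge`, the constant `k=60`, the `boost=1.5` multiplier, and the `kb_*` ids are all hypothetical.

```python
from collections import defaultdict


def rrf_merge(vector_hits, keyword_hits, type_of=None, clause_type=None, k=60, boost=1.5):
    """Fuse two ranked lists of knowledge-base ids with Reciprocal Rank Fusion.

    vector_hits / keyword_hits: ids ordered best-first by each search.
    type_of: optional dict mapping id -> clause type of the stored pattern.
    clause_type: type of the clause being analysed; matching ids get boosted.
    k and boost are assumed values, not taken from the workflow.
    """
    scores = defaultdict(float)
    for hits in (vector_hits, keyword_hits):
        for rank, doc_id in enumerate(hits, start=1):
            # Each list contributes 1 / (k + rank) for every id it returned.
            scores[doc_id] += 1.0 / (k + rank)

    if type_of is not None and clause_type is not None:
        for doc_id in scores:
            if type_of.get(doc_id) == clause_type:
                scores[doc_id] *= boost  # assumed type-based boost multiplier

    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_hits = ["kb_7", "kb_2", "kb_9"]   # hypothetical pgvector results
keyword_hits = ["kb_2", "kb_4"]          # hypothetical BM25 results
type_of = {"kb_7": "indemnification", "kb_2": "termination",
           "kb_9": "indemnification", "kb_4": "ip"}

for doc_id, score in rrf_merge(vector_hits, keyword_hits, type_of, "indemnification"):
    print(doc_id, round(score, 4))
```

An id returned by both searches accumulates two contributions, so agreement between the semantic and keyword results tends to rank highest; the boost then lifts patterns whose type matches the clause under review.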

🔍 What It Does, Step by Step

  • Accepts a PDF contract via webhook (with async job_id tracking)
  • Splits contract into individual numbered clauses
  • Classifies each clause type using Google Gemini (indemnification, IP, termination, etc.)
  • Generates vector embeddings and searches a Supabase knowledge base
  • Scores each clause HIGH / MEDIUM / LOW using regex + AI
  • AI Agent (Gemini Flash) explains risk in plain language + suggests safer wording
  • Aggregates all results into a single JSON report
  • Saves report to Supabase (frontend polls for result asynchronously)
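
As an illustration of the clause-splitting step, the sketch below cuts extracted contract text at numbered headings and copies the `job_id` onto every clause so it survives downstream. The regex, the `split_clauses` name, and the output field names are assumptions; the workflow's real splitting rules are not given here.

```python
import re

# Assumed heading format: "1. ", "2) ", "2.1 " or "2.1. " at the start of a line.
# A bare number followed by a space (e.g. a wrapped line starting "30 days")
# is deliberately not treated as a heading.
CLAUSE_START = re.compile(r"^[ \t]*(\d+(?:\.\d+)+|\d+(?=[.)]))[.)]?[ \t]+", re.MULTILINE)


def split_clauses(text, job_id):
    """Split contract text into numbered clauses, tagging each with job_id."""
    matches = list(CLAUSE_START.finditer(text))
    clauses = []
    for i, match in enumerate(matches):
        # A clause runs from the end of its heading to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = " ".join(text[match.end():end].split())  # collapse line wraps
        clauses.append({"job_id": job_id, "number": match.group(1), "text": body})
    return clauses


sample = (
    "1. Payment. The Client shall pay within\n"
    "30 days of invoice.\n"
    "2. Indemnity. The Client accepts full responsibility for all third-party claims.\n"
    "2.1 Survival. This clause survives termination.\n"
)

for clause in split_clauses(sample, job_id="job-123"):
    print(clause)
```

Real contracts use many other numbering schemes (lettered sub-clauses, "Article IV", and so on), so a pattern like this would need extending, and a wrapped line that begins with a decimal such as "10.5 percent" would still be split incorrectly.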

⚙️ Architecture (Two Pipelines)

Pipeline 1 — Ingestion: Builds the knowledge base of risky clause patterns in Supabase
Pipeline 2 — Query: Analyses new contracts against the knowledge base

Both pipelines run in the same workflow; the branch splits at the Extract Embedding node.


🧠 Key Technical Decisions

  • Async architecture — Frontend fires request + polls Supabase. No timeout issues.
  • job_id tracking — Preserved across all nodes via ...$json spread
  • RRF Reranking — Combines vector + BM25 scores with type-based boost multipliers
  • Regex Risk Scorer — First-pass risk classification before expensive LLM call
  • Gemini Flash — Fast, cost-efficient LLM for per-clause annotation
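
The regex first pass can be pictured as follows. This is a sketch only: the patterns and the `first_pass_risk` name are hypothetical examples, since the workflow's actual regex list is not shown in this description.

```python
import re

# Hypothetical red-flag patterns, checked from most to least severe.
RISK_PATTERNS = {
    "HIGH": [
        r"\bindemnif(?:y|ies|ication)\b",
        r"\bunlimited liability\b",
        r"\bfull responsibility\b",
        r"\birrevocabl[ey]\b",
    ],
    "MEDIUM": [
        r"\bauto(?:matically)?[- ]renew",
        r"\bnon-compete\b",
        r"\bexclusive\b",
    ],
}


def first_pass_risk(clause_text):
    """Return HIGH, MEDIUM, or LOW from cheap pattern matching alone."""
    for level in ("HIGH", "MEDIUM"):
        for pattern in RISK_PATTERNS[level]:
            if re.search(pattern, clause_text, flags=re.IGNORECASE):
                return level
    return "LOW"  # nothing matched; later AI steps may still raise the level


print(first_pass_risk("The Client accepts full responsibility for all third-party claims."))
print(first_pass_risk("This agreement automatically renews each year."))
print(first_pass_risk("Invoices are sent monthly."))
```

A pass like this costs almost nothing per clause, so it can run on every clause before any LLM call; how its result is combined with the AI score is not specified in this description.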

📦 Requirements

  • Google Gemini API key — for clause classification + embeddings + AI Agent
  • Supabase project — with pgvector extension enabled
  • Supabase tables: legal_clauses (knowledge base) + reports (results)
  • Supabase functions: match_clauses() + keyword_search_clauses()
  • Frontend (optional): HTML/CSS/JS web app hosted on Netlify

💡 Example Use Cases

  • Freelancers reviewing client contracts before signing
  • Startups evaluating vendor or investor agreements
  • Legal ops teams standardising contract review at scale
  • Business owners catching risky clauses without legal fees

🎯 Output

  • Per-clause: risk_level, plain-English explanation, risk_reason, safer_alternative, key_obligations, legal_area
  • Summary: overall_risk_score, risk_distribution, legal_areas map, high_risk_clauses list
  • Stored as JSON in Supabase reports table, keyed by job_id
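
For orientation, the sketch below assembles a report with the fields listed above from a list of per-clause results. The scoring formula (a weighted average with HIGH=3, MEDIUM=2, LOW=1), the `build_report` name, and the `number` and `clauses` keys are assumptions made for illustration; the workflow may compute `overall_risk_score` differently.

```python
import json
from collections import Counter

RISK_WEIGHTS = {"HIGH": 3, "MEDIUM": 2, "LOW": 1}  # assumed weights


def build_report(job_id, clauses):
    """Aggregate per-clause results into a single JSON-serialisable report."""
    distribution = Counter(c["risk_level"] for c in clauses)
    legal_areas = Counter(c["legal_area"] for c in clauses)
    total = sum(RISK_WEIGHTS[c["risk_level"]] for c in clauses)
    overall = round(total / len(clauses), 2) if clauses else 0.0
    return {
        "job_id": job_id,
        "overall_risk_score": overall,
        "risk_distribution": {level: distribution.get(level, 0) for level in RISK_WEIGHTS},
        "legal_areas": dict(legal_areas),
        "high_risk_clauses": [c["number"] for c in clauses if c["risk_level"] == "HIGH"],
        "clauses": clauses,
    }


clauses = [
    {"number": "1", "risk_level": "HIGH", "legal_area": "indemnification"},
    {"number": "2", "risk_level": "LOW", "legal_area": "payment"},
    {"number": "3", "risk_level": "HIGH", "legal_area": "ip"},
]

report = build_report("job-123", clauses)
print(json.dumps(report, indent=2))
```

Keeping the whole report in one JSON object keyed by `job_id` means the frontend only has to poll for a single row in the reports table.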