What This Workflow Does
This workflow turns a PDF legal contract into a detailed, AI-generated risk report in under 5 minutes. Upload a contract and the system automatically splits it into clauses, analyses each one using Hybrid RAG (semantic + keyword search), scores its risk as HIGH / MEDIUM / LOW, and returns plain-English explanations with safer alternative wording.
🔥 Why Hybrid RAG?
Many dangerous clauses don't use obvious legal keywords.
"The Client accepts full responsibility for all third-party claims" is an indemnification clause, yet it never uses the word "indemnify", so a keyword search alone can miss it.
Hybrid RAG combines:
- Vector Search (pgvector): finds semantically similar risky patterns
- BM25 Keyword Search: catches explicit legal red flags
- RRF Reranking: merges both result lists, with clause-type boosting
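The fusion step can be sketched in a few lines of Python. This is a minimal sketch, assuming standard Reciprocal Rank Fusion with the commonly used constant k = 60 and a hypothetical 1.5x multiplier for knowledge-base patterns whose clause type matches the clause under review. The function name, arguments and boost value are illustrative, not the workflow's actual code.

```python
from collections import defaultdict


def rrf_merge(vector_ids, keyword_ids, clause_types, query_type, k=60, type_boost=1.5):
    """Fuse two best-first ranked lists of knowledge-base clause IDs with RRF.

    vector_ids:   IDs from the pgvector similarity search, best match first.
    keyword_ids:  IDs from the keyword (BM25) search, best match first.
    clause_types: maps each ID to its stored clause type (hypothetical field).
    query_type:   the type Gemini assigned to the clause under review.
    """
    scores = defaultdict(float)
    for ranked in (vector_ids, keyword_ids):
        for rank, clause_id in enumerate(ranked, start=1):
            # RRF uses only the rank position, so the two searches' raw
            # scores never need to be on the same scale.
            scores[clause_id] += 1.0 / (k + rank)
    # Assumed boost: favour patterns of the same clause type as the query.
    for clause_id in scores:
        if clause_types.get(clause_id) == query_type:
            scores[clause_id] *= type_boost
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Working on ranks rather than raw scores is what makes RRF convenient here: cosine similarity from pgvector and keyword relevance scores live on different scales, but a pattern that appears near the top of both lists, or that shares the clause's type, still rises to the top of the merged list.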
What It Does
- Accepts a PDF contract via webhook (with async job_id tracking)
- Splits the contract into individual numbered clauses
- Classifies each clause by type using Google Gemini (indemnification, IP, termination, etc.)
- Generates vector embeddings and searches a Supabase knowledge base
- Scores each clause HIGH / MEDIUM / LOW using regex + AI
- Uses an AI Agent (Gemini Flash) to explain the risk in plain language and suggest safer wording
- Aggregates all results into a single JSON report
- Saves the report to Supabase, where the frontend polls for the result asynchronously
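Clause splitting is the step everything else depends on. Here is a minimal sketch, assuming each clause begins on its own line with a number such as `1.`, `3)` or a dotted sub-number like `2.1`. The pattern and the `split_clauses` name are hypothetical; real contracts (lettered sub-clauses, cross-references that happen to wrap onto a new line) will need a more robust splitter.

```python
import re

# Assumption: a clause starts at the beginning of a line with either a dotted
# sub-number ("2.1", "2.1.") or a single number followed by "." or ")".
CLAUSE_START = re.compile(r"^[ \t]*(\d+(?:\.\d+)+\.?|\d+[.)])\s+", re.MULTILINE)


def split_clauses(text):
    """Split contract text into a list of {"number", "text"} dicts.

    Any preamble before the first numbered clause is ignored.
    """
    matches = list(CLAUSE_START.finditer(text))
    clauses = []
    for i, match in enumerate(matches):
        # Each clause runs until the next clause marker (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end]
        clauses.append({
            "number": match.group(1).rstrip(".)"),
            # Collapse line breaks and repeated spaces left over from PDF extraction.
            "text": " ".join(body.split()),
        })
    return clauses
```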
⚙️ Architecture (Two Pipelines)
Pipeline 1 (Ingestion): builds the knowledge base of risky clause patterns in Supabase
Pipeline 2 (Query): analyses new contracts against the knowledge base
Both pipelines run in the same workflow; the branch splits at the Extract Embedding node.
🔧 Key Technical Decisions
- Async architecture: the frontend fires the request and polls Supabase for the result, so long-running analyses don't run into request timeouts
- job_id tracking: the job_id is preserved across all nodes via the ...$json spread
- RRF reranking: combines the vector and BM25 rankings, with type-based boost multipliers
- Regex risk scorer: a first-pass risk classification that runs before the more expensive LLM call
- Gemini Flash: a fast, cost-efficient LLM for per-clause annotation
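A first-pass regex scorer might look like the sketch below. The pattern lists are hypothetical examples, not the workflow's actual rule set; the idea is only that cheap pattern matching produces an initial HIGH / MEDIUM / LOW label, plus the phrases that triggered it, which can then be handed to the Gemini Flash agent along with the clause.

```python
import re

# Hypothetical red-flag patterns for illustration only; tune these to your own
# knowledge base. Patterns are matched against the lower-cased clause text.
HIGH_RISK_PATTERNS = [
    r"\bunlimited liability\b",
    r"\bfull responsibility\b",
    r"\bindemnif(?:y|ies|ication)\b",
    r"\birrevocabl[ey]\b",
    r"\bwaives? (?:all|any) (?:rights|claims)\b",
]
MEDIUM_RISK_PATTERNS = [
    r"\bauto(?:matic(?:ally)?)?[- ]renew",
    r"\bsole discretion\b",
    r"\bnon-compete\b",
    r"\bterminat(?:e|ion) (?:at any time|without cause)\b",
]


def regex_risk(clause_text):
    """Return a first-pass risk level plus the patterns that triggered it."""
    text = clause_text.lower()
    for level, patterns in (("HIGH", HIGH_RISK_PATTERNS), ("MEDIUM", MEDIUM_RISK_PATTERNS)):
        hits = [p for p in patterns if re.search(p, text)]
        if hits:
            return {"risk_level": level, "matched_patterns": hits}
    # No red flag found: default to LOW (assumed: the LLM pass may still escalate it).
    return {"risk_level": "LOW", "matched_patterns": []}
```

Patterns like these only catch wording someone anticipated, which is exactly why the hybrid retrieval and the LLM pass follow: a paraphrased indemnity that avoids every listed phrase would fall through to LOW here.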
📦 Requirements
- Google Gemini API key: for clause classification, embeddings and the AI Agent
- Supabase project with the pgvector extension enabled
- Supabase tables: legal_clauses (knowledge base) and reports (results)
- Supabase functions: match_clauses() and keyword_search_clauses()
- Frontend (optional): HTML/CSS/JS web app hosted on Netlify
💡 Example Use Cases
- Freelancers reviewing client contracts before signing
- Startups evaluating vendor or investor agreements
- Legal ops teams standardising contract review at scale
- Business owners catching risky clauses without legal fees
🎯 Output
- Per-clause: risk_level, plain-English explanation, risk_reason, safer_alternative, key_obligations, legal_area
- Summary: overall_risk_score, risk_distribution, legal_areas map, high_risk_clauses list
- Stored as JSON in the Supabase reports table, keyed by job_id
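The aggregation step that produces this summary could look like the following sketch. The source doesn't say how overall_risk_score is calculated, so the weights here (HIGH = 1, MEDIUM = 0.5, LOW = 0, averaged and scaled to 0 to 100) are an assumption, as is the per-clause `clause_number` field; the summary field names come from the output list above.

```python
from collections import Counter, defaultdict

# Assumed weights: the workflow's actual scoring formula is not documented.
RISK_WEIGHTS = {"HIGH": 1.0, "MEDIUM": 0.5, "LOW": 0.0}


def build_report(job_id, clauses):
    """Aggregate per-clause annotations into one JSON-serialisable report.

    Each clause dict is assumed to carry "clause_number" (hypothetical),
    "risk_level" and "legal_area".
    """
    distribution = Counter(c["risk_level"] for c in clauses)
    legal_areas = defaultdict(list)
    for c in clauses:
        legal_areas[c["legal_area"]].append(c["clause_number"])
    # Mean clause weight scaled to 0-100; an empty contract scores 0.
    score = (
        round(100 * sum(RISK_WEIGHTS[c["risk_level"]] for c in clauses) / len(clauses))
        if clauses else 0
    )
    return {
        "job_id": job_id,  # key used to store and poll the report
        "clauses": clauses,
        "summary": {
            "overall_risk_score": score,
            "risk_distribution": {level: distribution.get(level, 0) for level in RISK_WEIGHTS},
            "legal_areas": dict(legal_areas),
            "high_risk_clauses": [c["clause_number"] for c in clauses if c["risk_level"] == "HIGH"],
        },
    }
```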