Overview
This workflow automatically converts CSV or Excel files into a production-ready database schema using AI and rule-based validation.
It analyzes the uploaded data, detects column types and relationships, assesses data quality, and then generates a normalized schema. The output includes SQL DDL scripts, an ERD diagram, a data dictionary, and a load plan.
This eliminates manual schema design and accelerates database setup from raw data.
How It Works
File Upload (Webhook)
- Accepts CSV or XLSX files via webhook endpoint
- Initializes workflow configuration (thresholds, retry limits)
File Extraction
- Detects file format (CSV or Excel)
- Extracts rows into structured JSON
- Merges extracted datasets
Data Cleaning & Profiling
- Removes duplicates and normalizes values
- Detects data types (integer, float, date, boolean, string)
- Computes column statistics (nulls, uniqueness, distributions)
- Generates file hash and sample dataset
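The profiling step above can be sketched roughly as follows. This is a minimal illustration, assuming string-typed cell values; `profileColumn`, the type-inference order, and the ratio definitions are assumptions, not the workflow's actual node code.

```typescript
// Illustrative per-column profiling: infer a type and compute null/uniqueness stats.
type ColumnProfile = {
  inferredType: "integer" | "float" | "boolean" | "date" | "string";
  nullRatio: number;   // share of empty cells
  uniqueRatio: number; // distinct non-null values / non-null values
};

function inferType(values: string[]): ColumnProfile["inferredType"] {
  const nonNull = values.filter(v => v !== "");
  if (nonNull.every(v => /^-?\d+$/.test(v))) return "integer";
  if (nonNull.every(v => /^-?\d+(\.\d+)?$/.test(v))) return "float";
  if (nonNull.every(v => /^(true|false)$/i.test(v))) return "boolean";
  if (nonNull.every(v => !isNaN(Date.parse(v)))) return "date";
  return "string";
}

function profileColumn(values: string[]): ColumnProfile {
  const nulls = values.filter(v => v === "").length;
  const unique = new Set(values.filter(v => v !== "")).size;
  return {
    inferredType: inferType(values),
    nullRatio: nulls / values.length,
    uniqueRatio: unique / Math.max(1, values.length - nulls),
  };
}
```

A column profiled as `uniqueRatio === 1` with no nulls becomes a primary-key candidate in the next step.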
Column Profiling Engine
- Identifies potential primary keys
- Detects cardinality and uniqueness levels
- Suggests foreign key relationships based on value overlap
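Foreign-key suggestion by value overlap can be sketched as below; the function names and the way the threshold is applied are assumptions, though the 70% default matches the validation rule described later.

```typescript
// Fraction of non-null child values that also appear in the candidate parent key.
function overlapRatio(child: string[], parentKey: string[]): number {
  const parent = new Set(parentKey);
  const childVals = child.filter(v => v !== "");
  if (childVals.length === 0) return 0;
  return childVals.filter(v => parent.has(v)).length / childVals.length;
}

// Suggest child -> parent as a foreign key when most child values resolve
// against the parent key (default threshold 70%).
function suggestForeignKey(child: string[], parentKey: string[], threshold = 0.7): boolean {
  return overlapRatio(child, parentKey) >= threshold;
}
```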
AI Schema Generation
- Uses an AI agent to design normalized tables
- Assigns SQL data types based on real data
- Defines primary keys, foreign keys, constraints, and indexes
Validation Layer
- Ensures schema matches actual data
- Validates:
- Data types
- Primary key uniqueness
- Foreign key overlap (>70%)
- Constraint consistency
- Detects circular dependencies
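One of these checks, primary-key uniqueness, might look like the following sketch; the function shape and error strings are illustrative, not the workflow's code.

```typescript
// Validate that every row has a present, unique primary-key value.
// Returns a list of human-readable errors (empty list = check passed).
function validatePrimaryKey(rows: Record<string, string>[], pk: string): string[] {
  const errors: string[] = [];
  const seen = new Set<string>();
  for (const row of rows) {
    const v = row[pk];
    if (v === undefined || v === "") {
      errors.push(`missing ${pk}`);
      continue;
    }
    if (seen.has(v)) errors.push(`duplicate ${pk}=${v}`);
    seen.add(v);
  }
  return errors;
}
```

Errors from checks like this are what the revision loop feeds back to the AI agent.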
Revision Loop
- If validation fails:
- Sends feedback to AI agent
- Regenerates schema
- Retries up to configured limit
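The retry logic reduces to a loop like this minimal, synchronous sketch; `generate` and `validate` are hypothetical stand-ins for the AI-agent and validation nodes.

```typescript
// Regenerate the schema until validation passes or the retry budget runs out,
// passing the previous attempt's errors back to the generator as feedback.
function generateWithRetries(
  generate: (feedback: string[]) => object,
  validate: (schema: object) => string[],
  maxRetries = 3,
): object {
  let feedback: string[] = [];
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const schema = generate(feedback);        // regenerate with prior errors
    feedback = validate(schema);              // collect validation failures
    if (feedback.length === 0) return schema; // accepted
  }
  throw new Error(`schema still failing after ${maxRetries} retries`);
}
```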
Schema Output Generation
- Generates:
- SQL DDL scripts
- ERD (Mermaid format)
- Data dictionary
- Load plan with dependency graph
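The Mermaid ERD output could be emitted from the detected relationships roughly like this; the `Relation` shape is an assumption about the workflow's internal metadata.

```typescript
// Turn detected relationships into Mermaid erDiagram source text.
type Relation = { parent: string; child: string; label: string };

function toMermaidErd(relations: Relation[]): string {
  const lines = ["erDiagram"];
  for (const r of relations) {
    // "||--o{" is Mermaid's one-to-many notation: one parent row, many child rows
    lines.push(`  ${r.parent} ||--o{ ${r.child} : ${r.label}`);
  }
  return lines.join("\n");
}
```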
Load Plan Engine
- Computes optimal table insertion order
- Detects circular dependencies
- Suggests batching strategy
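Insertion order is a topological sort over the foreign-key graph: a table can only be loaded after every table it references. A sketch using Kahn's algorithm (names illustrative):

```typescript
// Compute a load order from foreign keys ([child, parent] pairs).
// Throws if the FK graph contains a cycle.
function loadOrder(tables: string[], fks: [child: string, parent: string][]): string[] {
  const deps = new Map(tables.map(t => [t, 0]));  // unmet parent count per table
  const children = new Map<string, string[]>();   // parent -> dependent children
  for (const [child, parent] of fks) {
    deps.set(child, (deps.get(child) ?? 0) + 1);
    children.set(parent, [...(children.get(parent) ?? []), child]);
  }
  const order: string[] = [];
  const ready = tables.filter(t => deps.get(t) === 0); // tables with no parents
  while (ready.length > 0) {
    const t = ready.shift()!;
    order.push(t);
    for (const c of children.get(t) ?? []) {
      deps.set(c, deps.get(c)! - 1);
      if (deps.get(c) === 0) ready.push(c);
    }
  }
  if (order.length !== tables.length) throw new Error("circular dependency detected");
  return order;
}
```

Any tables left unordered when the queue empties are exactly the ones involved in a cycle, which is how the circular-dependency detection falls out of the same pass.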
Combine & Explain
- Merges all outputs
- Optionally adds an AI explanation of schema decisions
Response Output
- Returns structured JSON in the webhook response:
- SQL schema
- ERD summary
- Data dictionary
- Load plan
- Optional explanation
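The response body might be typed like the sketch below; the field names and shapes are assumptions about the output contract, not the workflow's exact JSON.

```typescript
// Assumed shape of the webhook response.
interface SchemaResponse {
  sql: string;          // DDL script
  erd: string;          // Mermaid diagram source
  dataDictionary: { table: string; column: string; type: string; description: string }[];
  loadPlan: string[];   // table insertion order
  explanation?: string; // present only when the AI explanation is enabled
}

const example: SchemaResponse = {
  sql: "CREATE TABLE customers (id INTEGER PRIMARY KEY);",
  erd: "erDiagram",
  dataDictionary: [{ table: "customers", column: "id", type: "INTEGER", description: "surrogate key" }],
  loadPlan: ["customers"],
};
```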
Setup Instructions
- Configure OpenAI credentials (used by the AI agent)
- Adjust thresholds if needed (FK overlap, retries, confidence)
- Activate the workflow and copy the webhook URL
- Send a POST request with a CSV or XLSX file to the webhook URL
- Review the generated outputs in the webhook response
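A hypothetical client for the upload step (requires Node 18+ for the global `fetch`/`FormData`/`Blob`); the field name "file" and the URL are assumptions to replace with your own values:

```typescript
// Build the multipart body for a CSV upload.
function buildUpload(csv: string, filename: string): FormData {
  const form = new FormData();
  form.append("file", new Blob([csv], { type: "text/csv" }), filename);
  return form;
}

// POST the file to the workflow's webhook and return the parsed JSON response.
async function uploadToWebhook(webhookUrl: string, csv: string, filename = "data.csv") {
  const res = await fetch(webhookUrl, { method: "POST", body: buildUpload(csv, filename) });
  return res.json(); // the structured schema output described above
}
```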
Use Cases
- Auto-generate database schema from CSV/Excel files
- Data migration and onboarding pipelines
- Rapid database prototyping
- Reverse engineering datasets
- AI-assisted data modeling
Requirements
- n8n (latest version recommended)
- OpenAI API credentials
- LangChain nodes enabled
- CSV or XLSX input file