Sync Gmail emails to PostgreSQL with S3 attachment storage
Automated Gmail Email Processing System
Who's it for
Businesses and individuals who need to:
- Archive email communications in a searchable database
- Backup email attachments to cloud storage
- Analyze email patterns and communication data
- Comply with data retention policies
- Integrate emails with other business systems
What it does
This workflow automatically captures, processes, and stores Gmail emails in a PostgreSQL database while uploading file attachments to S3/MinIO storage. It handles both individual emails (via Gmail Trigger) and bulk processing (via Schedule Trigger).
Key features:
- Dual processing: real-time individual emails + scheduled bulk retrieval
- Complete email metadata extraction (sender, recipients, labels, timestamps)
- HTML to plain text conversion for searchable content
- Binary attachment processing with metadata extraction
- Organized S3/MinIO file storage structure
- UPSERT database operations to prevent duplicates
How it works
- Email Capture: Gmail Trigger detects new emails, Schedule Trigger gets bulk emails from last hour
- Parallel Processing: Emails with attachments go through binary processing, others go directly to transformation
- Attachment Handling: Extract metadata, upload to S3/MinIO, create database references
- Data Transformation: Convert Gmail API format to PostgreSQL structure
- Storage: UPSERT emails to database with linked attachment information
Requirements
Credentials needed:
- Gmail OAuth2 (gmail.readonly scope)
- PostgreSQL database connection
- S3/MinIO storage credentials
Database setup:
Run the provided SQL schema to create the messages table with JSONB fields for flexible data storage.
How to set up
- Gmail OAuth2: Enable Gmail API in Google Cloud Console, create OAuth2 credentials
- PostgreSQL: Create database and run the SQL schema provided in setup sticky note
- S3/MinIO: Create bucket "gmail-attachments" with proper upload permissions
- Configure: Update authenticatedUserEmail in transform scripts to your email
- Test: Start with single email before enabling bulk processing
How to customize
- Email filters: Modify Gmail queries (in:sent, in:inbox) to target specific emails
- Storage structure: Change S3 file path format in Upload node
- Processing schedule: Adjust trigger frequencies based on email volume
- Database fields: Extend PostgreSQL schema for additional metadata
- Attachment types: Add file type filtering in binary processing logic
Note: This workflow processes emails from the last hour to avoid overwhelming the system. Adjust timeframes based on your email volume and processing needs.