
Track Certification Requirement Changes with ScrapeGraphAI, GitHub and Email

Created by: vinci-king-01

Last update: 18 hours ago



Job Posting Aggregator with Email and GitHub

⚠️ COMMUNITY TEMPLATE DISCLAIMER: This is a community-contributed template that uses ScrapeGraphAI (a community node). Please ensure you have the ScrapeGraphAI community node installed in your n8n instance before using this template.

This workflow automatically aggregates certification-related job-posting requirements from multiple industry sources, compares them against last year’s data stored in GitHub, and emails a concise change log to subscribed professionals. It streamlines annual requirement checks and renewal reminders, ensuring users never miss an update.

Prerequisites

  • n8n instance (self-hosted or n8n cloud)
  • ScrapeGraphAI community node installed
  • Git installed (for optional local testing of the repo)
  • Working SMTP server or other Email credential supported by n8n

Required Credentials

  • ScrapeGraphAI API Key – Enables web scraping of certification pages
  • GitHub Personal Access Token – Allows the workflow to read/write files in the repo
  • Email / SMTP Credentials – Sends the summary email to end-users

Specific Setup Requirements

| Resource | Purpose | Example |
| --- | --- | --- |
| GitHub Repository | Stores certification_requirements.json, versioned annually | https://github.com/<you>/cert-requirements.git |
| Watch List File | List of page URLs & selectors to scrape | Saved in the repo under /config/watchList.json |
| Email List | Semicolon-separated list of recipients | [email protected];[email protected] |

How it works

Key Steps:

  • Manual Trigger: Starts the workflow on demand or via scheduled cron.
  • Load Watch List (Code Node): Reads the list of certification URLs and CSS selectors.
  • Split In Batches: Iterates through each URL to avoid rate limits.
  • ScrapeGraphAI: Scrapes requirement details from each page.
  • Merge (Wait): Reassembles individual scrape results into a single JSON array.
  • GitHub (Read File): Retrieves last year’s certification_requirements.json.
  • IF (Change Detector): Compares current vs. previous JSON and decides whether changes exist.
  • Email Send: Composes and sends a formatted summary of changes.
  • GitHub (Upsert File): Commits the new JSON file back to the repo for future comparisons.
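
The Load Watch List step, for example, can be sketched as a small Code-node script. This is an illustration, not the template's exact node: `loadWatchList` is a hypothetical helper, and the real node may fetch the file from GitHub rather than parse a static string.

```javascript
// Sketch of the "Load Watch List" Code node logic (illustrative).
// In n8n, the body of loadWatchList would be the Code node itself,
// which must return an array of items wrapped as { json: ... }.
function loadWatchList(raw) {
  const watchList = JSON.parse(raw);
  return watchList.map((entry) => ({ json: entry }));
}

const sample = JSON.stringify([
  { url: "https://cert-body.org/requirements", selector: "#requirements" },
]);
const items = loadWatchList(sample);
// Each item now carries one url/selector pair for SplitInBatches.
```

Each returned item then flows through SplitInBatches one URL at a time.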

Set up steps

Setup Time: 15-25 minutes

  1. Install Community Node: From n8n UI → Settings → Community Nodes → search and install “ScrapeGraphAI”.
  2. Create/Clone GitHub Repo: Add an empty certification_requirements.json ( {} ) and a config/watchList.json with an array of objects like:
    [
      {
        "url": "https://cert-body.org/requirements",
        "selector": "#requirements"
      }
    ]
    
  3. Generate GitHub PAT: Grant it the repo scope and store it in n8n Credentials as “GitHub API”.
  4. Add ScrapeGraphAI Credential: Paste your API key into n8n Credentials.
  5. Configure Email Credentials: E.g., SMTP with username/password or OAuth2.
  6. Open Workflow: Import the template JSON into n8n.
  7. Update Environment Variables (in the Code node or via n8n variables):
    • GITHUB_REPO (e.g., user/cert-requirements)
    • EMAIL_RECIPIENTS
  8. Test Run: Trigger manually. Verify email content and GitHub commit.
  9. Schedule: Add a Cron node (optional) for yearly or quarterly automatic runs.
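
Step 7's variables can be consumed in the Code node roughly as follows. This is a sketch assuming plain environment-variable access; `loadConfig` is a hypothetical helper, and in n8n you would typically read these via `$env` or workflow variables instead.

```javascript
// Hypothetical helper: read the workflow's configuration.
// EMAIL_RECIPIENTS is the semicolon-separated list from the setup table.
function loadConfig(env) {
  const repo = env.GITHUB_REPO; // e.g. "user/cert-requirements"
  const recipients = (env.EMAIL_RECIPIENTS || "")
    .split(";")               // split the semicolon-separated list
    .map((r) => r.trim())     // tolerate stray whitespace
    .filter(Boolean);         // drop empty entries
  return { repo, recipients };
}

const cfg = loadConfig({
  GITHUB_REPO: "user/cert-requirements",
  EMAIL_RECIPIENTS: "a@example.com; b@example.com",
});
```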

Node Descriptions

Core Workflow Nodes:

  • Manual Trigger – Initiates the workflow manually or via external schedule.
  • Code (Load Watch List) – Reads and parses watchList.json from GitHub or static input.
  • SplitInBatches – Controls request concurrency to avoid scraping bans.
  • ScrapeGraphAI – Extracts requirement text using provided CSS selectors or XPath.
  • Merge (Combine) – Waits for all batches and merges them into one dataset.
  • GitHub (Read/Write File) – Handles version-controlled storage of JSON data.
  • IF (Change Detector) – Compares hashes/JSON diff to detect updates.
  • EmailSend – Sends change log, including renewal reminders and diff summary.
  • Sticky Note – Provides in-workflow documentation for future editors.

Data Flow:

  1. Manual Trigger → Code (Load Watch List) → SplitInBatches
  2. SplitInBatches → ScrapeGraphAI → Merge
  3. Merge → GitHub (Read File) → IF (Change Detector)
  4. IF (True) → Email Send → GitHub (Upsert File)

Customization Examples

Adjusting Scraper Configuration

// Inside the Watch List JSON object
{
  "url": "https://new-association.com/cert-update",
  "selector": ".content article:nth-of-type(1) ul"
}

Custom Email Template

// In Email Send node → HTML Content
<div>
  <h2>📋 Certification Updates – {{ $json.date }}</h2>
  <p>The following certifications have new requirements:</p>
  <ul>
    {{ $json.diffHtml }}
  </ul>
  <p>For full details visit our GitHub repo.</p>
</div>
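
The `{{ $json.diffHtml }}` placeholder above can be produced by a helper along these lines. `buildDiffHtml` and `escapeHtml` are illustrative names, not part of the template; they assume the "changes" object shown under Data Output Format below.

```javascript
// Minimal HTML escaping for values interpolated into the email body.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical helper: turn the "changes" object into <li> items
// that slot into the <ul> of the email template.
function buildDiffHtml(changes) {
  return Object.entries(changes)
    .map(([cert, note]) => `<li><strong>${escapeHtml(cert)}</strong>: ${escapeHtml(note)}</li>`)
    .join("\n");
}

const html = buildDiffHtml({
  "AWS-SAA": "Updated to Version 3.0; exam format changed.",
});
```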

Data Output Format

The workflow outputs structured JSON data:

{
  "timestamp": "2024-09-01T12:00:00Z",
  "source": "watchList.json",
  "current": {
    "AWS-SAA": "Version 3.0, requires renewed proctored exam",
    "PMP": "60 PDUs every 3 years"
  },
  "previous": {
    "AWS-SAA": "Version 2.0",
    "PMP": "60 PDUs every 3 years"
  },
  "changes": {
    "AWS-SAA": "Updated to Version 3.0; exam format changed."
  }
}
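
The IF (Change Detector) comparison behind the "changes" field can be sketched as a plain per-key diff of the two maps. This assumes simple string inequality per certification; the template itself may compare hashes instead, and the change-note wording here is illustrative.

```javascript
// Hypothetical change detector: compare current vs. previous requirement
// strings and collect the keys whose value changed or is newly tracked.
function detectChanges(current, previous) {
  const changes = {};
  for (const [cert, req] of Object.entries(current)) {
    if (previous[cert] !== req) {
      changes[cert] = previous[cert] === undefined
        ? `New certification tracked: ${req}`
        : `Changed from "${previous[cert]}" to "${req}"`;
    }
  }
  return changes;
}

const changes = detectChanges(
  { "AWS-SAA": "Version 3.0", "PMP": "60 PDUs every 3 years" },
  { "AWS-SAA": "Version 2.0", "PMP": "60 PDUs every 3 years" }
);
// Only AWS-SAA differs, so only it appears in the change log.
```

An empty result would send the workflow down the IF node's false branch, skipping the email.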

Troubleshooting

Common Issues

  1. ScrapeGraphAI returns empty data – Check CSS/XPath selectors and ensure page is publicly accessible.
  2. GitHub authentication fails – Verify PAT scope includes repo and that the credential is linked in both GitHub nodes.

Performance Tips

  • Limit SplitInBatches size to 3-5 URLs when sources are heavy to avoid timeouts.
  • Enable n8n execution mode “Queue” for long-running scrapes.

Pro Tips:

  • Store selector samples in comments next to each watch list entry for future maintenance.
  • Use a Cron node set to “0 0 1 1 *” for an annual run exactly on Jan 1st.
  • Add a Telegram node after Email Send for instant mobile notifications.