Stale Content Detector for Websites
Who is this for
Content marketers, SEO managers, and website owners who want to automatically find pages on their site that are outdated or need refreshing — without manually auditing every page.
What it does
This workflow fetches your sitemap, identifies pages that have not been updated in a configurable number of days, fetches each stale page, and uses AI to assess whether the content is actually outdated or still accurate.
- Sitemap parsing: Fetches your sitemap.xml and extracts all URLs with their last-modified dates
- Staleness filtering: Flags pages not updated in more than X days (default: 180) and sorts by most stale first
- Page content extraction: Fetches each stale page and extracts the title and body text
- AI freshness analysis: An OpenAI-powered agent reviews each page and rates it LOW, MEDIUM, HIGH, or CRITICAL with specific update suggestions
- Audit logging: Saves every reviewed page to a Google Sheet with the full AI analysis
- HTML email report: Builds a color-coded summary email showing each flagged page with its AI verdict and sends one consolidated digest
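The first two steps above (sitemap parsing and staleness filtering) can be sketched outside n8n as follows. This is a minimal illustration, not the workflow's actual Code node: the function names `parse_sitemap` and `find_stale` are hypothetical, and it assumes a standard sitemap.xml in the sitemaps.org namespace whose entries carry a `<lastmod>` value. Entries without a date are skipped here, which is an assumption about how undated pages should be treated.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text):
    """Return (url, last_modified) pairs for every <url> that has a <lastmod>."""
    root = ET.fromstring(xml_text)
    entries = []
    for url_el in root.findall("sm:url", SITEMAP_NS):
        loc = url_el.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url_el.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip()
        if not loc or not lastmod:
            continue  # assumption: a page without a date cannot be aged, so skip it
        # Accept a bare date (2024-01-01) or a full timestamp ending in Z.
        parsed = datetime.fromisoformat(lastmod.replace("Z", "+00:00"))
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)
        entries.append((loc, parsed))
    return entries


def find_stale(entries, stale_days=180, now=None):
    """Keep pages older than stale_days and sort them most stale first."""
    now = now or datetime.now(timezone.utc)
    stale = []
    for url, lastmod in entries:
        age = (now - lastmod).days
        if age > stale_days:
            stale.append({
                "page_url": url,
                "last_modified": lastmod.date().isoformat(),
                "days_since_update": age,
            })
    stale.sort(key=lambda page: page["days_since_update"], reverse=True)
    return stale
```

Passing `now` explicitly keeps the age calculation reproducible, which helps when checking the threshold against a known sitemap.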
How to set up
- Open Site Configuration and set your sitemapUrl, staleDays (default: 180), and alertEmail
- Create a Google Sheet with a ContentAudit tab (columns: scan_date, page_url, last_modified, days_since_update, ai_review)
- Paste your Google Sheet URL into the Save to Content Audit Sheet node
- Connect your Gmail OAuth2 credentials on the Email Content Audit Report node
- Connect your Google Sheets credentials on the Save to Content Audit Sheet node
- Connect your OpenAI API credentials on the OpenAI Chat Model node
- Activate the workflow; it runs every Monday at 7 AM
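To make the expected sheet layout concrete, here is a rough sketch of how one reviewed page could be turned into a row matching the ContentAudit columns listed above. The helper `build_audit_row` and the shape of the `page` dictionary are hypothetical; in the workflow itself the Google Sheets node handles this mapping.

```python
from datetime import date

# Column order of the ContentAudit tab, as described in the setup steps.
AUDIT_COLUMNS = ["scan_date", "page_url", "last_modified", "days_since_update", "ai_review"]


def build_audit_row(page, ai_review, scan_date=None):
    """Return the cell values for one reviewed page, in sheet column order."""
    scan_date = scan_date or date.today()
    row = {
        "scan_date": scan_date.isoformat(),
        "page_url": page["page_url"],                    # hypothetical input key
        "last_modified": page["last_modified"],          # hypothetical input key
        "days_since_update": page["days_since_update"],  # hypothetical input key
        "ai_review": ai_review,
    }
    return [row[column] for column in AUDIT_COLUMNS]
```

Keeping the column names in one list means the header row and the data rows cannot drift apart if a column is added later.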
Requirements
- n8n account (cloud or self-hosted)
- A website with a sitemap.xml (most CMS platforms generate one automatically)
- OpenAI API key (uses gpt-4o-mini)
- Gmail account with OAuth2
- Google Sheets
How to customize
- Change the staleDays threshold in Site Configuration (default: 180 days / 6 months)
- Raise the page limit (20 by default) in the Code node for larger sites
- Add specific URL path filters to focus on blog posts, docs, or landing pages only
- Replace Gmail with Slack for faster team notifications
- Connect to your CMS API (WordPress, Ghost, Webflow) to pull content directly instead of scraping
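The path filter and page limit mentioned above could look roughly like this. It is a sketch under assumptions: `select_pages`, `path_prefixes`, and `page_limit` are illustrative names, not identifiers from the workflow, and the input is assumed to be a list of URLs already sorted most stale first.

```python
from urllib.parse import urlparse


def select_pages(urls, path_prefixes=("/blog/",), page_limit=20):
    """Keep URLs whose path starts with one of the prefixes, capped at page_limit."""
    prefixes = tuple(path_prefixes)
    selected = [url for url in urls if urlparse(url).path.startswith(prefixes)]
    # Slicing after filtering preserves the incoming order, so the most
    # stale pages are kept when the list is longer than the limit.
    return selected[:page_limit]
```

For example, `select_pages(urls, path_prefixes=("/blog/", "/docs/"), page_limit=50)` would restrict a run to blog posts and docs while allowing up to 50 pages.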