📺 Full walkthrough video: https://youtu.be/x3PDYon4qKk
⚠️ Self-hosted only — This template uses a community node (Firecrawl) and cannot run on n8n Cloud.
Who it's for
This workflow is for content teams, SEO specialists, and developers who need to batch-scrape web pages and store their content as clean Markdown files — without manually copying and reformatting text from each URL.
How it works
- A chat message triggers the workflow. The URL in the chat input points to a Google Sheet containing the list of URLs to scrape.
- All rows are read from the sheet, then filtered to keep only rows with a non-empty URL and no existing scraping status.
- Valid URLs are processed in configurable batches using a loop node.
- Each URL is scraped with Firecrawl, which returns the full page content as raw Markdown.
- A code node cleans the raw Markdown: it removes links, standalone URLs, navigation labels, and excess whitespace.
- The cleaned Markdown file is saved to a specified Google Drive folder.
- The corresponding row in Google Sheets is updated with an "OK" status to prevent re-scraping on future runs.
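The cleanup step above can be sketched as a small function. This is a hypothetical illustration of what such a code node might do, not the template's actual implementation; the field name `markdown` and the specific navigation labels are assumptions.

```javascript
// Hypothetical sketch of the Markdown-cleaning code node.
// Strips Markdown links (keeping the link text), bare URLs,
// common navigation labels, and excess blank lines.
function cleanMarkdown(raw) {
  return raw
    // [text](https://…) -> text
    .replace(/\[([^\]]*)\]\([^)]*\)/g, '$1')
    // remove standalone URLs
    .replace(/https?:\/\/\S+/g, '')
    // drop typical navigation labels sitting on their own line (assumed list)
    .replace(/^\s*(Home|Menu|Skip to content|Share|Login)\s*$/gim, '')
    // collapse runs of 3+ newlines into a single blank line
    .replace(/\n{3,}/g, '\n\n')
    .trim();
}

// In an n8n Code node this might run once per item, e.g.:
// return $input.all().map(item => ({
//   json: { markdown: cleanMarkdown(item.json.markdown) },
// }));
```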
How to set up
- [ ] Connect your Google Sheets OAuth2 credential and configure the sheet containing the URLs
- [ ] Connect your Firecrawl API credential for web scraping
- [ ] Connect your Google Drive OAuth2 credential and set the destination folder ID
- [ ] Ensure the Google Sheet has a URL column and a Scraped column
- [ ] (Optional) Adjust the batch size in the loop node to control throughput
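The filter described in "How it works" keeps only rows with a non-empty URL and no scraping status. A minimal sketch of that check, assuming the column names URL and Scraped from the checklist above:

```javascript
// Hypothetical row filter: keep rows that have a URL and
// have not yet been marked as scraped.
function isPending(row) {
  const url = (row.URL || '').trim();
  const status = (row.Scraped || '').trim();
  return url.length > 0 && status.length === 0;
}

// Usage, e.g. over rows read from the sheet:
// const pending = rows.filter(isPending);
```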
Requirements
- Google Sheets account
- Google Drive account
- Firecrawl API key (community node — self-hosted n8n only)
How to customize
- Replace the chat trigger with a Schedule Trigger for fully automated periodic scraping.
- Modify the Transform Scraped Content code node to enrich the Markdown (e.g. add metadata headers, extract specific sections).
- Add a Slack or email notification step after the loop completes to report how many URLs were successfully scraped.
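As one example of the metadata-header customization mentioned above, the code node could prepend a front-matter block before the file is written to Drive. The field names (`url`, `markdown`) are assumptions for illustration:

```javascript
// Hypothetical enrichment: prepend a YAML front-matter header
// with the source URL and scrape timestamp to the cleaned Markdown.
function withFrontMatter(item) {
  const header = [
    '---',
    `source: ${item.url}`,
    `scraped_at: ${new Date().toISOString()}`,
    '---',
    '',
  ].join('\n');
  return header + item.markdown;
}
```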