📺 Full walkthrough video: https://youtu.be/x3PDYon4qKk
⚠️ Self-hosted only — This template uses a community node (Firecrawl) and cannot run on n8n Cloud.
Who it's for
This workflow is for content teams, SEO specialists, and developers who need to batch-scrape web pages and store their content as clean Markdown files — without manually copying and reformatting text from each URL.
How it works
- A chat message triggers the workflow. The URL in the chat input points to a Google Sheet containing the list of URLs to scrape.
- All rows are read from the sheet, then filtered to keep only rows with a non-empty URL and no existing scraping status.
- Valid URLs are processed in configurable batches using a loop node.
- Each URL is scraped with Firecrawl, which returns the full page content as raw Markdown.
- A code node cleans the raw Markdown: it removes links, standalone URLs, navigation labels, and excess whitespace.
- The cleaned Markdown file is saved to a specified Google Drive folder.
- The corresponding row in Google Sheets is updated with an "OK" status to prevent re-scraping on future runs.
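The cleanup step above can be sketched as a small function. This is a hypothetical illustration of what such a code node might do, not the template's actual implementation; the field name `markdown` and the specific navigation labels are assumptions.

```javascript
// Hypothetical sketch of the Markdown-cleaning code node.
// Strips Markdown links (keeping the link text), bare URLs,
// common navigation labels, and excess blank lines.
function cleanMarkdown(raw) {
  return raw
    // [text](https://…) -> text
    .replace(/\[([^\]]*)\]\([^)]*\)/g, '$1')
    // remove standalone URLs
    .replace(/https?:\/\/\S+/g, '')
    // drop typical navigation labels sitting on their own line (assumed list)
    .replace(/^\s*(Home|Menu|Skip to content|Share|Login)\s*$/gim, '')
    // collapse runs of 3+ newlines into a single blank line
    .replace(/\n{3,}/g, '\n\n')
    .trim();
}

// In an n8n Code node this might run once per item, e.g.:
// return $input.all().map(item => ({
//   json: { markdown: cleanMarkdown(item.json.markdown) },
// }));
```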
How to set up
- [ ] Connect your Google Sheets OAuth2 credential and configure the sheet containing the URLs
- [ ] Connect your Firecrawl API credential for web scraping
- [ ] Connect your Google Drive OAuth2 credential and set the destination folder ID
- [ ] Ensure the Google Sheet has a URL column and a Scraped column
- [ ] (Optional) Adjust the batch size in the loop node to control throughput
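The filter described in "How it works" keeps only rows with a non-empty URL and no scraping status. A minimal sketch of that check, assuming the column names URL and Scraped from the checklist above:

```javascript
// Hypothetical row filter: keep rows that have a URL and
// have not yet been marked as scraped.
function isPending(row) {
  const url = (row.URL || '').trim();
  const status = (row.Scraped || '').trim();
  return url.length > 0 && status.length === 0;
}

// Usage, e.g. over rows read from the sheet:
// const pending = rows.filter(isPending);
```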
Requirements
- Google Sheets account
- Google Drive account
- Firecrawl API key (community node — self-hosted n8n only)
How to customize
- Replace the chat trigger with a Schedule Trigger for fully automated periodic scraping.
- Modify the Transform Scraped Content code node to enrich the Markdown (e.g. add metadata headers, extract specific sections).
- Add a Slack or email notification step after the loop completes to report how many URLs were successfully scraped.
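As one example of the metadata-header customization mentioned above, the code node could prepend a front-matter block before the file is written to Drive. The field names (`url`, `markdown`) are assumptions for illustration:

```javascript
// Hypothetical enrichment: prepend a YAML front-matter header
// with the source URL and scrape timestamp to the cleaned Markdown.
function withFrontMatter(item) {
  const header = [
    '---',
    `source: ${item.url}`,
    `scraped_at: ${new Date().toISOString()}`,
    '---',
    '',
  ].join('\n');
  return header + item.markdown;
}
```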