💡🌐 Essential Multipage Website Scraper with Jina.ai

Created by Joseph LePage

Last update 6 months ago


Use responsibly and follow local rules and regulations

This n8n workflow automates multipage website scraping with Jina.ai's web scraper and stores the scraped content in Google Drive. Here's how it works:

Main Features

The workflow reads a website's sitemap, scrapes each page it lists, and saves each page's content as a separate Google Drive document.

Key Components

Input Configuration

Starts with a sitemap URL (default: https://ai.pydantic.dev/sitemap.xml)
Processes the sitemap to extract individual page URLs
Includes filtering options to target specific topics or pages
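The input stage above (read the sitemap, pull out page URLs, filter, cap the count) can be sketched in plain Python. This is a minimal illustration, not the workflow's own node logic: it assumes a standard sitemaps.org `<urlset>` file (sitemap index files are not handled) and treats "filter by topics" as a hypothetical case-insensitive substring match on the URL. The `limit=20` default mirrors the Limit node's default; the function names are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Namespace declared by sitemaps.org-compliant <urlset> files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value from a <urlset> sitemap, in document order."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text and loc.text.strip()
    ]


def filter_urls(urls: list[str], topics: list[str], limit: int = 20) -> list[str]:
    """Keep URLs containing any topic keyword (case-insensitive), then cap at `limit`.

    Assumption: topic filtering is a substring match on the URL.
    An empty `topics` list keeps every URL.
    """
    if topics:
        wanted = [t.lower() for t in topics]
        urls = [u for u in urls if any(t in u.lower() for t in wanted)]
    return urls[:limit]


if __name__ == "__main__":
    sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>https://example.com/docs/agents/</loc></url>
      <url><loc>https://example.com/docs/tools/</loc></url>
      <url><loc>https://example.com/blog/post-1/</loc></url>
    </urlset>"""
    print(filter_urls(extract_urls(sample), ["docs"]))
```

Filtering before applying the limit means the cap counts only pages you actually want, which matches the order of the Filter and Limit nodes described in the usage steps.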

Scraping Process

Uses Jina.ai's web scraper to extract content from each URL
Converts webpage content into clean Markdown
Extracts page titles automatically for document naming
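A rough sketch of the scraping step, assuming the workflow calls Jina's keyless Reader endpoint by prefixing the page URL with `https://r.jina.ai/`. It also assumes the Reader's plain-text response begins with a `Title:` line, which is how this sketch extracts the page title; check this against a real response before relying on it. The helper names are hypothetical, and `fetch_markdown` performs a live network request.

```python
import urllib.request

# Assumed Jina Reader endpoint: prefixing a page URL returns the page as Markdown text.
JINA_READER_PREFIX = "https://r.jina.ai/"


def reader_url(page_url: str) -> str:
    """Build the Jina Reader URL for a page."""
    return JINA_READER_PREFIX + page_url


def fetch_markdown(page_url: str, timeout: float = 30.0) -> str:
    """Fetch the Reader output for one page (performs a network request)."""
    with urllib.request.urlopen(reader_url(page_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def extract_title(reader_text: str, fallback: str = "Untitled") -> str:
    """Return the value of the first 'Title:' line, or `fallback` if none is found."""
    for line in reader_text.splitlines():
        if line.startswith("Title:"):
            return line[len("Title:"):].strip() or fallback
    return fallback
```

Keeping title extraction separate from fetching lets you test it on saved responses without calling the API.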

Storage Integration

Creates individual Google Drive documents for each scraped page
Names documents using the format "URL - Page Title"
Saves content in Markdown format for better readability
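The "URL - Page Title" naming rule can be expressed as a small helper. This is an illustrative sketch; the whitespace cleanup and the "Untitled" fallback for pages without a title are assumptions, not documented behavior of the workflow, and `document_name` is a hypothetical name.

```python
def document_name(url: str, title: str, fallback_title: str = "Untitled") -> str:
    """Build a Drive document name in the "URL - Page Title" format.

    Assumption: runs of whitespace in the title (including newlines) collapse
    to single spaces, and an empty title falls back to `fallback_title`.
    """
    clean_title = " ".join(title.split()) or fallback_title
    return f"{url} - {clean_title}"
```

For example, `document_name("https://example.com/docs/", "Getting  Started\n")` yields `"https://example.com/docs/ - Getting Started"`.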

Usage Instructions

Set your target website's sitemap URL in the "Set Website URL" node
Configure the "Filter By Topics or Pages" node to select specific content
Adjust the "Limit" node (default: 20 pages) to control batch size
Connect your Google Drive account
Run the workflow to begin automated scraping

Additional Features

Built-in rate limiting through the Wait node to prevent overloading servers
Batch processing capability for handling large sitemaps
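The combination of batching and a Wait node amounts to "process a few items, pause, repeat". The sketch below shows that pattern outside n8n. The batch size and one-second pause are placeholder values, not the workflow's actual settings, and `sleep` is injectable so the pacing can be tested without real delays.

```python
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batches(items: list[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_politely(
    items: list[T],
    handler: Callable[[T], R],
    batch_size: int = 1,
    wait_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[R]:
    """Run `handler` on every item, pausing `wait_seconds` between batches."""
    results: list[R] = []
    for index, batch in enumerate(batches(items, batch_size)):
        if index > 0:
            sleep(wait_seconds)  # plays the role of the n8n Wait node
        results.extend(handler(item) for item in batch)
    return results
```

In a real run, `handler` would fetch and save one page; the pause happens only between batches, so a single batch incurs no delay.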

The workflow does not require a Jina.ai API key, so you can use it right away; the Wait node between requests helps keep the scraping responsible.
