This template implements a recursive web crawler inside n8n. Starting from a given URL, it crawls linked pages up to a maximum depth (default: 3), extracts text and links, and returns the collected content via webhook.
## Webhook Trigger

Accepts a JSON body with a `url` field. Example payload:

```json
{ "url": "https://example.com" }
```
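As a sketch of what the workflow might do with this payload (the function name and its behavior are assumptions for illustration, not taken from the template), a Code node could validate the body and derive the crawl domain:

```javascript
// Sketch of payload validation for the incoming webhook body.
// Names and behavior here are assumptions, not the template's own code.
function initFromPayload(body) {
  if (!body || typeof body.url !== 'string') {
    throw new Error('Request body must contain a "url" string field');
  }
  const parsed = new URL(body.url); // throws on malformed URLs
  return {
    url: parsed.href,
    domain: parsed.hostname, // used later for same-site filtering
  };
}
```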
## Initialization

Sets `url`, `domain`, `maxDepth = 3`, and `depth = 0`, and initializes the crawl state (`pending`, `visited`, `queued`, `pages`).

## Recursive Crawling
- **Depth Control & Queue:** stops recursing at `maxDepth` to prevent infinite loops.
- **Data Collection:** stores each result (`url`, `depth`, `content`) in `pages[]`.
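The queue-and-depth mechanics can be sketched in plain JavaScript (an illustrative reimplementation, not the template's actual Code node; `fetchPage` is an injected stand-in for the HTTP Request node):

```javascript
// Breadth-first crawl with a visited set and a depth limit.
// The real workflow tracks a `pending` counter; here the queue
// length plays that role.
async function crawl(startUrl, fetchPage, maxDepth = 3) {
  const visited = new Set();                 // URLs already fetched
  const queue = [{ url: startUrl, depth: 0 }];
  const pages = [];
  while (queue.length > 0) {
    const { url, depth } = queue.shift();
    if (visited.has(url) || depth > maxDepth) continue;
    visited.add(url);
    const { content, links } = await fetchPage(url);
    pages.push({ url, depth, content });
    for (const link of links) {
      if (!visited.has(link)) queue.push({ url: link, depth: depth + 1 });
    }
  }
  return pages;
}
```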
When `pending = 0`, the workflow combines the results.

## Output
- `combinedContent` (all pages concatenated)
- `pages[]` (array of individual results)

## Import Template
Load from n8n Community Templates.
## Configure Webhook

Copy the webhook URL (path/ID) from the workflow's Webhook node.

## Run a Test
Send a POST request with a JSON body:

```bash
curl -X POST https://<your-n8n>/webhook/<id> \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
```
## View Response

The crawler returns a JSON object containing `combinedContent` and `pages[]`.
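A sketch of how such a response could be assembled (the field names `combinedContent` and `pages` match the description above; the joining logic itself is an assumption):

```javascript
// Assemble the webhook response from the collected pages.
// Joining with blank lines is an assumed choice, not the template's.
function buildResponse(pages) {
  return {
    combinedContent: pages.map((p) => p.content).join('\n\n'),
    pages, // [{ url, depth, content }, ...]
  };
}
```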
## maxDepth

Default: 3. Adjust it in the Init Crawl Params (Set) node.
## Timeouts

The HTTP Request node timeout is 5 seconds per request; increase it if needed.
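The effect of a per-request timeout can be sketched with a promise race (an illustration of the behavior only, not the HTTP Request node's internals):

```javascript
// Race a request promise against a timer: whichever settles first wins,
// so a slow page is treated as a failure and the crawl moves on.
function withTimeout(promise, ms = 5000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('timeout')), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```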
## Filtering Rules

- A leading `www` is treated as same-site.
- Skips `mailto:`, `tel:`, and `javascript:` links.
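These rules can be sketched as a predicate (assumed logic based on the description above, not the template's actual node):

```javascript
// Decide whether a discovered link belongs to the crawl: only HTTP(S)
// URLs on the same site, with a leading "www." ignored when comparing
// hostnames.
function shouldCrawl(link, siteDomain) {
  let parsed;
  try { parsed = new URL(link); } catch { return false; }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return false; // drops mailto:, tel:, javascript:, etc.
  }
  const strip = (h) => h.replace(/^www\./, '');
  return strip(parsed.hostname) === strip(siteDomain);
}
```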
**Setup time:** ~10 minutes (import → set webhook → test request).