🌐 Firecrawl Website Content Extractor

Created by: Aashit Sharma || aashitsharma

Last update 24 days ago


🌐 Firecrawl Website Content Extractor (n8n Workflow)

This n8n workflow uses the Firecrawl API to extract structured data (for example, quotes and authors) from web pages such as Quotes to Scrape. Because extraction runs asynchronously, the workflow waits for the job to finish and retries the result request when the data is not ready yet.


🔁 Workflow Overview

🎯 Purpose:

  • Crawl and extract structured web data using Firecrawl
  • Wait for asynchronous scraping to complete
  • Retrieve and validate results
  • Support retries if content is not ready

🔧 Step-by-Step Node Breakdown

1. 🧪 Manual Trigger

  • Node: When clicking ‘Test workflow’
  • Used to manually test or execute the workflow during setup or debugging.

2. 📤 Firecrawl Extract API Request

  • Node: Extract
  • Sends a POST request to https://api.firecrawl.dev/v1/extract
  • Payload includes:
    • urls: List of pages to crawl (https://quotes.toscrape.com/*)
    • prompt: "Extract all quotes and their corresponding authors from the website."
    • schema: JSON schema defining expected structure (quotes[], each with text and author)

📌 Uses an HTTP Header Auth credential for Firecrawl API
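
As a rough illustration of the request body described above, the following Python sketch assembles the same three fields outside n8n. The function name `build_extract_payload` and the `required` entries in the schema are assumptions made for this example; the workflow itself only states that the schema describes quotes[], each with text and author.

```python
import json

# Endpoint named in the Extract node description.
EXTRACT_ENDPOINT = "https://api.firecrawl.dev/v1/extract"


def build_extract_payload(urls, prompt):
    """Assemble a JSON body with urls, prompt, and schema (hypothetical helper)."""
    quote_schema = {
        "type": "object",
        "properties": {
            "text": {"type": "string"},
            "author": {"type": "string"},
        },
        # Assumption: both fields are treated as mandatory in this sketch.
        "required": ["text", "author"],
    }
    return {
        "urls": list(urls),
        "prompt": prompt,
        "schema": {
            "type": "object",
            "properties": {
                "quotes": {"type": "array", "items": quote_schema},
            },
            "required": ["quotes"],
        },
    }


payload = build_extract_payload(
    ["https://quotes.toscrape.com/*"],
    "Extract all quotes and their corresponding authors from the website.",
)
print(json.dumps(payload, indent=2))
```

Swapping in a different URL pattern, prompt, or schema only changes the arguments and the schema dictionary; the overall shape of the body stays the same.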


3. ⏱️ Wait for 30 Seconds

  • Node: 30 Secs
  • Gives Firecrawl time to finish processing in the background
  • Prevents hitting the API before results are ready

4. 📥 Get Results

  • Node: Get Results
  • Sends a GET request to the extraction status URL, using the job ID returned by the Extract node ({{ $('Extract').item.json.id }}), to retrieve the extraction results.
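
For readers reproducing this step outside n8n, here is a minimal Python sketch of the status request. It assumes the status URL is the extract endpoint followed by the job ID and that the API key is sent as a Bearer token in the Authorization header; the helper name, the job ID, and the key are placeholders, so check the Firecrawl documentation before relying on these details.

```python
import urllib.request


def build_status_request(job_id, api_key):
    """Prepare (but do not send) a GET request for an extraction job's status."""
    # Assumption: the status URL is the extract endpoint plus the job ID.
    url = f"https://api.firecrawl.dev/v1/extract/{job_id}"
    # Assumption: the header credential is a Bearer token.
    headers = {"Authorization": f"Bearer {api_key}"}
    return urllib.request.Request(url, headers=headers, method="GET")


# Placeholder values; the real ID comes from the Extract node's response.
req = build_status_request("example-job-id", "fc-YOUR-API-KEY")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send the request; the sketch stops short of that so it can run without network access or a real key.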

5. ✅❌ Condition Check

  • Node: If
  • Checks whether the data array in the Get Results response is empty (i.e., no results yet)
  • If data is empty:
    • Routes to the 10 Seconds wait node, then calls Get Results again
  • If data is available:
    • Passes data to the next step (e.g., processing or storage)

6. 🔁 Retry Delay

  • Node: 10 Seconds
  • Waits briefly before sending another GET request to Firecrawl
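
The wait, check, and retry loop formed by steps 3 to 6 can be sketched in Python as follows. This is an illustration under stated assumptions, not the workflow's implementation: `poll_for_results`, `fetch_status`, and the `max_retries` cap are hypothetical (the workflow as described has no retry limit), and the sleep function is injectable so the example runs instantly.

```python
import time


def poll_for_results(fetch_status, initial_wait=30, retry_wait=10,
                     max_retries=5, sleep=time.sleep):
    """Wait, fetch the job status, and retry while `data` is empty."""
    sleep(initial_wait)                      # step 3: initial 30-second wait
    for attempt in range(max_retries + 1):
        response = fetch_status()            # step 4: Get Results
        data = response.get("data")
        if data:                             # step 5: non-empty data, done
            return data
        if attempt < max_retries:
            sleep(retry_wait)                # step 6: 10-second retry delay
    # Assumption: give up after max_retries instead of looping forever.
    raise TimeoutError("extraction results were not ready in time")


# Simulated responses: two empty results, then a populated one.
responses = iter([
    {"data": []},
    {"data": []},
    {"data": {"quotes": [{"text": "Example quote", "author": "Example Author"}]}},
])
waits = []  # records requested delays instead of actually sleeping
result = poll_for_results(lambda: next(responses), sleep=waits.append)
print(waits)   # [30, 10, 10]
print(result)
```

Capping the retries is a design choice added here so that a job which never completes cannot keep the loop running indefinitely.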

7. 🛠️ Edit Fields (Optional Output Formatting)

  • Node: Edit Fields
  • Placeholder to structure or format the extracted results (quotes and authors)
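
One possible shape for this formatting step, sketched in Python: flatten the extracted object into a list of clean rows. The function name `format_quotes` and the output key `quote` are hypothetical choices for this example; the Edit Fields node in the workflow is only a placeholder and does not prescribe any particular output.

```python
def format_quotes(extracted):
    """Turn {"quotes": [{"text": ..., "author": ...}]} into tidy rows."""
    rows = []
    for item in extracted.get("quotes", []):
        # Missing or null fields become empty strings before trimming.
        text = (item.get("text") or "").strip()
        author = (item.get("author") or "").strip()
        # Skip entries that lack either field rather than emit partial rows.
        if text and author:
            rows.append({"quote": text, "author": author})
    return rows


sample = {
    "quotes": [
        {"text": "  An example quote.  ", "author": "Example Author"},
        {"text": "A quote with no author", "author": None},
    ]
}
rows = format_quotes(sample)
print(rows)
```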

🧾 Sticky Note: Firecrawl Setup Guide

Included as an embedded reference:

  • 🔗 10% Firecrawl Discount
  • 🧰 Instructions to:
    • Add Firecrawl API credentials in n8n
    • Use Firecrawl Community Node for self-hosted instances
    • Set up the schema and prompt for targeted data extraction

✅ Key Features

  • 🔌 API-based crawling with schema-structured output
  • ⏱️ Smart waiting + retry mechanism
  • 🧠 AI prompt integration for intelligent data parsing
  • ⚙️ Flexible for different URLs, prompts, and schemas

📦 Sample Output

{
  "quotes": [
    {
      "text": "The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.",
      "author": "Albert Einstein"
    },
    {
      "text": "It is our choices, Harry, that show what we truly are, far more than our abilities.",
      "author": "J.K. Rowling"
    }
  ]
}
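
To check that a result has the structure shown above, a small hand-written validator is often enough. The sketch below is one way to do it in Python; `find_schema_problems` is a hypothetical helper and only checks the two fields named in the schema.

```python
def find_schema_problems(output):
    """Return a list of human-readable problems; an empty list means valid."""
    quotes = output.get("quotes") if isinstance(output, dict) else None
    if not isinstance(quotes, list):
        return ["'quotes' must be a list"]
    problems = []
    for index, entry in enumerate(quotes):
        for field in ("text", "author"):
            # Each entry must be an object with string text and author.
            if not isinstance(entry, dict) or not isinstance(entry.get(field), str):
                problems.append(f"quotes[{index}].{field} must be a string")
    return problems


good = {"quotes": [{"text": "An example quote.", "author": "Example Author"}]}
bad = {"quotes": [{"text": "Missing the author field"}]}
print(find_schema_problems(good))
print(find_schema_problems(bad))
```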