Scrape, search and browse the web with a Firecrawl AI agent webhook

Created by: Firecrawl

Last update a day ago


Turn any prompt into structured web data. Send a POST request with a natural language prompt and an optional JSON schema, and get back clean, structured results scraped from the web by an AI agent powered by Firecrawl.

Use Cases

  • Data Enrichment: Feed company names or URLs from your CRM and get back structured firmographic data (industry, funding, team size, tech stack).
  • Lead Generation: Ask the agent to find pricing, contact pages, or product details for a list of competitors.
  • Market Research: Extract structured pricing plans, feature comparisons, or product catalogs from any website.
  • Content Aggregation: Pull structured news, events, or job postings from across the web on a schedule.
  • Sales Intelligence: Enrich prospect lists with company info, recent news, or tech stack details before outreach.

How It Works

POST /webhook/scrape-agent
  1. Receive Scrape Request accepts a POST request containing a prompt and an optional output_schema.

  2. Validate Output Schema checks the schema. If none is provided, it falls back to a permissive default. If the schema is malformed, it returns a clear error via Return Schema Error.

  3. Research & Extract Web Data takes the prompt and uses the full Firecrawl toolkit to research the web:

    • Search (/search): Finds relevant pages and sources across the web.
    • Scrape (/scrape): Extracts clean, structured content from any URL.
    • Browser (context, create, execute, list, delete): Gives the agent full browser navigation powers to interact with JavaScript-heavy pages, click through elements, and extract data that static scraping cannot reach.

    This combination gives the AI agent complete web navigation capabilities. It can discover sources, read pages, and interact with dynamic content autonomously.

  4. Format Response to Schema (Structured Output Parser) formats the agent's response to match the provided (or default) schema.

  5. Return Structured Results sends the structured JSON back to the caller.

Setup Requirements

  • Firecrawl API Key: Sign up at firecrawl.dev and grab your API key. Connect it in the Firecrawl credential nodes.
  • LLM Provider: Configure your Primary Chat Model and Fallback Chat Model nodes (e.g., OpenRouter, OpenAI, Anthropic). The template uses two model nodes for reliability, plus a separate Parser Chat Model for the output parser.
  • n8n Instance: Self-hosted or cloud. Make sure the webhook node is set to accept POST requests.

API Reference

Endpoint

POST https://your-n8n-instance/webhook/scrape-agent

Request Body

Field          Type    Required  Description
prompt         string  Yes       Natural language instruction for the agent
output_schema  object  No        JSON Schema defining the desired output structure
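
For callers who prefer a script over curl, the request body described above could be assembled roughly as follows. This is a minimal sketch using only Python's standard library: the URL is the placeholder from the Endpoint section, build_request and call_agent are hypothetical helper names, and the 300-second timeout is an assumption rather than a value taken from the template.

```python
import json
import urllib.request

# Placeholder endpoint from the API reference; replace with your own n8n host.
WEBHOOK_URL = "https://your-n8n-instance/webhook/scrape-agent"


def build_request(prompt, output_schema=None, url=WEBHOOK_URL):
    """Build a POST request carrying `prompt` and, optionally, `output_schema`."""
    body = {"prompt": prompt}
    if output_schema is not None:
        # The field is optional; omit it entirely to get the default schema.
        body["output_schema"] = output_schema
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_agent(prompt, output_schema=None, timeout=300):
    """Send the request and decode the JSON reply (timeout is an assumption)."""
    request = build_request(prompt, output_schema)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.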

Response

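Returns a JSON object matching the provided schema, or a flexible object if no schema was given. In the custom-schema example under Testing Examples, this object is nested under a top-level output key.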
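
A caller typically needs to tell a successful reply from a schema error. The sketch below is based only on the two response shapes shown in the Testing Examples section of this page; SchemaError and unwrap_result are hypothetical names, and the assumption that successful data sits under an output key comes from the custom-schema example rather than from a documented contract.

```python
class SchemaError(Exception):
    """Raised when the workflow reports an invalid output_schema."""


def unwrap_result(response):
    """Return the agent's data, or raise if the reply is a schema error."""
    if response.get("error") is True:
        raise SchemaError(response.get("message", "unknown error"))
    # Assumption: successful replies nest the data under "output", as in the
    # custom-schema example; fall back to the whole object otherwise.
    return response.get("output", response)
```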


Testing Examples

1. Basic Request (No Schema)

The agent decides the output structure on its own.

curl -X POST "https://your-n8n-instance/webhook/scrape-agent" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Find the latest pricing for Firecrawl"
  }' | jq

Expected output: A JSON object with whatever structure the agent finds most appropriate for the data. Since no schema was provided, the internal default ({ "type": "object", "additionalProperties": true }) is used.

2. Request With a Custom Schema

You define the exact shape of the data you want back.

curl -X POST "https://your-n8n-instance/webhook/scrape-agent" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Find the latest pricing for Firecrawl",
    "output_schema": {
      "type": "object",
      "properties": {
        "source": { "type": "string" },
        "plans": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": { "type": "string" },
              "price": { "type": "string" },
              "credits": { "type": "string" },
              "highlights": {
                "type": "array",
                "items": { "type": "string" }
              }
            }
          }
        }
      }
    }
  }' | jq

Expected output (actual values depend on the live pricing page at request time):

{
  "output": {
    "source": "https://www.firecrawl.dev/pricing",
    "plans": [
      {
        "name": "Free",
        "price": "$0 (one-time)",
        "credits": "500 credits (one-time)",
        "highlights": [
          "Scrape up to 500 pages",
          "2 concurrent requests",
          "Low rate limits",
          "No credit card required"
        ]
      },
      {
        "name": "Hobby",
        "price": "$16/month (billed yearly, save $38)",
        "credits": "3,000 credits / month",
        "highlights": [
          "Scrape up to 3,000 pages",
          "5 concurrent requests",
          "Basic support",
          "$9 per extra 1k credits"
        ]
      }
    ]
  }
}
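
To confirm on the client side that a reply has the requested shape, a small checker can walk the schema and the data together. The sketch below covers only the schema keywords used in this example (type, properties, items) and is not a full JSON Schema validator; conforms is a hypothetical helper, and a dedicated library such as jsonschema would be the usual choice for anything more demanding.

```python
def conforms(value, schema):
    """Check `value` against the small JSON Schema subset used in this document."""
    expected = schema.get("type")
    if expected == "object":
        if not isinstance(value, dict):
            return False
        # Only properties that are present are checked; "required" is not handled.
        props = schema.get("properties", {})
        return all(conforms(value[k], sub) for k, sub in props.items() if k in value)
    if expected == "array":
        if not isinstance(value, list):
            return False
        items = schema.get("items")
        return items is None or all(conforms(v, items) for v in value)
    if expected == "string":
        return isinstance(value, str)
    if expected == "boolean":
        return isinstance(value, bool)
    if expected == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return True  # no "type" given (or an unknown one): nothing to check
```

It would be applied to the data inside the output key, for example conforms(reply["output"], schema).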

3. Invalid Schema (String Instead of Object)

curl -X POST "https://your-n8n-instance/webhook/scrape-agent" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Find the latest pricing for Firecrawl",
    "output_schema": "not a valid schema"
  }' | jq

Expected output:

{
  "error": true,
  "message": "Invalid output_schema: must be a JSON object with a valid 'type' property (object, array, string, number, boolean)",
  "example_schema": {
    "type": "object",
    "properties": {
      "name": { "type": "string" },
      "price": { "type": "number" }
    }
  }
}

4. Invalid Schema (Array Instead of Object)

curl -X POST "https://your-n8n-instance/webhook/scrape-agent" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Find the latest pricing for Firecrawl",
    "output_schema": [1, 2, 3]
  }' | jq

Expected output: Same error response as above.

5. Invalid Schema (Missing type Property)

curl -X POST "https://your-n8n-instance/webhook/scrape-agent" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Find the latest pricing for Firecrawl",
    "output_schema": {
      "properties": {
        "name": { "type": "string" }
      }
    }
  }' | jq

Expected output: Same error response as above.

6. Invalid Schema (Invalid type Value)

curl -X POST "https://your-n8n-instance/webhook/scrape-agent" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Find the latest pricing for Firecrawl",
    "output_schema": {
      "type": "banana"
    }
  }' | jq

Expected output: Same error response as above.


Workflow Architecture

Receive Scrape Request (POST)
  |
  v
Validate Output Schema
  |--- Error --> Return Schema Error (error JSON)
  |--- Success --> Research & Extract Web Data (AI Agent)
                     |
                     |--- Primary Chat Model
                     |--- Fallback Chat Model
                     |--- Search & Scrape:
                     |      - Search the Web (/search)
                     |      - Scrape Webpage Content (/scrape)
                     |--- Browser Tool:
                     |      - Get Browser Context
                     |      - Create Browser Session
                     |      - Execute Browser Code
                     |      - List Browser Sessions
                     |      - Delete Browser Session
                     |--- Format Response to Schema (Output Parser)
                     |      |
                     |      |--- Parser Chat Model
                     |
                     v
                   Return Structured Results

Schema Validation Logic

The Validate Output Schema node runs this validation before passing data to the agent:

  • If output_schema is missing or null, the default permissive schema is used: { "type": "object", "additionalProperties": true }.
  • If output_schema is present, it must be a JSON object (not a string, array, or primitive).
  • It must have a type property with a valid value: object, array, string, number, or boolean.
  • If validation fails, the workflow returns an error response with a helpful message and example schema.
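
The rules above can be sketched as a single function. This is an illustrative Python rendering of the described behavior, not the code inside the Validate Output Schema node; validate_output_schema is a hypothetical name, while the default schema, the allowed types, and the error payload are copied from this page.

```python
DEFAULT_SCHEMA = {"type": "object", "additionalProperties": True}
VALID_TYPES = ("object", "array", "string", "number", "boolean")
ERROR_MESSAGE = (
    "Invalid output_schema: must be a JSON object with a valid 'type' "
    "property (object, array, string, number, boolean)"
)
EXAMPLE_SCHEMA = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "price": {"type": "number"}},
}


def validate_output_schema(body):
    """Return (schema, None) on success or (None, error_response) on failure."""
    schema = body.get("output_schema")
    if schema is None:
        # Missing or null: fall back to the permissive default.
        return dict(DEFAULT_SCHEMA), None
    if not isinstance(schema, dict) or schema.get("type") not in VALID_TYPES:
        # Strings, arrays, primitives, and a missing or unknown "type" are rejected.
        error = {
            "error": True,
            "message": ERROR_MESSAGE,
            "example_schema": EXAMPLE_SCHEMA,
        }
        return None, error
    return schema, None
```

Returning the error payload instead of raising mirrors the workflow's two branches: one result feeds the agent, the other goes straight back to the caller.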

Customization Ideas

  • Add authentication: Protect the webhook with an API key header check in the Validate Output Schema node.
  • Add memory: Connect a memory node to the Research & Extract Web Data agent for multi-turn conversations.
  • Chain with other workflows: Call this webhook from other n8n workflows to enrich data in pipelines.
  • Swap LLM providers: Replace the model nodes with any supported provider (OpenAI, Anthropic, Google, etc.).
  • Log requests: Add a Google Sheets or database node after the webhook to log all queries and results.
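
As one way to approach the authentication idea, the logic of an API key header check might look like the sketch below. The header name x-api-key and the helper is_authorized are assumptions for illustration; the template does not define either, and inside n8n the check would have to be adapted to the node where it runs.

```python
import hmac


def is_authorized(headers, expected_key, header_name="x-api-key"):
    """Return True when the request carries the expected API key."""
    if not expected_key:
        # Refuse everything when no key is configured, rather than allow all.
        return False
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get(header_name.lower(), "")
    # compare_digest avoids leaking the key through timing differences.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_key.encode("utf-8"))
```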

Notes

  • The Format Response to Schema node (Structured Output Parser) requires the schema to be passed as a JSON string. The expression {{ JSON.stringify($('Validate Output Schema').item.json.output_schema) }} handles this conversion.
  • The agent has access to Firecrawl's full toolkit (search, scrape, and browser automation) and decides autonomously which tools to use based on the prompt: search to discover sources, scrape to extract content, and the browser to interact with dynamic, JavaScript-heavy pages.
  • Response times vary depending on the complexity of the prompt and how many pages the agent needs to visit. Simple lookups take a few seconds; deep research can take longer.
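
The object-to-string conversion mentioned in the first note can be illustrated outside n8n. The Python sketch below is only an analogue of the JSON.stringify expression, with schema_to_json_string as a hypothetical helper name; compact separators are used so the result has no extra whitespace, like JSON.stringify's default output.

```python
import json


def schema_to_json_string(schema):
    """Serialize a schema object to the compact JSON string a parser would receive."""
    # Compact separators mirror the whitespace-free output of JSON.stringify.
    return json.dumps(schema, separators=(",", ":"))


default_schema = {"type": "object", "additionalProperties": True}
schema_string = schema_to_json_string(default_schema)

# Parsing the string back recovers the original object unchanged.
assert json.loads(schema_string) == default_schema
```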