This workflow automates the process of scraping real estate property listings from websites using ScrapeGraph AI, extracting structured data, and saving it to a Google Sheet. It is designed to handle paginated listing pages and can be adapted to any real estate site that uses URL parameters for pagination.
NOTE:
This workflow has been tested with Immobiliare.it, the #1 real estate website in Italy. However, it is designed to be adaptable: by modifying the pagination parameter and the listing URL pattern, you can use it with any real estate website that structures its listings with URL-based pagination.
Business Use Cases:
- Real estate market intelligence
- Lead generation for agencies
- Price trend analysis
- Property comparison dashboards
- CRM enrichment
- Competitor monitoring
Key Advantages
1. ✅ Fully Automated Lead Collection
Automatically collects real estate listings without manual browsing.
2. ✅ AI-Powered Extraction
Uses AI instead of rigid selectors:
- More resilient to website layout changes
- Handles dynamic content better
- Reduces maintenance effort
3. ✅ Structured Data Output
The defined JSON schema ensures:
- Clean database-ready data
- Standardized fields
- Easy integration with CRM or analytics tools
4. ✅ Pagination Scalability
The workflow scales easily:
- Increase the number of pages
- Change the target city
- Adapt to different portals
5. ✅ Duplicate Prevention
The Google Sheets node matches rows on the listing URL to:
- Avoid duplicates
- Update existing records
6. ✅ Modular Architecture
The workflow is modular and reusable:
- URL generation logic is independent
- Extraction schema is customizable
- Storage layer can be replaced (CRM, database, Airtable, etc.)
7. ✅ Cost & Time Efficiency
- Eliminates manual data entry
- Saves research time
- Enables automated market monitoring
How it works
The workflow is structured in two main phases:
1. Listing URL Discovery
- The user provides a base URL, the maximum number of pages to scrape, and the pagination parameter name (e.g., pag for Immobiliare.it).
- A Code node generates a list of page URLs by appending the pagination parameter.
- Each page URL is processed through the ScrapegraphAI node, which extracts all individual listing URLs.
- An Information Extractor node (powered by Google Gemini) filters and validates the extracted URLs based on a defined structure.
- A Wait node introduces a delay between requests to avoid rate limiting.
- A Loop Over Items node ensures all generated page URLs are processed.
2. Data Extraction & Storage
- All collected listing URLs are aggregated and split into individual items.
- A second loop processes each listing URL through another ScrapegraphAI node, which extracts detailed property data (title, description, price, area, bedrooms, bathrooms, floor, rooms, balcony, terrace, cellar, heating, air conditioning, image URLs) based on a JSON schema.
- The extracted data is then written to a Google Sheet using the Google Sheets node, with each listing stored in a new row and deduplicated based on the listing URL.
The workflow is fully automated and can scale to handle multiple listing pages and hundreds of individual property URLs.
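As an illustration of the URL-discovery phase, the logic inside the Code node could look like the sketch below. The function name and parameter names are hypothetical; the inputs mirror the Set params node (base URL, max_pages, and the pagination query parameter, e.g. pag on Immobiliare.it):

```javascript
// Hypothetical sketch of the Code node that builds the paginated page URLs.
// Adapt the parameter names to your own Set params node.
function buildPageUrls(url, maxPages, pageParam) {
  const urls = [];
  for (let page = 1; page <= maxPages; page++) {
    // Append the pagination parameter, respecting any existing query string.
    const sep = url.includes('?') ? '&' : '?';
    urls.push(`${url}${sep}${pageParam}=${page}`);
  }
  return urls;
}

// Example: three pages of Milan listings on Immobiliare.it
console.log(buildPageUrls('https://www.immobiliare.it/vendita-case/milano/', 3, 'pag'));
```

Each generated URL is then fed to the ScrapegraphAI node by the Loop Over Items node, one page at a time.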
Set up steps
To use this workflow, follow these steps:
1. Import the workflow into your n8n instance.
2. Configure credentials:
- ScrapegraphAI: Add your API key for ScrapegraphAI.
- Google Gemini (PaLM): Add your Google Gemini API credentials.
- Google Sheets OAuth2: Authenticate with the Google account where you want to store the data.
3. Prepare your target Google Sheet:
- Create a new Google Sheet (or clone this template).
- Note the Sheet ID (from the URL) and the sheet name (tab name) where data should be written.
4. Customize the input parameters:
- In the Set params node, define:
  - url: The base URL of the listing page (without pagination parameters).
  - max_pages: The number of pages to scrape.
  - page_format_value: The query parameter used for pagination (e.g., pag for Immobiliare.it).
5. Adjust the listing URL structure (if needed):
- In the Extract individual URL node, update the system prompt to match the URL pattern of the target website (e.g., https://www.xxx.it/xxx/xxxx).
6. Review the output schema:
- In the Extract data node, you can modify the JSON schema to match the fields you want to extract from each listing.
7. Update the Google Sheet node:
- Set the correct Document ID and Sheet Name in the Update real estate listings node.
- Ensure the column mapping matches your sheet structure.
8. Activate the workflow and click Execute Workflow to start scraping.
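For reference, the JSON schema in the Extract data node could be shaped like the sketch below. The field names follow the list given under "Data Extraction & Storage"; the types are assumptions, so adjust them to your target site:

```javascript
// Illustrative extraction schema for the "Extract data" ScrapegraphAI node.
// Field names mirror the workflow description; types are assumptions.
const listingSchema = {
  type: 'object',
  properties: {
    title:            { type: 'string' },
    description:      { type: 'string' },
    price:            { type: 'number' },
    area:             { type: 'number' },  // e.g. square metres
    bedrooms:         { type: 'integer' },
    bathrooms:        { type: 'integer' },
    floor:            { type: 'string' },
    rooms:            { type: 'integer' },
    balcony:          { type: 'boolean' },
    terrace:          { type: 'boolean' },
    cellar:           { type: 'boolean' },
    heating:          { type: 'string' },
    air_conditioning: { type: 'string' },
    image_urls:       { type: 'array', items: { type: 'string' } },
  },
  required: ['title', 'price'],
};

console.log(JSON.stringify(listingSchema, null, 2));
```

Keeping the schema flat like this makes the column mapping in the Google Sheets node a simple one-to-one assignment.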
👉 Subscribe to my new YouTube channel. Here I’ll share videos and Shorts with practical tutorials and FREE templates for n8n.

Need help customizing?
Contact me for consulting and support or add me on Linkedin.