This workflow automates the process of scraping real estate listings from Idealista (or similar property portals), extracting structured property data using AI, and storing the results directly into Google Sheets.
It is designed to handle paginated listing pages, collect individual property URLs, extract detailed listing information, and continuously build a structured real estate database with minimal manual effort.
Automatically navigates through multiple listing pages, extracts property URLs, and retrieves detailed property information without manual browsing.
Uses ScrapeGraphAI to intelligently extract structured information such as price, area, bedrooms, bathrooms, and other listing details.
Dynamically generates paginated URLs, allowing the workflow to scrape hundreds or thousands of listings efficiently.
Automatically writes and updates extracted property data into Google Sheets, creating a centralized and continuously updated real estate database.
Uses the property URL as a unique identifier to append or update listings without creating duplicates.
The workflow can be adapted to other property portals, locations, and search filters.
Ensures consistent and reliable data formatting, making the output ready for analysis, reporting, or further automation.
Built entirely inside n8n with reusable modules, making maintenance and future upgrades simple.
This workflow automates the extraction of real estate listings from Idealista by performing two main phases: listing URL discovery and detailed data extraction.
Trigger and Pagination Setup
A Manual Trigger starts the workflow. A Set node defines the base search URL and the maximum number of pages to scrape. A Code node then generates the paginated URLs (e.g., .../lista-1.htm, .../lista-2.htm).
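A minimal sketch of that Code node ("Run Once for All Items" mode), assuming the Set node exposes the url and max_pages fields described in the setup steps below:

```javascript
// Pagination Code node ("Run Once for All Items") — a sketch.
// Reads `url` and `max_pages` from the upstream Set node and emits
// one item per paginated search URL (.../lista-1.htm, .../lista-2.htm, ...).
const { url, max_pages } = $input.first().json;
const base = url.replace(/\/$/, ''); // drop a trailing slash, if present

const pages = [];
for (let page = 1; page <= Number(max_pages); page++) {
  pages.push({ json: { url: `${base}/lista-${page}.htm` } });
}
return pages;
```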
Extract Listing URLs from Search Pages
The generated URLs are split into batches using a Split In Batches node. For each search page, a ScrapegraphAI node extracts all individual property URLs that match the pattern https://www.idealista.it/immobile/xxxx. The results are then aggregated and unified using an Aggregate and a Code node to remove duplicates and flatten the list.
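The dedupe and flatten step amounts to something like the following Code node sketch (the `urls` field name is an assumption; it depends on how the Aggregate node is configured):

```javascript
// Dedupe/flatten Code node ("Run Once for All Items") — a sketch.
// Assumes each incoming item carries an array of listing links under
// `json.urls` (the actual field name depends on the Aggregate node).
const seen = new Set();
const out = [];

for (const item of $input.all()) {
  for (const url of item.json.urls ?? []) {
    // Keep only individual property pages and skip repeats.
    if (url.startsWith('https://www.idealista.it/immobile/') && !seen.has(url)) {
      seen.add(url);
      out.push({ json: { url } });
    }
  }
}
return out;
```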
Process Each Property URL
The unified list of property URLs is split again into batches. For each property URL, a second ScrapegraphAI node extracts detailed information following a strict JSON schema (including title, description, price, area, bedrooms, bathrooms, floor, rooms, balcony, terrace, cellar, heating, air conditioning, and image URLs).
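As an illustration, a trimmed outputSchema covering those fields could look like this (exact field names and types in the shipped workflow may differ):

```json
{
  "type": "object",
  "properties": {
    "title": { "type": "string" },
    "description": { "type": "string" },
    "price": { "type": "number" },
    "area": { "type": "number" },
    "bedrooms": { "type": "integer" },
    "bathrooms": { "type": "integer" },
    "floor": { "type": "string" },
    "rooms": { "type": "integer" },
    "balcony": { "type": "boolean" },
    "terrace": { "type": "boolean" },
    "cellar": { "type": "boolean" },
    "heating": { "type": "string" },
    "air_conditioning": { "type": "boolean" },
    "image_urls": { "type": "array", "items": { "type": "string" } }
  }
}
```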
Store Data in Google Sheets
The extracted data is finally written to a Google Sheet using the Google Sheets node configured with appendOrUpdate mode, which avoids duplicates by matching the URL column.
Import and Configure Credentials
Import the workflow into n8n. Add the following credentials: a ScrapeGraphAI API key and a Google Sheets connection (OAuth2 or service account).
Prepare the Google Sheet
Clone this template sheet or create your own. Update the Google Sheets node with your Document ID and Sheet Name.
Configure the Search Parameters
In the Set params node, modify the url variable to target your desired search (location, filters, etc.) and set max_pages to control how many search result pages to scrape.
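For example (illustrative values; any Idealista search URL with the same .../lista-N.htm pagination pattern works):

```json
{
  "url": "https://www.idealista.it/vendita-case/milano-milano/",
  "max_pages": 5
}
```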
Adjust Extraction Logic (if needed)
Edit the ScrapegraphAI node's outputSchema (JSON schema) to match the fields you want to extract.
Enable and Execute
Activate the workflow. Click the Execute Workflow button to start scraping. The results will automatically populate the configured Google Sheet, appending new listing data without creating duplicates.
👉 Subscribe to my new YouTube channel. Here I’ll share videos and Shorts with practical tutorials and FREE templates for n8n.
Contact me for consulting and support, or add me on LinkedIn.