This workflow automates the process of scraping real estate property listings from websites using ScrapeGraph AI, extracting structured data, and saving it to a Google Sheet. It is designed to handle paginated listing pages and can be adapted to any real estate site that uses URL parameters for pagination.
NOTE:
This workflow has been tested with Immobiliare.it, the #1 real estate website in Italy. However, it is designed to be adaptable: by modifying the pagination parameter and the listing URL pattern, you can use it with any real estate website that structures its listings with URL-based pagination.
Business Use Cases:
- Real estate market intelligence
- Lead generation for agencies
- Price trend analysis
- Property comparison dashboards
- CRM enrichment
- Competitor monitoring
Key Advantages
1. ✅ Fully Automated Lead Collection
Automatically collects real estate listings without manual browsing.
2. ✅ AI-Powered Extraction
Uses AI instead of rigid selectors:
- More resilient to website layout changes
- Handles dynamic content better
- Reduces maintenance effort
3. ✅ Structured Data Output
The defined JSON schema ensures:
- Clean database-ready data
- Standardized fields
- Easy integration with CRM or analytics tools
4. ✅ Pagination Scalability
The workflow scales easily:
- Increase the number of pages
- Change the target city
- Adapt to different portals
5. ✅ Duplicate Prevention
The Google Sheets node matches rows on the listing URL to:
- Avoid duplicates
- Update existing records
6. ✅ Modular Architecture
The workflow is modular and reusable:
- URL generation logic is independent
- Extraction schema is customizable
- Storage layer can be replaced (CRM, database, Airtable, etc.)
7. ✅ Cost & Time Efficiency
- Eliminates manual data entry
- Saves research time
- Enables automated market monitoring
How it works
The workflow is structured in two main phases:
1. Listing URL Discovery
- The user provides a base URL, the maximum number of pages to scrape, and the pagination parameter name (e.g., pag for Immobiliare.it).
- A Code node generates a list of page URLs by appending the pagination parameter.
- Each page URL is processed through the ScrapegraphAI node, which extracts all individual listing URLs.
- An Information Extractor node (powered by Google Gemini) filters and validates the extracted URLs based on a defined structure.
- A Wait node introduces a delay between requests to avoid rate limiting.
- A Loop Over Items node ensures all generated page URLs are processed.
2. Data Extraction & Storage
- All collected listing URLs are aggregated and split into individual items.
- A second loop processes each listing URL through another ScrapegraphAI node, which extracts detailed property data (title, description, price, area, bedrooms, bathrooms, floor, rooms, balcony, terrace, cellar, heating, air conditioning, image URLs) based on a JSON schema.
- The extracted data is then written to a Google Sheet using the Google Sheets node, with each listing stored in a new row and deduplicated based on the listing URL.
The workflow is fully automated and can scale to handle multiple listing pages and hundreds of individual property URLs.
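As an illustration of the URL-discovery phase, the logic inside the Code node could look like the sketch below. The function name and parameter names are hypothetical; the inputs mirror the Set params node (base URL, max_pages, and the pagination query parameter, e.g. pag on Immobiliare.it):

```javascript
// Hypothetical sketch of the Code node that builds the paginated page URLs.
// Adapt the parameter names to your own Set params node.
function buildPageUrls(url, maxPages, pageParam) {
  const urls = [];
  for (let page = 1; page <= maxPages; page++) {
    // Append the pagination parameter, respecting any existing query string.
    const sep = url.includes('?') ? '&' : '?';
    urls.push(`${url}${sep}${pageParam}=${page}`);
  }
  return urls;
}

// Example: three pages of Milan listings on Immobiliare.it
console.log(buildPageUrls('https://www.immobiliare.it/vendita-case/milano/', 3, 'pag'));
```

Each generated URL is then fed to the ScrapegraphAI node by the Loop Over Items node, one page at a time.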
Set up steps
To use this workflow, follow these steps:
1. Import the workflow into your n8n instance.
2. Configure credentials:
- ScrapegraphAI: Add your API key for ScrapegraphAI.
- Google Gemini (PaLM): Add your Google Gemini API credentials.
- Google Sheets OAuth2: Authenticate with the Google account where you want to store the data.
3. Prepare your target Google Sheet:
- Create a new Google Sheet (or clone this template).
- Note the Sheet ID (from the URL) and the sheet name (tab name) where data should be written.
4. Customize the input parameters:
- In the Set params node, define:
  - url: The base URL of the listing page (without pagination parameters).
  - max_pages: The number of pages to scrape.
  - page_format_value: The query parameter used for pagination (e.g., pag for Immobiliare.it).
5. Adjust the listing URL structure (if needed):
- In the Extract individual URL node, update the system prompt to match the URL pattern of the target website (e.g., https://www.xxx.it/xxx/xxxx).
6. Review the output schema:
- In the Extract data node, you can modify the JSON schema to match the fields you want to extract from each listing.
7. Update the Google Sheet node:
- Set the correct Document ID and Sheet Name in the Update real estate listings node.
- Ensure the column mapping matches your sheet structure.
8. Activate the workflow and click Execute Workflow to start scraping.
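For reference, the JSON schema in the Extract data node could be shaped like the sketch below. The field names follow the list given under "Data Extraction & Storage"; the types are assumptions, so adjust them to your target site:

```javascript
// Illustrative extraction schema for the "Extract data" ScrapegraphAI node.
// Field names mirror the workflow description; types are assumptions.
const listingSchema = {
  type: 'object',
  properties: {
    title:            { type: 'string' },
    description:      { type: 'string' },
    price:            { type: 'number' },
    area:             { type: 'number' },  // e.g. square metres
    bedrooms:         { type: 'integer' },
    bathrooms:        { type: 'integer' },
    floor:            { type: 'string' },
    rooms:            { type: 'integer' },
    balcony:          { type: 'boolean' },
    terrace:          { type: 'boolean' },
    cellar:           { type: 'boolean' },
    heating:          { type: 'string' },
    air_conditioning: { type: 'string' },
    image_urls:       { type: 'array', items: { type: 'string' } },
  },
  required: ['title', 'price'],
};

console.log(JSON.stringify(listingSchema, null, 2));
```

Keeping the schema flat like this makes the column mapping in the Google Sheets node a simple one-to-one assignment.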
👉 Subscribe to my new YouTube channel. Here I’ll share videos and Shorts with practical tutorials and FREE templates for n8n.

Need help customizing?
Contact me for consulting and support or add me on Linkedin.