This workflow is designed for:
Automation engineers building AI-powered data pipelines
Product managers & analysts needing structured insights from web pages
Researchers & content teams extracting summaries from documentation or articles
HR, compliance, and knowledge teams converting unstructured web content into structured records
n8n self-hosted users leveraging advanced scraping and LLM enrichment
It is ideal for anyone who wants to automatically transform any public URL into structured data and a clean summary.
Web content is often unstructured, verbose, and inconsistent, making it difficult to:
Extract structured fields reliably
Generate consistent summaries
Reuse data across spreadsheets, dashboards, or databases
Eliminate manual copy-paste and interpretation
This workflow solves the problem of turning arbitrary web pages into machine-readable JSON and human-readable summaries, without custom scrapers or manual parsing logic.
The workflow integrates Decodo, Google Gemini, and Google Sheets to perform automated extraction of structured data.
Here’s how it works step-by-step:
Input Setup: provide the target url.
Profile Extraction with Decodo
Accepts any valid URL as input
Scrapes the page content using Decodo
Uses Google Gemini to:
Extract structured data in JSON format
Generate a concise, factual summary
Cleans and parses AI-generated JSON safely
Merges structured data and summary output
Stores the final result in Google Sheets for reporting or downstream automation
JSON Parsing & Merging
Data Storage in Google Sheets
End Output
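The parsing and merging steps above can be sketched in Python as follows. This is a minimal illustration and not the workflow's actual node code: the function names (`parse_llm_json`, `merge_result`, `to_sheet_row`) and the record keys (`url`, `structured_data`, `summary`) are assumptions chosen for the example. It assumes the model may wrap its JSON in Markdown code fences or surround it with extra text.

```python
import json
import re
from typing import Any


def parse_llm_json(raw: str) -> dict[str, Any]:
    """Best-effort parse of a JSON object from model output.

    Returns an empty dict instead of raising when nothing parseable is found.
    """
    # Remove Markdown fence markers (three backticks, optionally followed by "json").
    text = re.sub(r"`{3}(?:json)?", "", raw, flags=re.IGNORECASE).strip()
    # Keep only the span from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return {}
    try:
        parsed = json.loads(text[start : end + 1])
    except json.JSONDecodeError:
        return {}
    # Only JSON objects are accepted; arrays or scalars fall back to {}.
    return parsed if isinstance(parsed, dict) else {}


def merge_result(url: str, structured_raw: str, summary_raw: str) -> dict[str, Any]:
    """Combine the structured fields and the summary into one record."""
    return {
        "url": url,
        "structured_data": parse_llm_json(structured_raw),
        "summary": summary_raw.strip(),
    }


def to_sheet_row(record: dict[str, Any]) -> list[str]:
    """Flatten a record into one spreadsheet row; nested data is kept as JSON text."""
    return [
        record["url"],
        json.dumps(record["structured_data"], ensure_ascii=False, sort_keys=True),
        record["summary"],
    ]
```

Returning an empty dict on failure, rather than raising, keeps one malformed model response from stopping the whole run. A stricter pipeline might instead route such records to a separate error path.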


Import the workflow into your n8n instance.
Configure Credentials (including the Decodo node)
Edit Input Node
Run the Workflow
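Since the workflow accepts any valid URL, it can help to check the value before placing it in the input node. The sketch below is one possible pre-check using only the Python standard library. The helper name `is_valid_http_url` is hypothetical, and restricting input to `http` and `https` is an assumption about what the scraper can fetch.

```python
from urllib.parse import urlparse


def is_valid_http_url(candidate: str) -> bool:
    """Return True for absolute http(s) URLs that include a host."""
    try:
        parsed = urlparse(candidate.strip())
    except ValueError:
        # urlparse raises ValueError for some malformed inputs (e.g. bad IPv6 brackets).
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

This only checks the shape of the URL. It does not confirm that the page exists or is publicly reachable.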
You can easily extend or adapt this workflow:
Replace Google Sheets with:
Insert IF nodes to:
Replace manual trigger with:
This workflow provides a powerful, generic solution for converting unstructured web pages into structured, AI-enriched datasets.
By combining Decodo for scraping, Google Gemini for intelligence, and Google Sheets for persistence, it enables repeatable, scalable, and production-ready data extraction without custom scrapers or brittle parsing logic.