This workflow is perfect for content strategists, SEO specialists, marketing agencies, and virtual assistants who need to quickly audit client websites and collect their blog content into a structured Google Sheet, without manual crawling and copy-pasting.
Manually visiting a website, finding blog posts, and copying content into a spreadsheet is time-consuming and prone to errors. This workflow automates the process: it crawls a website, filters only blog-related pages, scrapes the article content, and stores everything neatly in Google Sheets for easy analysis and content strategy planning.
The workflow starts when a client submits their website URL through a form. A Google Sheet is created automatically, and header columns are added to organize the audit. Dumpling AI then crawls the website to discover all available pages, and the workflow keeps only the blog-related URLs. Each blog page is scraped for content, and the structured results (URL, crawled page, and website content) are appended row by row to the Google Sheet.
Form Trigger – Form Submission (Client URL)
Captures the client’s website URL to start the workflow.
Google Sheets – Create Blog Audit Sheet
Creates a new Google Sheet with a title based on the submitted URL.
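The description only says the title is "based on the submitted URL". One plausible way to derive it is to take the host name and strip a leading "www.". This is a minimal sketch; the helper name audit_sheet_title and the "Blog Audit - <host>" format are hypothetical and not taken from the workflow.

```python
from urllib.parse import urlparse


def audit_sheet_title(submitted_url: str) -> str:
    """Derive a spreadsheet title from the URL submitted in the form (hypothetical format)."""
    # Visitors often type "example.com" without a scheme; urlparse would then
    # treat the whole string as a path, so add one before parsing.
    if "://" not in submitted_url:
        submitted_url = "https://" + submitted_url
    # .hostname is already lowercased and excludes any port or credentials.
    host = urlparse(submitted_url).hostname or ""
    # Treat www.example.com and example.com as the same site.
    if host.startswith("www."):
        host = host[len("www."):]
    # Assumed naming scheme for illustration only.
    return f"Blog Audit - {host}"


print(audit_sheet_title("https://www.Example.com/about"))  # Blog Audit - example.com
```

Using the host rather than the full URL keeps titles short and groups audits of the same site, even when the form is submitted with a deep link.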
Set – Set Sheet Headers
Defines the headers: Url, Crawled_pages, website_content.
Code – Format Header Row
Formats the header fields into a single row before sending them to the sheet.
HTTP Request – Insert Headers into Sheet
Updates the Google Sheet with the prepared header row.
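Writing a header row through an HTTP Request node typically means calling the Google Sheets API spreadsheets.values.update method, whose body holds a 2-D "values" array (a list of rows). The sketch below only builds that request and does not send it. It assumes the new spreadsheet's first tab is named Sheet1 and that OAuth credentials are attached elsewhere; build_header_update is a hypothetical helper, not part of the workflow.

```python
import json
from urllib.parse import quote

SHEETS_API = "https://sheets.googleapis.com/v4/spreadsheets"


def build_header_update(spreadsheet_id: str, headers: list[str], sheet_name: str = "Sheet1"):
    """Return (method, url, body) for writing the header row via values.update."""
    if not 1 <= len(headers) <= 26:
        raise ValueError("this sketch only handles 1 to 26 header columns")
    # Target exactly the header cells, e.g. Sheet1!A1:C1 for three headers,
    # so the written row fits inside the requested range.
    last_col = chr(ord("A") + len(headers) - 1)
    a1_range = f"{sheet_name}!A1:{last_col}1"
    url = f"{SHEETS_API}/{spreadsheet_id}/values/{quote(a1_range)}?valueInputOption=RAW"
    # "values" is a list of rows; here a single row with one cell per header.
    body = {"range": a1_range, "majorDimension": "ROWS", "values": [list(headers)]}
    return "PUT", url, json.dumps(body)


method, url, body = build_header_update("SPREADSHEET_ID", ["Url", "Crawled_pages", "website_content"])
print(method, url)
```

valueInputOption=RAW stores the header text as-is instead of having Sheets parse it as formulas or numbers.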
HTTP Request – Dumpling AI: Crawl Website
Crawls the submitted URL to discover internal pages.
Code – Extract Blog URLs
Filters the crawl results and keeps only URLs that match common blog patterns (e.g., /blog/, /articles/, /posts/).
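The filtering step can be sketched as a regex test over each crawled URL. This minimal version assumes the crawl output has already been reduced to a plain list of URL strings (the real Dumpling AI response shape is not described here). It matches against the URL path only and drops duplicates while preserving order; both are choices of this sketch, not necessarily of the workflow's Code node.

```python
import re
from urllib.parse import urlparse

# Example path segments from the description above; extend as needed.
BLOG_PATTERNS = (r"/blog/", r"/articles/", r"/posts/")


def extract_blog_urls(urls, patterns=BLOG_PATTERNS):
    """Keep crawled URLs whose path matches any blog pattern, in order, without duplicates."""
    blog_re = re.compile("|".join(patterns), re.IGNORECASE)
    seen = set()
    blog_urls = []
    for url in urls:
        # Match against the path only, so a query string like ?ref=/blog/ is ignored.
        if blog_re.search(urlparse(url).path) and url not in seen:
            seen.add(url)
            blog_urls.append(url)
    return blog_urls


crawled = [
    "https://example.com/",
    "https://example.com/blog/first-post",
    "https://example.com/about",
    "https://example.com/Articles/seo-tips",
    "https://example.com/blog/first-post",
]
print(extract_blog_urls(crawled))
# ['https://example.com/blog/first-post', 'https://example.com/Articles/seo-tips']
```

Passing a different patterns tuple, such as (r"/news/", r"/insights/"), adapts the filter to sites that use other URL structures.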
HTTP Request – Dumpling AI: Scrape Blog Pages
Scrapes the text content from each filtered blog page.
Set – Prepare Row Data
Maps the URL, blog page link, and scraped content into structured fields.
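The mapping step can be sketched as a function that turns one scraped page into a record keyed by the three sheet headers. The inputs (site URL, page URL, scraped text) are assumed to be available as plain strings; prepare_row is a hypothetical helper. Truncating content to 50,000 characters is an addition of this sketch, based on Google Sheets' per-cell character limit, so that very long articles do not make the append fail.

```python
# Google Sheets limits a single cell to 50,000 characters.
MAX_CELL_CHARS = 50_000


def prepare_row(site_url: str, page_url: str, content: str) -> dict:
    """Map one scraped page onto the three audit-sheet columns."""
    return {
        "Url": site_url,            # URL submitted through the form
        "Crawled_pages": page_url,  # the blog page that was scraped
        # Truncate long articles so they fit within one cell.
        "website_content": content[:MAX_CELL_CHARS],
    }


row = prepare_row("https://example.com", "https://example.com/blog/first-post", "Post text...")
print(row["Crawled_pages"])  # https://example.com/blog/first-post
```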
Google Sheets – Save Blog Data to Google Sheets
Appends the structured data to the audit sheet row by row.
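The Google Sheets node handles appending for you. For reference, an equivalent raw call is the Sheets API spreadsheets.values.append method, which looks for the existing table in the given range and writes the new rows after it. This is a minimal sketch that only builds the request. It assumes the rows are dictionaries keyed by the three headers and that the target tab is named Sheet1; build_append_request is a hypothetical helper.

```python
import json
from urllib.parse import quote

HEADERS = ["Url", "Crawled_pages", "website_content"]


def build_append_request(spreadsheet_id: str, rows: list[dict], sheet_name: str = "Sheet1"):
    """Return (method, url, body) for appending rows via spreadsheets.values.append."""
    # Order each row's cells to match the header columns; missing fields become "".
    values = [[row.get(header, "") for header in HEADERS] for row in rows]
    # A:C spans the three header columns; the API finds the table there and appends below it.
    a1_range = quote(f"{sheet_name}!A:C")
    url = (
        f"https://sheets.googleapis.com/v4/spreadsheets/{spreadsheet_id}/values/{a1_range}:append"
        "?valueInputOption=RAW&insertDataOption=INSERT_ROWS"
    )
    return "POST", url, json.dumps({"values": values})


method, url, body = build_append_request(
    "SPREADSHEET_ID",
    [{"Url": "https://example.com", "Crawled_pages": "https://example.com/blog/a", "website_content": "Text"}],
)
print(method, url)
```

insertDataOption=INSERT_ROWS inserts new rows for the data rather than overwriting any cells that happen to sit below the table.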
… Dumpling AI: Crawl Website node.
The Extract Blog URLs node uses regex patterns to detect blog URLs. You can customize these patterns to match your website’s URL structure.