Who is this workflow for?
This workflow is designed for SEO analysts, content creators, marketing agencies, and developers who need to index a website and then interact with its content as if it were a chatbot.
⚠ Note: on sites with many pages, AI token usage can become expensive, especially during the initial crawl and analysis phase.
When the user enters a URL for the first time:
URL validation using AI (gpt-5-nano).
Automatic sitemap discovery via robots.txt.
Relevant sitemap selection (pages, posts, categories, or tags) using GPT-4o according to configured options.
(Includes an “OPTIONS” node for choosing precisely which types of URLs to process.)
Crawling of all selected pages:
Downloads HTML of each page.
Converts HTML to Markdown.
AI analysis to extract:
Structured storage in Google Sheets:
When finished, the sheet is marked with Data schema = true, signaling that the site is indexed.
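To make the sitemap discovery step of the first-run flow more concrete, here is a minimal Python sketch of reading `Sitemap:` lines from a robots.txt body and collecting the `<loc>` entries of a sitemap file. It is only an illustration under stated assumptions: the workflow itself runs as nodes, not as this code, and the function names `sitemaps_from_robots` and `locs_from_sitemap` and the example.com URLs are hypothetical. The AI-based selection of relevant sitemaps is not reproduced here.

```python
import xml.etree.ElementTree as ET


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the URLs declared on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for raw_line in robots_txt.splitlines():
        # Drop comments, which start with '#'.
        line = raw_line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        # The directive name is matched case-insensitively.
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


def locs_from_sitemap(xml_text: str) -> list[str]:
    """Return the <loc> values of a sitemap or sitemap index document."""
    root = ET.fromstring(xml_text)
    locs = []
    for entry in root:  # <url> or <sitemap> elements
        for child in entry:
            # Strip the XML namespace, if any, before comparing the tag name.
            if child.tag.rsplit("}", 1)[-1] == "loc" and child.text:
                locs.append(child.text.strip())
    return locs


# Hypothetical sample inputs, used only for illustration.
ROBOTS = (
    "User-agent: *\n"
    "Disallow: /wp-admin/\n"
    "Sitemap: https://example.com/sitemap_index.xml\n"
)
SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/about</loc></url>"
    "<url><loc>https://example.com/blog/hello</loc></url>"
    "</urlset>"
)

if __name__ == "__main__":
    print(sitemaps_from_robots(ROBOTS))
    print(locs_from_sitemap(SITEMAP))
```

The same `locs_from_sitemap` helper works on a sitemap index, where each `<loc>` points to a child sitemap, so a crawler can call it once on the index and again on each selected child.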
If the URL has already been indexed (Data schema = true):
The chat becomes a LangChain Agent that:
This allows the user to ask questions such as: