Quick Overview
This workflow manually runs to read a list of webpage URLs from Google Sheets, scrape each page with ScrapingBee, and use Google Gemini to extract structured product data from screenshots with an HTML fallback, then append the results back into a Google Sheets sheet.
How it works
- Runs when you manually trigger the workflow.
- Reads the list of URLs to scrape from a Google Sheets spreadsheet.
- Fetches a full-page screenshot for each URL using the ScrapingBee API.
- Sends the screenshot (and the URL for context) to a Google Gemini model to extract product details into a structured JSON format, calling a ScrapingBee HTML fetch tool when screenshot extraction is incomplete.
- Converts any fetched HTML to Markdown and returns it to the Gemini agent to complete the extraction.
- Splits the extracted product array into individual items and appends them as new rows in the Google Sheets “Results” sheet.
Setup
- Create a Google Sheets service account connection in n8n and set the target spreadsheet and the “List of URLs” and “Results” sheet selections.
- Add a Google Gemini (PaLM) API credential in n8n and ensure the selected model (gemini-1.5-pro-latest) is available for your account.
- Add your ScrapingBee API key in both ScrapingBee HTTP requests (screenshot and HTML) and confirm the target URLs are reachable from your n8n environment.
- Ensure the “Results” sheet columns match the structured output fields (for example: product_title, product_price, product_brand, promo, promo_percentage/promo_percent) or update the output schema and column mappings accordingly.