Amazon Product Search Scraper with BrightData, GPT-4, and Google Sheets

Created by

phil

Last update 3 months ago

Who Needs Amazon Search Result Scraping?

This scraping workflow is ideal for teams and businesses that need to monitor Amazon product listings at scale:

E-commerce Analysts – Track competitor pricing, ratings, and inventory trends.
Market Researchers – Collect data on product popularity and reviews for market analysis.
Data Teams – Automate ingestion of product metadata into BI pipelines or data lakes.
Affiliate Marketers – Keep affiliate catalogs up to date with the latest product details and prices.

If you need reliable, structured data from Amazon search results delivered directly into your spreadsheets, this workflow saves you hours of manual copy-and-paste.

Why Use This Workflow?

End-to-End Automation – From URL list to clean JSON output in Sheets.
Robust HTML Cleaning – Strips scripts, styles, unwanted tags, and noise.
Structured Parsing – Uses GPT-4 via LangChain with a Structured Output Parser, so every product is returned in the same JSON shape.
Scalable & Repeatable – Processes thousands of URLs in batches.

Step-by-Step: How This Workflow Scrapes Amazon

Get URLs from Google Sheets – Reads a list of search result URLs.
Loop Over Items – Iterates through each URL in controlled batches.
Fetch Raw HTML – Uses BrightData’s Web Unlocker proxy to retrieve the page.
Clean HTML – A Code node removes the doctype, scripts, styles, head, comments, class attributes, and non-whitelisted tags, then collapses extra whitespace.
Extract with LLM – Passes the cleaned HTML through LangChain to GPT-4, which returns one JSON object per product with these fields:
- name, description, rating, reviews, price
Save Results – Appends each product as a row in the “results” sheet in Google Sheets, with one column per JSON field.
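
To make the Clean HTML step concrete, here is a minimal Python sketch of the same sequence of removals. It is not the workflow's actual node code: the function name clean_html and the contents of ALLOWED_TAGS are assumptions made for illustration. The regular expressions are a rough approximation and not a full HTML parser.

```python
import re

# Hypothetical whitelist; the workflow's real allowedTags array may differ.
ALLOWED_TAGS = {"div", "span", "a", "h2", "ul", "li", "p"}


def clean_html(html: str, allowed_tags=ALLOWED_TAGS) -> str:
    """Strip markup an LLM does not need, keeping whitelisted tags and text."""
    # Drop the doctype declaration and HTML comments.
    html = re.sub(r"<!DOCTYPE[^>]*>", "", html, flags=re.IGNORECASE)
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    # Drop script, style, and head elements together with their content.
    for tag in ("script", "style", "head"):
        html = re.sub(
            rf"<{tag}\b[^>]*>.*?</{tag}\s*>",
            "",
            html,
            flags=re.DOTALL | re.IGNORECASE,
        )
    # Remove class attributes from the remaining tags.
    html = re.sub(
        r"""\sclass\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""",
        "",
        html,
        flags=re.IGNORECASE,
    )

    # Remove tags that are not whitelisted, but keep the text inside them.
    def keep_or_drop(match):
        return match.group(0) if match.group(1).lower() in allowed_tags else ""

    html = re.sub(r"</?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>", keep_or_drop, html)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", html).strip()
```

With this whitelist, a div and span survive without their class attributes, while html, body, and b tags are dropped and their text is kept. A shorter cleaned page also means fewer tokens sent to the model per URL.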

Customization: Tailor to Your Needs

Adaptable Sites – The same pipeline can be pointed at other e-commerce sites, such as Walmart or eBay, or at other kinds of websites.
Whitelist Tags – Modify the allowedTags array in the Code node to keep additional HTML elements.
Schema Changes – Update the Structured Output Parser schema to include more fields (e.g., availability, SKU).
Alternate Data Sink – Instead of Sheets, route output to a database, CSV file, or webhook.
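
As an illustration of a schema change, the sketch below writes an extended product schema as a Python dictionary and maps a parsed product to a spreadsheet row. The field types, the choice of required fields, and the helper names missing_required and to_row are assumptions. The workflow's real parser schema may be shaped differently.

```python
import json

# Hypothetical schema: the five original fields plus availability and sku.
# The types and the "required" list are assumptions for illustration.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "description": {"type": "string"},
        "rating": {"type": "number"},
        "reviews": {"type": "integer"},
        "price": {"type": "string"},
        "availability": {"type": "string"},
        "sku": {"type": "string"},
    },
    "required": ["name", "price"],
}


def missing_required(product: dict, schema: dict = PRODUCT_SCHEMA) -> list:
    """Return the required field names that are absent or empty."""
    return [
        key
        for key in schema["required"]
        if key not in product or product[key] in (None, "")
    ]


def to_row(product: dict, schema: dict = PRODUCT_SCHEMA) -> list:
    """Order values by schema property order, leaving absent fields blank."""
    return [product.get(key, "") for key in schema["properties"]]


if __name__ == "__main__":
    # Print the schema as JSON, e.g. to paste into a parser configuration.
    print(json.dumps(PRODUCT_SCHEMA, indent=2))
```

Because to_row follows the schema's property order, a newly added field becomes a new column at the end of the row, so the header row of the results sheet needs the same addition.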

🔑 Prerequisites

Google Sheets Credentials – OAuth credentials configured in n8n.
BrightData API token – Stored in n8n credentials as BRIGHTDATA_TOKEN.
OpenRouter API Key – Configured for the LangChain node to call GPT-4.
n8n Instance – Self-hosted or cloud with sufficient quota for HTTP requests and LLM calls.
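
To show roughly how the BrightData token is used when a page is fetched, the following Python sketch builds (but does not send) an HTTP request to the Web Unlocker API. The endpoint URL, the zone, url, and format body fields, and the Bearer authorization scheme are assumptions based on Bright Data's public API. Check them against your account's documentation. The function name build_unlocker_request is hypothetical.

```python
import json
import urllib.request

# Assumed Web Unlocker API endpoint; verify against Bright Data's docs.
API_URL = "https://api.brightdata.com/request"


def build_unlocker_request(token: str, zone: str, target_url: str) -> urllib.request.Request:
    """Build a POST request asking the unlocker zone to fetch target_url."""
    payload = {
        "zone": zone,        # name of your Web Unlocker zone (assumed field)
        "url": target_url,   # the Amazon search URL to retrieve
        "format": "raw",     # ask for the raw HTML body (assumed field)
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

In n8n the HTTP Request node fills this role, with the token supplied by the stored credential and not hard-coded.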

🚀 Installation & Setup

Configure Credentials
1. In n8n, set up Google Sheets OAuth under “Credentials.”
2. Add your BrightData token as a new HTTP Request credential.
3. Create an OpenRouter API key credential for the LangChain node.
Import the Workflow
1. Copy the JSON workflow into n8n’s “Import” dialog.
2. Replace the {{WEB_SHEET_ID}}, {{TRACK_SHEET_GID}}, and {{RESULTS_SHEET_GID}} placeholders with your Google Sheet ID and the GIDs of your “track” and “results” tabs.
3. Ensure the BRIGHTDATA_TOKEN credential is selected on the HTTP Request node.
Test & Run
1. Add a few Amazon search URLs to your “track” sheet.
2. Execute the workflow and verify product data appears in your “results” sheet.
3. Tweak batch size or parser schema as needed.
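
When tuning the batch size, it can help to see what batching does to a URL list. The sketch below is a plain Python illustration of the idea, not n8n's loop node. The function name batches is hypothetical.

```python
from typing import Iterator, List, Sequence


def batches(urls: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of urls, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(urls), batch_size):
        yield list(urls[start:start + batch_size])
```

Five URLs with a batch size of 2 produce three batches of 2, 2, and 1 URLs. A smaller batch size means fewer pages and LLM calls in flight at once, at the cost of more loop iterations.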

⚠ Important

API Rate Limits – Monitor your BrightData and OpenRouter usage to avoid throttling.
Amazon’s Terms – Ensure your scraping complies with Amazon’s terms of service and with the laws that apply to you.
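
One common way to cope with the rate limits mentioned above is to retry throttled calls with exponential backoff. The sketch below is a generic Python illustration and not part of the workflow. The RateLimited exception and the with_backoff helper are hypothetical names. A real caller would raise RateLimited when a service answers with a throttling response.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error raised by a caller when a service throttles it."""


def with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying on RateLimited with exponentially growing waits."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Wait base_delay * 2^attempt plus random jitter before retrying.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

With base_delay set to 1.0, the first retry waits between 1 and 2 seconds and the second between 2 and 3 seconds. The jitter keeps parallel batches from retrying at the same moment.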

Summary

This workflow automates the extraction of structured product data from Amazon search pages into Google Sheets, streamlining your competitive analysis and data collection. 🚀

Phil | Inforeole