This n8n workflow automates the complete lead generation process by scraping job postings from Indeed, enriching company data via Apollo.io, identifying decision-makers, and generating personalized LinkedIn outreach messages using OpenAI. It integrates with Scrape.do for reliable web scraping, Apollo.io for B2B data enrichment, OpenAI for AI-powered personalization, and Google Sheets for centralized data storage.
Perfect for: Sales teams, recruiters, business development professionals, and marketing agencies looking to automate their outbound prospecting pipeline.
| Property | Value |
|---|---|
| Type | Schedule Trigger |
| Purpose | Automatically initiates workflow on a recurring schedule |
| Frequency | Weekly (Every Monday) |
| Time | 00:00 UTC |
Function: Ensures consistent, hands-off lead generation by running the pipeline automatically without manual intervention.
| Property | Value |
|---|---|
| Type | HTTP Request (GET) |
| Purpose | Scrapes job listings from Indeed via Scrape.do proxy API |
| Endpoint | https://api.scrape.do |
| Output Format | Markdown |
Request Parameters:
| Parameter | Value | Description |
|---|---|---|
| token | API Token | Scrape.do authentication |
| url | Indeed Search URL | Target job search page |
| super | true | Uses residential proxies |
| geoCode | us | US-based content |
| render | true | JavaScript rendering enabled |
| device | mobile | Mobile viewport for cleaner HTML |
| output | markdown | Lightweight text output |
Function: Fetches Indeed job listings with anti-bot bypass, returning clean markdown for easy parsing.
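As a rough illustration of how the parameters above combine into one GET request, the sketch below assembles the query string with `URLSearchParams`. The helper name `buildScrapeDoUrl` and the placeholder token are assumptions for illustration; in n8n the same values would normally be entered as query parameters on the HTTP Request node.

```javascript
// Hypothetical helper (not part of the workflow): builds the Scrape.do request URL
// from the parameters listed in the table above.
function buildScrapeDoUrl(token, targetUrl) {
  const params = new URLSearchParams({
    token,              // Scrape.do API token
    url: targetUrl,     // Indeed search page; URLSearchParams percent-encodes it
    super: "true",      // residential proxies
    geoCode: "us",      // US-based content
    render: "true",     // JavaScript rendering
    device: "mobile",   // mobile viewport
    output: "markdown", // markdown instead of raw HTML
  });
  return `https://api.scrape.do/?${params.toString()}`;
}

const exampleUrl = buildScrapeDoUrl(
  "YOUR_TOKEN", // placeholder, replace with a real token
  "https://www.indeed.com/jobs?q=data+engineer&l=Remote&fromage=7"
);
console.log(exampleUrl);
```

Encoding the target URL matters here: without it, the `&` characters inside the Indeed URL would be read as extra Scrape.do parameters.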
| Property | Value |
|---|---|
| Type | Code Node (JavaScript) |
| Purpose | Extracts structured job data from markdown |
| Mode | Run once for all items |
Extracted Fields:
| Field | Description | Example |
|---|---|---|
| jobTitle | Position title | "Senior Data Engineer" |
| jobUrl | Indeed job link | "https://indeed.com/viewjob?jk=abc123" |
| jobId | Indeed job identifier | "abc123" |
| companyName | Hiring company | "Acme Corporation" |
| location | City, State | "San Francisco, CA" |
| salary | Pay range | "$120,000 - $150,000" |
| jobType | Employment type | "Full-time" |
| source | Data source | "Indeed" |
| dateFound | Scrape date | "2025-01-15" |
Function: Parses markdown using regex patterns, filters invalid entries, and deduplicates by company name.
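The parse, filter, and deduplicate steps could look roughly like the sketch below. The markdown layout it expects (a job link line followed by company and location lines) is an assumption made for illustration; the real Scrape.do output for Indeed may differ and the regex would need adjusting. `parseJobs` and `sampleMarkdown` are hypothetical names.

```javascript
// Assumed (hypothetical) markdown shape: a job link line, then company, then location.
const sampleMarkdown = [
  "[Senior Data Engineer](https://www.indeed.com/viewjob?jk=abc123)",
  "Acme Corporation",
  "San Francisco, CA",
  "",
  "[Data Engineer II](https://www.indeed.com/viewjob?jk=def456)",
  "Acme Corporation",
  "Remote",
  "",
  "[Analytics Engineer](https://www.indeed.com/viewjob?jk=ghi789)",
  "Globex",
  "Austin, TX",
].join("\n");

function parseJobs(markdown, dateFound) {
  // Captures: 1 = job title, 2 = full job URL, 3 = job id from the jk= parameter.
  const linkPattern = /^\[([^\]]+)\]\((https?:\/\/[^)\s]*[?&]jk=([A-Za-z0-9]+)[^)\s]*)\)\s*$/;
  const lines = markdown.split("\n").map((line) => line.trim());
  const seenCompanies = new Set();
  const jobs = [];

  for (let i = 0; i < lines.length; i++) {
    const match = lines[i].match(linkPattern);
    if (!match) continue;

    const companyName = lines[i + 1] || "";
    const location = lines[i + 2] || "";
    if (!companyName) continue; // filter entries with no company name

    const key = companyName.toLowerCase();
    if (seenCompanies.has(key)) continue; // deduplicate by company name
    seenCompanies.add(key);

    jobs.push({
      jobTitle: match[1],
      jobUrl: match[2],
      jobId: match[3],
      companyName,
      location,
      source: "Indeed",
      dateFound,
    });
  }
  return jobs;
}

const jobs = parseJobs(sampleMarkdown, "2025-01-15");
console.log(jobs.map((job) => `${job.companyName}: ${job.jobTitle}`));
```

Inside an n8n Code node running once for all items, the returned objects would be wrapped as `{ json: job }` before being passed on.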
| Property | Value |
|---|---|
| Type | Google Sheets Node |
| Purpose | Stores parsed job postings for tracking |
| Operation | Append rows |
| Target Sheet | "Add New Company" |
Function: Creates a historical record of all discovered job postings and companies for pipeline tracking.
| Property | Value |
|---|---|
| Type | HTTP Request (POST) |
| Purpose | Enriches company data via Apollo.io API |
| Endpoint | https://api.apollo.io/v1/organizations/search |
| Authentication | HTTP Header Auth (x-api-key) |
Request Body:
{
  "q_organization_name": "Company Name",
  "page": 1,
  "per_page": 1
}
Response Fields:
| Field | Description |
|---|---|
| id | Apollo organization ID |
| name | Official company name |
| website_url | Company website |
| linkedin_url | LinkedIn company page |
| industry | Business sector |
| estimated_num_employees | Company size |
| founded_year | Year established |
| city, state, country | Location details |
| short_description | Company overview |
Function: Retrieves comprehensive company intelligence including LinkedIn profiles, industry classification, and employee count.
| Property | Value |
|---|---|
| Type | Code Node (JavaScript) |
| Purpose | Parses Apollo response and merges with original data |
| Mode | Run once for each item |
Function: Extracts relevant fields from Apollo API response and combines with job posting data for downstream processing.
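A minimal sketch of this merge step follows. It assumes the Apollo response exposes matches under a top-level `organizations` array and uses the field names from the Response Fields table; the output property names (`apolloOrgId`, `matched`, and so on) are hypothetical. Returning a `matched` flag makes it easy to branch on empty results later.

```javascript
// Hypothetical helper: merges the first Apollo organization match into the job record.
// Assumes a response shaped like { organizations: [ { id, name, ... } ] }.
function extractOrgData(apolloResponse, job) {
  const org = (apolloResponse?.organizations ?? [])[0];
  if (!org) {
    // No match: keep the original job data and flag it for downstream handling.
    return { ...job, matched: false, apolloOrgId: null };
  }
  return {
    ...job,
    matched: true,
    apolloOrgId: org.id,
    companyName: org.name || job.companyName,
    website: org.website_url ?? null,
    companyLinkedin: org.linkedin_url ?? null,
    industry: org.industry ?? null,
    employeeCount: org.estimated_num_employees ?? null,
    foundedYear: org.founded_year ?? null,
    country: org.country ?? null,
  };
}

const enriched = extractOrgData(
  { organizations: [{ id: "org_1", name: "Acme Corporation", industry: "software", country: "United States" }] },
  { companyName: "Acme Corp", jobTitle: "Senior Data Engineer" }
);
const unmatched = extractOrgData({ organizations: [] }, { companyName: "Unknown Co" });
console.log(enriched.apolloOrgId, unmatched.matched);
```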
| Property | Value |
|---|---|
| Type | HTTP Request (POST) |
| Purpose | Finds decision-makers at target companies |
| Endpoint | https://api.apollo.io/v1/mixed_people/search |
| Authentication | HTTP Header Auth (x-api-key) |
Request Body:
{
  "organization_ids": ["apollo_org_id"],
  "person_titles": [
    "CTO",
    "Chief Technology Officer",
    "VP Engineering",
    "Head of Engineering",
    "Engineering Manager",
    "Technical Director",
    "CEO",
    "Founder"
  ],
  "page": 1,
  "per_page": 3
}
Response Fields:
| Field | Description |
|---|---|
| first_name | Contact first name |
| last_name | Contact last name |
| title | Job title |
| email | Email address |
| linkedin_url | LinkedIn profile URL |
| phone_number | Direct phone |
Function: Identifies key stakeholders and decision-makers based on configurable title filters.
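Since the title filters are meant to be configurable, one way to keep them in a single place is a small builder for the request body shown above. This is a sketch; `buildPeopleSearchBody` and `DEFAULT_TITLES` are hypothetical names, and the body fields mirror the example in this section.

```javascript
// Default title filters, copied from the request body above.
const DEFAULT_TITLES = [
  "CTO",
  "Chief Technology Officer",
  "VP Engineering",
  "Head of Engineering",
  "Engineering Manager",
  "Technical Director",
  "CEO",
  "Founder",
];

// Hypothetical helper: builds the people search body for one Apollo organization.
function buildPeopleSearchBody(orgId, titles = DEFAULT_TITLES, perPage = 3) {
  if (!orgId) {
    throw new Error("An Apollo organization ID is required for the people search");
  }
  return {
    organization_ids: [orgId],
    person_titles: titles,
    page: 1,
    per_page: perPage,
  };
}

// Example: target sales leadership instead of engineering leadership.
const salesBody = buildPeopleSearchBody("apollo_org_id", ["VP Sales", "Head of Sales"]);
console.log(JSON.stringify(salesBody, null, 2));
```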
| Property | Value |
|---|---|
| Type | Code Node (JavaScript) |
| Purpose | Structures lead data for outreach |
| Mode | Run once for all items |
Function: Combines person data with company context, creating comprehensive lead profiles ready for personalization.
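The sketch below shows one possible shape for this formatting step. It assumes the people search response carries contacts in a top-level `people` array and that the company record already holds the fields merged earlier; `formatLeads` and the output property names are hypothetical, chosen to match the `$json.fullName`, `$json.title`, `$json.companyName`, `$json.industry`, and `$json.jobTitle` variables used by the message step.

```javascript
// Hypothetical helper: combines each Apollo contact with its company context.
// Assumes a response shaped like { people: [ { first_name, last_name, title, ... } ] }.
function formatLeads(peopleResponse, company) {
  return (peopleResponse?.people ?? []).map((person) => ({
    firstName: person.first_name ?? "",
    lastName: person.last_name ?? "",
    // Join only the name parts that are present, so a missing last name leaves no stray space.
    fullName: [person.first_name, person.last_name].filter(Boolean).join(" "),
    title: person.title ?? "",
    email: person.email ?? null,
    linkedinUrl: person.linkedin_url ?? null,
    companyName: company.companyName,
    industry: company.industry ?? null,
    country: company.country ?? null,
    jobTitle: company.jobTitle,
  }));
}

const leads = formatLeads(
  { people: [{ first_name: "Dana", last_name: "Lee", title: "CTO" }, { first_name: "Sam", title: "Founder" }] },
  { companyName: "Acme Corporation", industry: "software", country: "United States", jobTitle: "Senior Data Engineer" }
);
console.log(leads.map((lead) => `${lead.fullName} (${lead.title})`));
```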
| Property | Value |
|---|---|
| Type | OpenAI Node |
| Purpose | Creates custom LinkedIn connection messages |
| Model | gpt-4o-mini |
| Max Tokens | 150 |
| Temperature | 0.7 |
System Prompt:
You are a professional outreach specialist. Write personalized LinkedIn connection request messages. Keep messages under 300 characters. Be friendly, professional, and mention a specific reason for connecting based on their role and company.
User Prompt Variables:
| Variable | Source |
|---|---|
| Name | $json.fullName |
| Title | $json.title |
| Company | $json.companyName |
| Industry | $json.industry |
| Job Context | $json.jobTitle |
Function: Generates unique, contextual outreach messages that reference specific hiring activity and company details.
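For readers calling the API directly instead of using the OpenAI node, the settings above map onto a chat completions request roughly as follows. The wording of the user prompt is an assumption; only the variables, model, token limit, temperature, and system prompt come from this section. `buildChatRequest` is a hypothetical name.

```javascript
const SYSTEM_PROMPT =
  "You are a professional outreach specialist. Write personalized LinkedIn connection request messages. " +
  "Keep messages under 300 characters. Be friendly, professional, and mention a specific reason for " +
  "connecting based on their role and company.";

// Hypothetical helper: builds a /v1/chat/completions request body for one lead.
function buildChatRequest(lead) {
  // Prompt wording is illustrative; the variables mirror the table above.
  const userPrompt = [
    `Name: ${lead.fullName}`,
    `Title: ${lead.title}`,
    `Company: ${lead.companyName}`,
    `Industry: ${lead.industry}`,
    `Job Context: the company is hiring for "${lead.jobTitle}"`,
  ].join("\n");

  return {
    model: "gpt-4o-mini",
    max_tokens: 150,
    temperature: 0.7,
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      { role: "user", content: userPrompt },
    ],
  };
}

const chatRequest = buildChatRequest({
  fullName: "Dana Lee",
  title: "CTO",
  companyName: "Acme Corporation",
  industry: "software",
  jobTitle: "Senior Data Engineer",
});
console.log(chatRequest.messages[1].content);
```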
| Property | Value |
|---|---|
| Type | Code Node (JavaScript) |
| Purpose | Combines lead data with generated message |
| Mode | Run once for each item |
Function: Merges OpenAI response with lead profile, creating the final enriched record.
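A minimal sketch of the merge, assuming the generated text arrives in the standard chat completions shape (`choices[0].message.content`); the n8n OpenAI node may expose it under a different path, so treat the lookup as an assumption. The `messageWithinLimit` flag is an illustrative addition that records whether the model respected the 300-character limit from the system prompt.

```javascript
const MAX_MESSAGE_LENGTH = 300; // limit stated in the system prompt

// Hypothetical helper: attaches the generated message to the lead profile.
function mergeLeadAndMessage(lead, completion) {
  const message = (completion?.choices?.[0]?.message?.content ?? "").trim();
  return {
    ...lead,
    personalizedMessage: message,
    // True only for a non-empty message shorter than 300 characters.
    messageWithinLimit: message.length > 0 && message.length < MAX_MESSAGE_LENGTH,
  };
}

const mergedLead = mergeLeadAndMessage(
  { fullName: "Dana Lee", companyName: "Acme Corporation" },
  { choices: [{ message: { content: "  Hi Dana, I saw Acme is hiring data engineers and would love to connect.  " } }] }
);
const emptyLead = mergeLeadAndMessage({ fullName: "Sam" }, {});
console.log(mergedLead.personalizedMessage, mergedLead.messageWithinLimit);
```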
| Property | Value |
|---|---|
| Type | Google Sheets Node |
| Purpose | Stores final lead data with personalized messages |
| Operation | Append rows |
| Target Sheet | "Leads" |
Data Mapping:
| Column | Data |
|---|---|
| First Name | Lead's first name |
| Last Name | Lead's last name |
| Title | Job title |
| Company | Company name |
| LinkedIn URL | Profile link |
| Country | Location |
| Industry | Business sector |
| Date Added | Timestamp |
| Source | "Indeed + Apollo" |
| Personalized Message | AI-generated outreach text |
Function: Creates actionable lead database ready for outreach campaigns.
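The column mapping above can be expressed as a plain object whose keys match the sheet headers, which is the shape an append operation typically expects. This is a sketch: `toSheetRow` and the input property names (`firstName`, `linkedinUrl`, and so on) are assumptions carried over from the hypothetical lead profile, not names confirmed by the workflow.

```javascript
// Hypothetical helper: maps a merged lead record onto the "Leads" sheet columns.
function toSheetRow(lead, dateAdded) {
  return {
    "First Name": lead.firstName,
    "Last Name": lead.lastName,
    "Title": lead.title,
    "Company": lead.companyName,
    "LinkedIn URL": lead.linkedinUrl,
    "Country": lead.country,
    "Industry": lead.industry,
    "Date Added": dateAdded,
    "Source": "Indeed + Apollo",
    "Personalized Message": lead.personalizedMessage,
  };
}

const row = toSheetRow(
  {
    firstName: "Dana",
    lastName: "Lee",
    title: "CTO",
    companyName: "Acme Corporation",
    linkedinUrl: "https://www.linkedin.com/in/example",
    country: "United States",
    industry: "software",
    personalizedMessage: "Hi Dana, I saw Acme is hiring data engineers and would love to connect.",
  },
  "2025-01-15T00:00:00.000Z" // in a live run this would be new Date().toISOString()
);
console.log(Object.keys(row).join(" | "));
```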
⏰ Schedule Trigger
│
▼
🔍 Scrape.do Indeed API ──► Fetches job listings with JS rendering
│
▼
📋 Parse Indeed Jobs ──► Extracts company names, job details
│
▼
📊 Add New Company ──► Saves to Google Sheets (Companies)
│
▼
🏢 Apollo Org Search ──► Enriches company data
│
▼
📤 Extract Apollo Org Data ──► Parses API response
│
▼
👥 Apollo People Search ──► Finds decision-makers
│
▼
📝 Format Leads ──► Structures lead profiles
│
▼
🤖 Generate Personalized Message ──► AI creates custom outreach
│
▼
🔗 Merge Lead + Message ──► Combines all data
│
▼
💾 Save Leads to Sheet ──► Final storage (Leads)
| Credential | Purpose | Where to Get |
|---|---|---|
| Scrape.do API Token | Web scraping with anti-bot bypass | scrape.do/dashboard |
| Apollo.io API Key | B2B data enrichment | apollo.io/settings/integrations |
| OpenAI API Key | AI message generation | platform.openai.com |
| Google Sheets OAuth2 | Data storage | n8n Credentials Setup |
| Credential Type | Configuration |
|---|---|
| HTTP Header Auth (Apollo) | Header: x-api-key, Value: Your Apollo API key |
| OpenAI API | API Key: Your OpenAI API key |
| Google Sheets OAuth2 | Complete OAuth flow with Google |
| Specification | Value |
|---|---|
| Processing Time | 2-5 minutes per run (depending on job count) |
| Jobs per Run | ~25 unique companies |
| API Calls per Run | 1 Scrape.do + ~25 Apollo Org + ~25 Apollo People + ~75 OpenAI |
| Data Accuracy | 90%+ for company matching |
| Success Rate | 99%+ with proper error handling |
| Service | Free Tier Limit | Recommendation |
|---|---|---|
| Scrape.do | 1,000 credits/month | ~40 runs/month |
| Apollo.io | 100 requests/day | Add Wait nodes if needed |
| OpenAI | Based on usage | Monitor costs (~$0.01-0.05/run) |
| Google Sheets | 300 requests/minute | No issues expected |
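The figures in the two tables above can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted in this document; the per-run credit cost is inferred from "1,000 credits/month" and "~40 runs/month" and is not a published Scrape.do price.

```javascript
// Figures quoted in the tables above.
const companiesPerRun = 25;
const maxLeadsPerCompany = 3;
const scrapeDoCreditsPerMonth = 1000;
const scrapeDoRunsPerMonth = 40;
const apolloRequestsPerDay = 100;

// One org search plus one people search per company.
const apolloCallsPerRun = companiesPerRun * 2; // 50
// One message per lead, at most three leads per company.
const maxOpenAiCallsPerRun = companiesPerRun * maxLeadsPerCompany; // 75
// Inferred cost of the single Scrape.do request made per run.
const impliedCreditsPerRun = scrapeDoCreditsPerMonth / scrapeDoRunsPerMonth; // 25
// Full runs that fit inside the Apollo daily limit.
const apolloRunsPerDay = Math.floor(apolloRequestsPerDay / apolloCallsPerRun); // 2

console.log({ apolloCallsPerRun, maxOpenAiCallsPerRun, impliedCreditsPerRun, apolloRunsPerDay });
```

On the weekly schedule the workflow stays well inside these limits; the Apollo daily limit becomes the constraint only if the workflow is triggered more than twice in one day.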
To customize the search, change the `url` parameter in the "Scrape.do Indeed API" node:
- q=data+engineer (search term)
- l=Remote (location)
- fromage=7 (last 7 days)
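The query parameters listed above can be assembled programmatically, which helps when search terms contain spaces or special characters. In this sketch, `buildIndeedSearchUrl` is a hypothetical helper and the `https://www.indeed.com/jobs` base path is an assumption about the search page being targeted.

```javascript
// Hypothetical helper: builds the Indeed search URL passed as the `url` parameter.
function buildIndeedSearchUrl({ query, location, maxAgeDays }) {
  const params = new URLSearchParams({
    q: query,                    // search term
    l: location,                 // location filter
    fromage: String(maxAgeDays), // only postings from the last N days
  });
  return `https://www.indeed.com/jobs?${params.toString()}`;
}

const searchUrl = buildIndeedSearchUrl({ query: "data engineer", location: "Remote", maxAgeDays: 7 });
console.log(searchUrl);
// → https://www.indeed.com/jobs?q=data+engineer&l=Remote&fromage=7
```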
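Apollo authentication header name: `x-api-key`
Columns for the "Add New Company" sheet: `companyName | jobTitle | jobUrl | location | salary | source | postedDate`
Columns for the "Leads" sheet: `First Name | Last Name | Title | Company | LinkedIn URL | Country | Industry | Date Added | Source | Personalized Message`

| Issue | Cause | Solution |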
|---|---|---|
| "Invalid character: [" | Empty/malformed company name | Check Parse Indeed Jobs output |
| "Node does not have credentials" | Credential not linked | Open node → Select credential |
| Empty Parse Results | Indeed HTML structure changed | Check Scrape.do raw output |
| Apollo Rate Limit (429) | Too many requests | Add 5-10s Wait node between calls |
| OpenAI Timeout | Too many tokens | Reduce batch size or max_tokens |
| "Your request is invalid" | Malformed JSON body | Verify expression syntax in HTTP nodes |
For production use, consider adding:
- IF node after Apollo Org Search to handle empty results
- Error Workflow trigger for notifications
- Wait nodes between API calls for rate limiting
- Retry logic for transient failures
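The wait and retry suggestions above can be combined into a retry wrapper with exponential backoff. This is a generic sketch, not n8n's built-in retry setting: `withRetry`, `backoffDelayMs`, and the convention that errors carry a numeric `status` property are all assumptions for illustration. The 5-second default mirrors the low end of the Wait node suggestion in the troubleshooting table.

```javascript
// Delay doubles with each attempt: base, 2x base, 4x base, ...
function backoffDelayMs(attempt, baseDelayMs) {
  return baseDelayMs * 2 ** attempt;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retries requestFn on rate limits (429) and server errors (5xx) only.
async function withRetry(requestFn, { retries = 3, baseDelayMs = 5000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await requestFn(attempt);
    } catch (error) {
      lastError = error;
      const transient = error.status === 429 || error.status >= 500;
      if (!transient || attempt === retries) break; // give up on permanent errors
      await sleep(backoffDelayMs(attempt, baseDelayMs));
    }
  }
  throw lastError;
}

// Demo: a fake request that is rate limited twice, then succeeds.
let calls = 0;
const flakyRequest = async () => {
  calls += 1;
  if (calls < 3) {
    const error = new Error("Too Many Requests");
    error.status = 429;
    throw error;
  }
  return "ok";
};

// A 1 ms base delay keeps the demo fast; use seconds against real APIs.
const demo = withRetry(flakyRequest, { retries: 3, baseDelayMs: 1 });
demo.then((result) => console.log(result, "after", calls, "calls"));
```

Errors without a transient status, such as a 400 from a malformed JSON body, are rethrown immediately, since retrying them would only consume quota.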
| Metric | Value |
|---|---|
| Execution Time | 2-5 minutes per scheduled run |
| Jobs Discovered | ~25 per Indeed page |
| Leads Generated | 1-3 per company (based on title matches) |
| Message Quality | Professional, contextual, <300 chars |
| Data Freshness | Fetched live from Indeed and Apollo at each run |
| Storage Format | Google Sheets (subject to Google's limit of 10 million cells per spreadsheet) |
| Endpoint | Method | Purpose |
|---|---|---|
| https://api.scrape.do | GET | Direct URL scraping |
Documentation: scrape.do/documentation
| Endpoint | Method | Purpose |
|---|---|---|
| /v1/organizations/search | POST | Company lookup |
| /v1/mixed_people/search | POST | People search |
Documentation: apolloio.github.io/apollo-api-docs
| Endpoint | Method | Purpose |
|---|---|---|
| /v1/chat/completions | POST | Message generation |
Documentation: platform.openai.com