This automated n8n workflow scrapes job listings from Upwork using Apify, processes and cleans the data, and generates daily email reports with job summaries. The system uses Google Sheets for data storage and keyword management, providing a comprehensive solution for tracking relevant job opportunities and market trends.
What is Apify?
Apify is a web scraping and automation platform that provides reliable APIs for extracting data from websites like Upwork. It handles the complexities of web scraping including rate limiting, proxy management, and data extraction while maintaining compliance with website terms of service.
Good to Know
- Apify API calls may incur costs based on usage; check Apify pricing for details
- Google Sheets access must be properly authorized to avoid data sync issues
- The workflow includes data cleaning and deduplication to ensure high-quality results
- Email reports provide structured summaries for easy review and decision-making
- Keyword management through Google Sheets allows for flexible job targeting
How It Works
The workflow is organized into three main phases:
Phase 1: Job Scraping & Initial Processing
This phase handles the core data collection and initial storage:
- Trigger Manual Run - Manually starts the workflow for on-demand job scraping
- Fetch Keywords from Google Sheet - Reads the list of job-related keywords from the All Keywords sheet
- Loop Through Keywords - Iterates over each keyword to trigger Apify scraping
- Trigger Apify Scraper - Sends HTTP request to start Apify actor for job scraping
- Wait for Apify Completion - Waits for the Apify actor to finish execution
- Delay Before Dataset Read - Waits a few seconds to ensure dataset is ready for processing
- Fetch Scraped Job Dataset - Fetches the latest dataset from Apify
- Process Raw Job Data - Filters jobs posted in the last 24 hours and formats the data
- Save Jobs to Daily Sheet - Appends new job data to the daily Google Sheet
- Update Keyword Job Count - Updates job count in the All Keywords summary sheet
Phase 2: Data Cleaning & Deduplication
This phase ensures data quality and removes duplicates:
- Load Today's Daily Jobs - Loads all jobs added in today's sheet for processing
- Remove Duplicates by Title/Desc - Removes duplicates based on title and description matching
- Save Clean Job Data - Saves the cleaned, unique entries back to the sheet
- Clear Old Daily Sheet Data - Deletes old or duplicate entries from the sheet
- Reload Clean Job Data - Loads clean data again after deletion for final processing
Phase 3: Daily Summary & Email Report
This phase generates summaries and delivers the final report:
- Generate Keyword Summary Stats - Counts job totals per keyword for analysis
- Update Summary Sheet - Updates the summary sheet with keyword statistics
- Fetch Final Summary Data - Reads the summary sheet for reporting purposes
- Build Email Body - Formats email with statistics and sheet link
- Send Daily Report Email - Sends the structured daily summary email to recipients
Data Sources
The workflow utilizes Google Sheets for data management:
-
AI Keywords Sheet - Contains keyword management data with columns:
- Keyword (text) - Job search terms
- Job Count (number) - Number of jobs found for each keyword
- Status (text) - Active/Inactive status
- Last Updated (timestamp) - When keyword was last processed
-
Daily Jobs Sheet - Contains scraped job data with columns:
- Job Title (text) - Title of the job posting
- Description (text) - Job description content
- Budget (text) - Job budget or hourly rate
- Client Rating (number) - Client's rating on Upwork
- Posted Date (timestamp) - When job was posted
- Job URL (text) - Direct link to the job posting
- Keyword (text) - Which keyword found this job
- Scraped At (timestamp) - When data was collected
-
Summary Sheet - Contains daily statistics with columns:
- Date (date) - Report date
- Total Jobs (number) - Total jobs found
- Keywords Processed (number) - Number of keywords searched
- Top Keyword (text) - Most productive keyword
- Average Budget (currency) - Average job budget
- Report Generated (timestamp) - When summary was created
How to Use
- Import the workflow into n8n
- Configure Apify API credentials and Google Sheets API access
- Set up email credentials for daily report delivery
- Create three Google Sheets with the specified column structures
- Add relevant job keywords to the AI Keywords sheet
- Test with sample keywords and adjust as needed
Requirements
- Apify API credentials and actor access
- Google Sheets API access
- Email service credentials (Gmail, SMTP, etc.)
- Upwork job search keywords for targeting
Customizing This Workflow
Modify the Process Raw Job Data node to filter jobs by additional criteria like budget range, client rating, or job type. Adjust the email report format to include more detailed statistics or add visual aids, such as charts. Customize the data cleaning logic to better handle duplicate detection based on your specific requirements, or add additional data sources beyond Upwork for comprehensive job market analysis.