🎯 Problem Statement
Traditional market research for Customer Intelligence (CI) is manual and slow, often relying on surface-level social media scraping or expensive external reports. Service companies, such as HVAC providers, struggle to efficiently synthesize large volumes of online feedback (Reddit discussions, real-time tweets, web articles) into an accurate diagnosis of systemic service gaps (e.g., scheduling friction, poor automated phone systems). The result is delayed strategic responses and missed opportunities to invest in high-impact solutions like AI voice agents.
✨ Solution
This workflow deploys a Multisource Intelligence Pipeline that runs on a schedule or ad hoc. It ingests data from three distinct source types in parallel (SERP API, Reddit, and X/Twitter), applies a zero-cost Hybrid Categorization method to semantically identify operational bottlenecks, and uses an Anthropic LLM to synthesize the findings into a clear, executive-ready strategic brief. The aggregated data is logged for historical analysis while the brief is dispatched for immediate action.
⚙️ How It Works (Multi-Step Execution)
1. Ingestion and Parallel Processing (The Data Fabric)
- Trigger: The workflow is initiated either ad hoc via an n8n Form Trigger or on a schedule (Time Trigger).
- Parallel Ingestion: The workflow immediately splits into three parallel branches that fetch data simultaneously:
- SERP API: Captures authoritative content and industry commentary (Strategic Context).
- Reddit (Looping Structure): Fetches posts from multiple subreddits via an Aggregate Node workaround to capture authentic user experiences (Qualitative Signal); see the subreddit-expansion sketch after this list.
- X/Twitter (HTTP Request): Uses a direct HTTP Request to sidestep the standard node's constraints and capture real-time social complaints (Sentiment Signal); see the request sketch after this list.
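To make the looping structure concrete, here is a minimal sketch of a Function Node that expands a comma-separated subreddit list into one item per subreddit for the Reddit branch to iterate over. The `targetSubreddits` field name and the fallback list are assumptions, not taken from the workflow:

```javascript
// n8n Function node (sketch): expand the form's comma-separated subreddit
// list into one item per subreddit so the Reddit branch can loop over them.
// The targetSubreddits field name and fallback values are assumptions.
const raw = items[0].json.targetSubreddits || 'HVAC, homeowners';

return raw
  .split(',')
  .map((s) => s.trim())
  .filter(Boolean)
  .map((subreddit) => ({ json: { subreddit } }));
```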
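And a standalone Node.js sketch of the call the X/Twitter HTTP Request node issues. The endpoint and parameter names follow the public X API v2 recent-search spec; the query string and `max_results` value are illustrative assumptions:

```javascript
// Standalone sketch of the X API v2 recent-search request performed by the
// HTTP Request node. The query shape and max_results are assumptions.
const BEARER_TOKEN = process.env.X_BEARER_TOKEN; // stored as an n8n credential in the workflow

async function fetchRecentTweets(keyword) {
  const params = new URLSearchParams({
    query: `${keyword} -is:retweet lang:en`,
    max_results: '50',
    'tweet.fields': 'created_at,public_metrics',
  });
  const res = await fetch(
    `https://api.twitter.com/2/tweets/search/recent?${params}`,
    { headers: { Authorization: `Bearer ${BEARER_TOKEN}` } }
  );
  if (!res.ok) throw new Error(`X API error: ${res.status}`);
  return (await res.json()).data ?? []; // array of tweet objects
}
```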
2. Analysis and Fusion (The Intelligence Layer)
- Cleanup and Labeling (Function Nodes): Each branch uses a dedicated Function Node to filter noise (e.g., low-score posts) and normalize the data by adding a source tag (e.g., 'Reddit'); a cleanup sketch follows this step.
- Merge: A Merge Node (Append Mode) fuses all three parallel streams into a single, unified dataset.
- Hybrid Categorization (Function Node): A single Function Node applies the Hybrid Categorization Logic. This cost-free step semantically assigns a pain_point category (e.g., 'Call Hold/Availability') and a sentiment_score to every item, transforming raw text into labeled metrics; see the categorization sketch after this step.
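A minimal sketch of one branch's cleanup node (the Reddit branch here), in classic n8n Function-node style. The score threshold and input field names are assumptions:

```javascript
// n8n Function node (Reddit branch, sketch): drop low-signal posts and
// normalize the shape. The threshold and field names are assumptions.
const MIN_SCORE = 5;

return items
  .filter((item) => (item.json.score ?? 0) >= MIN_SCORE) // filter noise
  .map((item) => ({
    json: {
      source: 'Reddit', // normalized source tag
      text: item.json.title ?? item.json.selftext ?? '',
      score: item.json.score,
      url: item.json.url,
    },
  }));
```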
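And a sketch of the Hybrid Categorization logic itself: keyword matching assigns the pain_point, and a small lexicon drives the sentiment_score. Only the pain_point and sentiment_score field names (and the 'Call Hold/Availability' category) come from the workflow description; the keyword lists and scoring scheme are assumptions:

```javascript
// n8n Function node (sketch): zero-cost hybrid categorization.
// Keyword lists and the scoring scheme are illustrative assumptions.
const CATEGORIES = {
  'Call Hold/Availability': ['on hold', 'no answer', 'voicemail', 'never picked up'],
  'Scheduling Friction': ['schedule', 'appointment', 'reschedule', 'no-show'],
  'Automated System': ['ivr', 'automated', 'robot', 'press 1'],
};
const NEGATIVE = ['terrible', 'worst', 'frustrated', 'rude', 'never again'];
const POSITIVE = ['great', 'helpful', 'fast', 'recommend'];

return items.map((item) => {
  const text = (item.json.text || '').toLowerCase();

  // First matching category wins; everything else falls through to 'Other'.
  let painPoint = 'Other';
  for (const [category, keywords] of Object.entries(CATEGORIES)) {
    if (keywords.some((kw) => text.includes(kw))) {
      painPoint = category;
      break;
    }
  }

  // Crude lexicon score: positives add, negatives subtract.
  const sentimentScore =
    POSITIVE.filter((w) => text.includes(w)).length -
    NEGATIVE.filter((w) => text.includes(w)).length;

  return { json: { ...item.json, pain_point: painPoint, sentiment_score: sentimentScore } };
});
```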
3. Dispatch and Reporting (The Executive Output)
- Aggregation and Split (Function Node): The final Function Node calculates the total counts, deduplicates the results, and generates the comprehensive summaryString (sketched after this step).
- Data Logging: The aggregated counts and metrics are appended to Google Sheets for historical logging.
- LLM Input Retrieval (Function Node): A final Function Node retrieves the summary data using the $items() helper (the serial-route workaround); a retrieval sketch appears under Configuration below.
- AI Briefing: The Message a model (Anthropic) Node receives the summaryString and uses a strict HTML System Prompt to synthesize the strategic brief, identifying the top pain points and suggesting AI features (an example prompt shape follows this step).
- Delivery: The Gmail Node sends the final, professional HTML brief to the executive team.
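A sketch of the aggregation step: deduplicating by URL, counting items per pain_point, and building the summaryString. Beyond the pain_point, sentiment_score, summaryString, and LLM_SUMMARY_HOLDER names mentioned in this description, the dedup key and summary layout are assumptions:

```javascript
// n8n Function node (sketch): deduplicate, count per category, and
// assemble the summaryString. Dedup key and layout are assumptions.
const seen = new Set();
const deduped = items.filter((item) => {
  const key = item.json.url || item.json.text;
  if (seen.has(key)) return false;
  seen.add(key);
  return true;
});

const counts = {};
let sentimentTotal = 0;
for (const item of deduped) {
  counts[item.json.pain_point] = (counts[item.json.pain_point] || 0) + 1;
  sentimentTotal += item.json.sentiment_score || 0;
}

const avgSentiment = deduped.length
  ? (sentimentTotal / deduped.length).toFixed(2)
  : '0.00';

const summaryString = [
  `Total items analyzed: ${deduped.length}`,
  `Average sentiment: ${avgSentiment}`,
  'Pain point counts:',
  ...Object.entries(counts).map(([cat, n]) => `- ${cat}: ${n}`),
].join('\n');

// Stash the summary so the downstream serial route can retrieve it.
return [{ json: { LLM_SUMMARY_HOLDER: summaryString, counts, total: deduped.length } }];
```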
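The exact System Prompt is not reproduced here; a hedged example of the kind of strict HTML instruction the Message a model node might carry:

```text
You are a strategy analyst for a home-services executive team.
Respond ONLY with valid HTML (no markdown, no code fences).
Required structure: an <h2> title, an <h3>Top Pain Points</h3> list with
counts, an <h3>Recommended AI Voice Agent Features</h3> list, and a
closing <p> executive summary. Use only the figures provided in the
summary; do not invent data.
```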
🛠️ Setup Steps
Credentials
- Anthropic: Configure credentials for the Language Model (Claude) used in the Message a model node.
- SERP API, Reddit, and X/Twitter: Configure API keys/credentials for the data ingestion nodes.
- Google Services: Set up OAuth2 credentials for Google Sheets (for logging data) and Gmail (for email dispatch).
Configuration
- Form Configuration: If using the Form Trigger, ensure the Target Keywords and Target Subreddits fields are mapped correctly to the ingestion nodes.
- Data Integrity: Due to the serial route, ensure the Function (Get LLM Summary) node correctly retrieves the LLM_SUMMARY_HOLDER field from the preceding node's output (see the sketch below).
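A minimal sketch of that retrieval node using n8n's $items() helper; the upstream node name is an assumption:

```javascript
// n8n Function node ("Get LLM Summary", sketch): pull the summary back in
// after the serial Google Sheets hop. The upstream node name is assumed.
const upstream = $items('Aggregation and Split'); // items from the named node
const summary = upstream[0]?.json?.LLM_SUMMARY_HOLDER;

if (!summary) {
  throw new Error('LLM_SUMMARY_HOLDER missing from upstream output');
}

return [{ json: { summaryString: summary } }];
```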
✅ Benefits
- Proactive CI & Strategy: Shifts market research from manual, reactive browsing to a proactive, scheduled data diagnostic.
- Cost Efficiency: Utilizes a zero-cost Hybrid Categorization method (Function Node) for intent analysis, avoiding expensive per-item LLM token costs.
- Actionable Output: Delivers a fully synthesized, HTML-formatted executive brief, ready for immediate presentation and strategic sales positioning.
- High Reliability: Employs parallel ingestion, API workarounds, and serial routing to keep the complex workflow running consistently.