Triage GitHub issues with OpenAI categorization and embedding-based duplicate detection

Last update 3 months ago

Who's it for

Open-source maintainers, product teams with public repositories, and any organization that receives a steady stream of GitHub Issues. Ideal for small teams that lose hours each week to triaging duplicates and misrouted reports.

How it works

When a new Issue is opened, a GitHub webhook triggers this workflow. The workflow first filters for the "opened" action, then fetches the 30 most recent Issues from the repository. The texts of all Issues (new and past) are sent to OpenAI's embeddings API in a single batch call, and the workflow computes the cosine similarity between the new Issue and each past Issue. If the highest similarity exceeds 0.85, the new Issue is closed automatically with a comment referencing the original.

Otherwise, an OpenAI model classifies the Issue into one of four categories: bug (adds a label and sends a Slack alert to the dev team), question (posts the FAQ link as a comment), feature (appends a row to the roadmap Google Sheet), or spam (adds a label and closes the Issue). The language model is used only for classification; duplicate detection is deterministic vector math on the embeddings, and every follow-up action is rule-based.
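The decision logic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the workflow's actual node code: the function names, the action labels, and the `classify` callable are hypothetical, and only the 0.85 threshold and the four categories come from the description.

```python
import math

DUPLICATE_THRESHOLD = 0.85  # value stated in the description

# Rule-based actions per category; the action labels are hypothetical names.
ACTIONS = {
    "bug": ["add_label", "slack_alert"],
    "question": ["comment_faq_link"],
    "feature": ["append_to_roadmap_sheet"],
    "spam": ["add_label", "close_issue"],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def find_duplicate(new_vec, past, threshold=DUPLICATE_THRESHOLD):
    """past is a list of (issue_number, vector) pairs.

    Returns (issue_number, score) for the best match if its score
    exceeds the threshold, otherwise None.
    """
    best = None
    for number, vec in past:
        score = cosine_similarity(new_vec, vec)
        if best is None or score > best[1]:
            best = (number, score)
    if best is not None and best[1] > threshold:
        return best
    return None


def plan_actions(new_vec, past, classify):
    """classify is a zero-argument callable returning a category name.

    It is only called when no duplicate is found, mirroring the
    "otherwise, classify" branch in the description.
    """
    duplicate = find_duplicate(new_vec, past)
    if duplicate is not None:
        return [f"comment_duplicate_of:{duplicate[0]}", "close_issue"]
    return ACTIONS[classify()]
```

With toy vectors, `plan_actions([1, 0, 0], [(101, [0.9, 0.1, 0])], classify)` takes the duplicate branch without calling `classify`, because the similarity to Issue 101 is well above 0.85.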

Set up steps

Generate a GitHub Personal Access Token with repo scope
Create a webhook on your repository pointing to this workflow's URL, subscribing to Issues events
Create a Google Sheet named feature_roadmap with columns: date_added, issue_number, title, author, url, status
Open Set Configuration and fill in the repo owner, repo name, Sheet ID, Slack channel, and FAQ URL
Register GitHub and OpenAI Header Auth credentials and connect Google Sheets and Slack
Activate the workflow
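The webhook in the second step can be created in the repository settings UI. If you prefer to script it, the following is a minimal sketch that builds (but does not send) the request for GitHub's REST endpoint for repository webhooks. The function names and the example URL are assumptions for illustration; substitute your own workflow URL and token. Because an Issues subscription also delivers edits, closes, and other actions, the sketch includes the kind of "opened" check the workflow applies.

```python
import json
import urllib.request


def build_webhook_request(owner, repo, token, workflow_url):
    """Build a POST request that registers a webhook for Issues events.

    workflow_url is the URL shown by the workflow's webhook trigger
    (a placeholder here). The request is returned unsent.
    """
    payload = {
        "name": "web",
        "active": True,
        "events": ["issues"],
        "config": {"url": workflow_url, "content_type": "json"},
    }
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/hooks",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


def is_opened_issue(event_name, body):
    """True only for a newly opened Issue.

    event_name is the value of the X-GitHub-Event header;
    body is the parsed JSON payload of the delivery.
    """
    return event_name == "issues" and body.get("action") == "opened"
```

To send the request, pass it to `urllib.request.urlopen(...)`; keep the token out of source control.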

How to customize

Adjust duplicate_threshold (0.85 as described above): raise it for stricter matching, lower it for looser matching. You can also change the embeddings model, or swap Google Sheets for Notion or Airtable.
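To show where the embeddings model is chosen, here is a hedged sketch of the single batch request to OpenAI's embeddings endpoint. The model name `text-embedding-3-small` is an assumption (the description does not say which model the workflow uses), as are the function names and the convention that the new Issue's text is the first item in the batch.

```python
import json
import urllib.request


def build_embeddings_request(texts, api_key, model="text-embedding-3-small"):
    """Build one batched embeddings request for all Issue texts.

    The default model is an assumption; change it here to try another.
    The request is returned unsent.
    """
    body = {"model": model, "input": texts}
    return urllib.request.Request(
        url="https://api.openai.com/v1/embeddings",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def split_embeddings(response_json):
    """Return (new_issue_vector, past_issue_vectors).

    Assumes the new Issue was the first text in the batch. Rows are
    ordered by their "index" field so they line up with the inputs.
    """
    rows = sorted(response_json["data"], key=lambda row: row["index"])
    vectors = [row["embedding"] for row in rows]
    return vectors[0], vectors[1:]
```

If you switch models, revisit duplicate_threshold as well: similarity scores from different embedding models are not directly comparable, so a value tuned for one model may be too strict or too loose for another.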