API Schema Extractor

Last update 6 months ago

This workflow finds the API documentation for a list of services, extracts the API operations it describes, and generates a custom schema for each service. It runs in three stages (research, extraction, and schema generation), and each stage tracks its progress in a Google Sheet.

🙏 Jim Le deserves major kudos for helping to build this sophisticated three-stage workflow that cleverly automates API documentation processing using a smart combination of web scraping, vector search, and LLM technologies.

How it works

Stage 1 - Research:

  • Fetches pending services from a Google Sheet
  • Uses Google search to find API documentation
  • Scrapes the search results with Apify and keeps only the relevant pages
  • Stores webpage contents and metadata in Qdrant (vector database)
  • Updates the progress status in the Google Sheet (pending, ok, or error)
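
The status tracking used by each stage can be sketched roughly as follows. This is a minimal illustration in plain Python, not the workflow's actual nodes: the rows stand in for Google Sheet rows, and the column names ("Service", "Research", "Error") and the handler function are hypothetical.

```python
from typing import Callable


def run_stage(rows: list[dict], status_column: str,
              handler: Callable[[dict], None]) -> list[dict]:
    """Run `handler` on every pending row and record the outcome.

    Assumption: a row is processed only when its status is "pending".
    The "Error" column is a hypothetical place to keep the failure reason.
    """
    for row in rows:
        if row.get(status_column) != "pending":
            continue  # already processed in an earlier run
        try:
            handler(row)
            row[status_column] = "ok"
        except Exception as exc:
            # A failure marks only this service; the remaining rows still run.
            row[status_column] = "error"
            row["Error"] = str(exc)
    return rows
```

Catching the failure per row means one service with missing documentation does not stop the rest of the batch, and the sheet shows which services need another look.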

Stage 2 - Extraction:

  • Processes services that completed research successfully
  • Queries the vector store to identify the service's products and offerings
  • Queries the vector store again for the relevant API documentation
  • Uses Gemini (LLM) to extract API operations
  • Records the extracted operations in the Google Sheet
  • Updates progress status (pending, ok, or error)
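
Each later stage only picks up services whose earlier stages finished successfully. A rough sketch of that gating, assuming one hypothetical status column per stage ("research", "extraction", "generation") in each row:

```python
STAGES = ["research", "extraction", "generation"]  # hypothetical column names


def eligible_services(rows: list[dict], stage: str) -> list[dict]:
    """Return the rows that are ready to run `stage`.

    A row is ready when the stage itself is still "pending" and every
    earlier stage has the status "ok".
    """
    earlier = STAGES[:STAGES.index(stage)]
    ready = []
    for row in rows:
        if row.get(stage) != "pending":
            continue  # this stage already ran for the service
        if all(row.get(prev) == "ok" for prev in earlier):
            ready.append(row)
    return ready
```

With this rule, a service whose research ended in "error" is never passed to extraction, and it stays visible in the sheet until its status is reset.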

Stage 3 - Generation:

  • Takes services with successful extraction
  • Retrieves all API operations recorded for each service
  • Combines and groups operations into a custom schema
  • Uploads final schema to Google Drive
  • Updates the final status in the sheet and records the file location
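
The grouping step could look something like the sketch below. The field names (resource, operation, method, url, description) and the shape of the output are assumptions made for illustration; the workflow's real schema format may differ.

```python
import json
from collections import defaultdict


def build_schema(service: str, operations: list[dict]) -> dict:
    """Group a flat list of extracted operations by resource."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for op in operations:
        grouped[op["resource"]].append({
            "operation": op["operation"],
            "method": op["method"].upper(),  # normalise "get" -> "GET"
            "url": op["url"],
            "description": op.get("description", ""),
        })
    return {
        "service": service,
        # Sort by resource name so repeated runs give a stable file.
        "resources": [
            {"resource": name, "operations": ops}
            for name, ops in sorted(grouped.items())
        ],
    }


def schema_to_json(schema: dict) -> str:
    """Serialise the schema as the text that would be uploaded as a file."""
    return json.dumps(schema, indent=2)
```

The resulting JSON string is what a final upload step would write to Google Drive, one file per service.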

Ideal for:

  • Development teams needing to catalog multiple APIs
  • API documentation initiatives
  • Creating standardized API schema collections
  • Automating API discovery and documentation

Accounts required:

  • Google account (for Sheets and Drive access)
  • Apify account (for web scraping)
  • Qdrant database
  • Gemini API access

Set up instructions:

  • Prepare a Google Sheets document that lists the services to process. You can copy the example Google Sheet and change or remove the values under its columns. Then update the Google Sheets nodes with the correct Google Sheet ID.
  • Configure credentials for Google Sheets (OAuth2), the required third-party services (Apify and Qdrant), and Gemini.
  • Make sure the Google credentials have permission to access Google Drive, since the final schema is uploaded there.