Parse PDF, DOCX & Images with Mistral OCR via Google Drive with Slack Alerts

Created by: Yves Tkaczyk (ytkaczyk)

Last update 21 hours ago


Use cases

  • Monitor a Google Drive folder and parse PDF, DOCX and image files into a destination folder, ready for further processing (e.g. RAG ingestion, translation, etc.).
  • Keep a processing log in Google Sheets and send Slack notifications.

How it works

  • Trigger: Watch Google Drive folder for new and updated files.
  • Create a uniquely named destination folder and copy the input file into it.
  • Parse the file using Mistral Document, extracting content and handling non-OCRable images separately.
  • Save the data returned by Mistral Document into the destination Google Drive folder (raw JSON file, Markdown files, and images) for further processing.
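The folder-naming and save steps above can be sketched outside n8n as two small Python functions. This is an illustrative sketch, not the workflow's actual logic: the function names, the timestamp-based folder naming and the output file names are hypothetical, and the response shape (a `pages` list whose entries carry `index`, `markdown` and `images`, each image having an `id` and an optional `image_base64` payload) is an assumption about what the OCR step returns.

```python
import base64
import json
from datetime import datetime, timezone
from pathlib import PurePosixPath


def unique_folder_name(file_name: str, now: datetime) -> str:
    """Build a folder name from the input file name and a UTC timestamp.

    Hypothetical naming scheme: distinct only to the second, so a real
    pipeline may also want to append the Drive file ID.
    """
    stem = PurePosixPath(file_name).stem
    return f"{stem}_{now.astimezone(timezone.utc):%Y%m%dT%H%M%SZ}"


def split_ocr_response(response: dict) -> dict[str, bytes]:
    """Map output file names to contents: raw JSON, one Markdown file per page, images.

    Assumes the response shape described in the lead-in paragraph.
    """
    files = {"response.json": json.dumps(response, indent=2).encode("utf-8")}
    for page in response.get("pages", []):
        index = page["index"]
        files[f"page_{index:03d}.md"] = page.get("markdown", "").encode("utf-8")
        for image in page.get("images", []):
            data = image.get("image_base64")
            if not data:
                continue  # image payload was not included in the response
            # Drop a "data:image/...;base64," prefix if one is present.
            encoded = data.split(",", 1)[1] if data.startswith("data:") else data
            files[image["id"]] = base64.b64decode(encoded)
    return files
```

Each entry of the returned mapping corresponds to one file to upload into the destination folder; keeping the function free of network calls makes the file layout easy to check before wiring it to Google Drive.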

How to use

  • Google Drive and Google Sheets nodes:
    • Create Google credentials with access to Google Drive and Google Sheets. Read more about Google Credentials.
    • Update all Google Drive and Google Sheets nodes (14 nodes total) to use these credentials.
  • Mistral node:
  • Slack nodes:
    • Create Slack OAuth2 credentials. Read more about Slack OAuth2 credentials.
    • Update the two Slack nodes: Send Success Message and Send Error Message:
      • Set the credentials
      • Select the channel where you want to send the notifications (channels can be different for success and errors).
  • Create a Google Sheets spreadsheet following the steps in Google Sheets Configuration. Ensure the spreadsheet can be accessed as Editor by the account used by the Google Credentials above.
  • Create a directory for input files and a directory for output folders/files. Ensure the directories can be accessed by the account used by the Google Credentials.
  • Update the File Created, File Updated and Workflow Configuration nodes following the steps in the green Notes.

Requirements

  • Google account with Google API access.
  • Mistral Cloud account with access to a Mistral API key.
  • Slack account with access to a Slack client ID and client secret.
  • Basic n8n knowledge: understanding of triggers, expressions, and credential management.

Who’s it for

Anyone building a data pipeline that ingests files to be OCRed for further processing.

🔒 Security

All credentials are stored as n8n credentials. The only information stored in the workflow itself that could be considered sensitive is the set of Google Drive directory IDs and the Sheet ID. These directories and the spreadsheet should be secured according to your needs.

Need Help?

Reach out on LinkedIn or ask in the forum!