
Google Drive and Google Gemini Chat Model integration

Save yourself the work of writing custom integrations for Google Drive and Google Gemini Chat Model and use n8n instead. Build adaptable and scalable Data & Storage, AI, and LangChain workflows that work with your technology stack. All within a building experience you will love.

How to connect Google Drive and Google Gemini Chat Model

  • Step 1: Create a new workflow
  • Step 2: Add and configure nodes
  • Step 3: Connect
  • Step 4: Customize and extend your integration
  • Step 5: Test and activate your workflow

Step 1: Create a new workflow and add the first step

In n8n, click the "Add workflow" button in the Workflows tab to create a new workflow. Add the starting point, a trigger that determines when your workflow should run: an app event, a schedule, a webhook call, another workflow, an AI chat, or a manual trigger. Sometimes, the HTTP Request node might already serve as your starting point.


Step 2: Add and configure Google Drive and Google Gemini Chat Model nodes

You can find Google Drive and Google Gemini Chat Model in the nodes panel. Drag them onto your workflow canvas, selecting their actions. Click each node, choose a credential, and authenticate to grant n8n access. Configure Google Drive and Google Gemini Chat Model nodes one by one: input data on the left, parameters in the middle, and output data on the right.


Step 3: Connect Google Drive and Google Gemini Chat Model

A connection establishes a link between Google Drive and Google Gemini Chat Model (or vice versa) to route data through the workflow. Data flows from the output of one node to the input of another. You can have single or multiple connections for each node.
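For illustration, here is a trimmed sketch of how an exported n8n workflow represents these connections in JSON. It is not a complete export: real node entries carry more fields (position, parameters, typeVersion), and the exact node type strings can vary between n8n versions. Note that a chat model node like Google Gemini attaches to an LLM chain through a special ai_languageModel connection rather than the main data flow:

    {
      "nodes": [
        { "name": "Google Drive", "type": "n8n-nodes-base.googleDrive" },
        { "name": "Basic LLM Chain", "type": "@n8n/n8n-nodes-langchain.chainLlm" },
        { "name": "Google Gemini Chat Model", "type": "@n8n/n8n-nodes-langchain.lmChatGoogleGemini" }
      ],
      "connections": {
        "Google Drive": {
          "main": [[{ "node": "Basic LLM Chain", "type": "main", "index": 0 }]]
        },
        "Google Gemini Chat Model": {
          "ai_languageModel": [[{ "node": "Basic LLM Chain", "type": "ai_languageModel", "index": 0 }]]
        }
      }
    }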


Step 4: Customize and extend your Google Drive and Google Gemini Chat Model integration

Use n8n's core nodes such as If, Split Out, Merge, and others to transform and manipulate data. Write custom JavaScript or Python in the Code node and run it as a step in your workflow. Connect Google Drive and Google Gemini Chat Model with any of n8n’s 1000+ integrations, and incorporate advanced AI logic into your workflows.
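As a minimal sketch, here is what a Code node step (JavaScript, "Run Once for All Items" mode) could look like when tagging incoming Google Drive items before they reach the model. The output field names here are hypothetical, chosen only for illustration:

    // Runs inside n8n's Code node in "Run Once for All Items" mode.
    // $input.all() returns every incoming item; each carries a .json payload.
    const results = [];
    for (const item of $input.all()) {
      const name = item.json.name ?? ''; // Google Drive items expose the file name as 'name'
      results.push({
        json: {
          fileName: name,
          isImage: /\.(jpe?g|png)$/i.test(name), // hypothetical flag for routing downstream
        },
      });
    }
    return results;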


Step 5: Test and activate your Google Drive and Google Gemini Chat Model workflow

Save and run the workflow to see if everything works as expected. Based on your configuration, data should flow from Google Drive to Google Gemini Chat Model or vice versa. Debugging is easy: check past executions to isolate and fix any issues. Once you've tested everything, save your workflow and activate it.


Automate Image Validation Tasks using AI Vision

This n8n workflow shows how multimodal LLMs with AI vision can tackle tricky image validation tasks that are near impossible to achieve with code and often impractical for humans to do at scale.

You may need image validation when user-submitted photos or images are required to meet certain criteria before being accepted. A wine review website may require that users only submit photos of wine with labels, a bank may require account holders to submit scanned documents for verification, etc.

In this demonstration, our scenario will be to analyse a set of portraits to verify if they meet the criteria for valid passport photos according to the UK government website (https://www.gov.uk/photos-for-passports).

How it works

  • Our set of portraits are jpg files downloaded from our Google Drive using the Google Drive node.
  • Each image is resized using the Edit Image node to ensure a balance between resolution and processing speed.
  • Using the Basic LLM node, we'll define a "user message" option with the type of binary (data). This will allow us to pass our portrait to the LLM as an input.
  • With our prompt containing the criteria pulled from the passport photo requirements webpage, the LLM is able to validate whether or not the photo meets the criteria.
  • A structured output parser is used to structure the LLM's response into a JSON object with an "is_valid" boolean property, which is useful for extending the workflow (see the example output below).
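For example, the parser could be configured to enforce a shape like the following. Only the "is_valid" property comes from the template; the "reason" field is a hypothetical addition you might make when extending the workflow:

    {
      "is_valid": false,
      "reason": "The subject is not looking straight at the camera."
    }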

Requirements
  • Google Gemini API key
  • Google Drive account

Customising this workflow

Not using Gemini? n8n's LLM node works with any compatible multimodal LLM, so feel free to swap Gemini out for OpenAI's GPT-4o or Anthropic's Claude Sonnet.

Don't need to validate portraits? Try other use cases such as document classification, security footage analysis, tagging people in photos, and more.

Nodes used in this workflow: the Google Drive node, the Edit Image node, the Basic LLM node with a structured output parser, and the Google Gemini Chat Model.

Popular Google Drive and Google Gemini Chat Model workflows


API Schema Extractor

This workflow automates the process of discovering and extracting APIs from various services, followed by generating custom schemas. It works in three distinct stages: research, extraction, and schema generation, with each stage tracking progress in a Google Sheet (illustrated below).

🙏 Jim Le deserves major kudos for helping to build this sophisticated three-stage workflow, which cleverly automates API documentation processing using a smart combination of web scraping, vector search, and LLM technologies.

How it works

Stage 1 - Research:
  • Fetches pending services from a Google Sheet
  • Uses Google search to find API documentation
  • Employs Apify for web scraping to filter relevant pages
  • Stores webpage contents and metadata in Qdrant (vector database)
  • Updates progress status in the Google Sheet (pending, ok, or error)

Stage 2 - Extraction:
  • Processes services that completed research successfully
  • Queries the vector store to identify products and offerings
  • Runs further queries for relevant API documentation
  • Uses Gemini (LLM) to extract API operations
  • Records extracted operations in the Google Sheet
  • Updates progress status (pending, ok, or error)

Stage 3 - Generation:
  • Takes services with successful extraction
  • Retrieves all API operations from the database
  • Combines and groups operations into a custom schema
  • Uploads the final schema to Google Drive
  • Updates the final status in the sheet with the file location

Ideal for:
  • Development teams needing to catalog multiple APIs
  • API documentation initiatives
  • Creating standardized API schema collections
  • Automating API discovery and documentation

Accounts required:
  • Google account (for Sheets and Drive access)
  • Apify account (for web scraping)
  • Qdrant database
  • Gemini API access

Set up instructions:
  • Prepare your Google Sheets document with the services information. Here's an example of a Google Sheet: you can copy it and change or remove the values under the columns. Also, make sure to update the Google Sheets nodes with the correct Google Sheet ID.
  • Configure Google Sheets OAuth2 credentials, the required third-party services (Apify, Qdrant), and Gemini.
  • Ensure proper permissions for Google Drive access.
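As a hypothetical illustration of the tracking sheet (the example sheet linked above defines the real columns), each stage writes its own status so the workflow can resume where it left off:

    service       | research | extraction | generation | schema_location
    example.com   | ok       | ok         | pending    |
    another.dev   | error    |            |            |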

Extract text from PDF and image using Vertex AI (Gemini) into CSV

Case Study

I'm too lazy to record every transaction for my expense tracking. Since all my expenses are digital, I just extract the transactions from bank PDF statements and screenshots into CSV to import into my budgeting software. Read more: How I used A.I. to track all my expenses

What this workflow does
  • Upload your PDF or screenshots into Google Drive.
  • The workflow passes the PDF/image to Vertex Gemini for AI image recognition.
  • It then outputs the transactions as CSV (see the illustrative sample below) and stores the file in another Google Drive folder.

Setup
  • Set up two Google Drive folders: one for uploading and one for the output.
  • Input your Google Drive credentials.
  • Input your Vertex Gemini credentials.

How to adjust it to your needs
  • You can upload other types of documents for information extraction.
  • You can extract any text data from any image or PDF.
  • You can adjust the AI prompt to do different things.
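As a purely illustrative example, the resulting CSV might look like this; the actual columns and values depend entirely on your statements and your prompt:

    date,description,amount
    2024-05-01,COFFEE HOUSE,-4.50
    2024-05-03,GROCERY STORE,-62.10
    2024-05-05,SALARY,2500.00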


CV Resume PDF Parsing with Multimodal Vision AI

This n8n workflow demonstrates how we can use multimodal LLMs to parse and extract data from PDF documents in n8n. In this particular scenario, we're passing a candidate's CV/resume to an AI which filters out unqualified applications. However, this sneaky candidate has added a hidden prompt to bypass our bot! Whatever will we do? No fret, using AI vision is one approach to solve this problem... read on!

How it works
  • Our candidate's CV/resume is a PDF downloaded via Google Drive for this demonstration.
  • The PDF is then converted into a PNG image using a tool called Stirling PDF.
  • Since the hidden prompt has a white font color, it is invisible in the converted image.
  • The image is then forwarded to a Basic LLM node for processing by our multimodal model; in this example, we'll use Google's Gemini 1.5 Pro.
  • In the Basic LLM node, we'll need to set a user message with the type of binary. This allows us to directly send the image file in our request.
  • The LLM is now immune to the hidden prompt, and its response is as expected.

The example CV/resume with the hidden prompt can be found here: https://drive.google.com/file/d/1MORAdeev6cMcTJBV2EYALAwll8gCDRav/view?usp=sharing

Requirements
  • Google Gemini API key. Alternatively, GPT-4 will also work for this use case.
  • Stirling PDF or another service which can convert PDFs into images. Note: for data privacy, this example uses a public API, and it is recommended that you self-host and use a private instance of Stirling PDF instead.

Customising the workflow
  • Swap out the manual trigger for another trigger, such as a webhook, to integrate into your existing services.
  • This example demonstrates a validation use case, i.e. "does the candidate look qualified?". You could additionally extract data points instead, such as years of experience, previous companies, etc.

Transcribing Bank Statements To Markdown Using Gemini Vision AI

This n8n workflow demonstrates an approach to parsing bank statement PDFs with multimodal LLMs as an alternative to traditional OCR. This allows for much more accurate data extraction from the document, especially when it comes to tables and complex layouts.

Multimodal parsing is better than traditional OCR because:
  • It reduces complexity and overhead by avoiding the need to preprocess the document into a text format such as markdown before passing it to the LLM.
  • It handles non-standard PDF formats which may produce garbled output via traditional OCR text conversion.
  • It's orders of magnitude cheaper than premium OCR models that still require post-processing cleanup and formatting. LLMs can format to any schema or language you desire!

How it works
  • You can use the example bank statement created specifically for this workflow here: https://drive.google.com/file/d/1wS9U7MQDthj57CvEcqG_Llkr-ek6RqGA/view?usp=sharing
  • A PDF bank statement is imported via Google Drive. For this demo, I've created a mock bank statement which includes complex table layouts of 5 columns. Typically, OCR will be unable to align the columns correctly and mistake some deposits for withdrawals.
  • Because multimodal LLMs do not accept PDFs directly, we'll have to convert the PDF to a series of images. We can achieve this by using a tool such as Stirling PDF. Stirling PDF is self-hostable, which is handy for sensitive data such as bank statements.
  • Stirling PDF will return our PDF as a series of JPGs (one for each page) in a zipped file. We can use n8n's decompress node to extract the images and ensure they are ordered by using the Sort node.
  • Next, we'll resize each page using the Edit Image node to strike the right balance between resolution limits and processing speed.
  • Each resized page image is then passed into the Basic LLM node, which will use our multimodal LLM of choice, Gemini 1.5 Pro. In the LLM node's options, we'll add a "user message" of type binary (data), which is how we add our image data as an input.
  • Our prompt will instruct the multimodal LLM to transcribe each page to markdown (see the illustrative prompt below). Note, you do not need to do this; you can just ask for the data points you want to extract directly! Our goal for this template is to demonstrate the LLM's ability to accurately read the page.
  • Finally, with our markdown version of all pages, we can pass this to another LLM node to extract required data, such as deposit line items.

Requirements
  • Google Gemini API for the multimodal LLM
  • Google Drive access for document storage
  • Stirling PDF instance for PDF-to-image conversion

Customising the workflow
  • At the time of writing, Gemini 1.5 Pro is the most accurate at text document parsing with a relatively low cost. If you are not using Google Gemini, however, you can switch to other multimodal LLMs such as OpenAI GPT or Anthropic Claude.
  • If you don't need the markdown, simply asking for what you want to extract directly in the LLM's prompt is also acceptable and saves a few extra steps.
  • Not parsing any bank statements any time soon? This template also works for invoices, inventory lists, contracts, legal documents, etc.
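The template's exact prompt isn't reproduced on this page, but an instruction along these lines would fit the approach described (the wording is illustrative, not the template's own):

    Transcribe this page of the bank statement to markdown. Reproduce every
    table as a markdown table, keep all columns aligned as they appear on
    the page, and do not omit or merge any rows.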

Visual Regression Testing with Apify and AI Vision Model

This n8n workflow is a proof-of-concept template exploring how we might work with multimodal LLMs and their multi-image analysis capabilities. In this demo, we compare 2 screenshots of a webpage taken at different timestamps and pass both to our multimodal LLM for a visual comparison of differences. Handling multiple binary inputs (i.e. images) in an AI request is supported by n8n's Basic LLM node.

How it works
  • This template is intended to run in 2 parts: first to generate the base screenshots, and next to run the visual regression test, which captures fresh screenshots.
  • Starting with a list of webpages captured in a Google Sheet, base screenshots are captured for each using an external web scraping service called Apify.com (I prefer Apify, but feel free to use whichever web scraping service is available to you).
  • These base screenshots are uploaded to Google Drive and will be referenced later when we run our testing.
  • In phase 2 of the workflow, we'll use a scheduled trigger to fire sometime in the future, which will reuse our web scraping service to generate fresh screenshots of our desired webpages.
  • Next, we re-download our base screenshots in parallel, and with both old and new captures, we'll pass these to our LLM node. In the LLM node's options, we'll define 2 "user message" inputs with the type of binary (data) for our images.
  • Finally, we'll prompt our LLM with our testing criteria and capture the regressions detected. Note, results will vary depending on which LLM you use.
  • A final report can be generated using the LLM's output and is uploaded to Linear.

Requirements
  • Apify.com API key for the web screenshotting service
  • Google Drive and Sheets access to store the list of webpages and the captures

Customising this workflow
  • Have your own preferred web screenshotting service? Feel free to swap out Apify for your service of choice.
  • If the web screenshot is too large, it may prove difficult for the LLM to spot differences with precision. Try splitting up captures into smaller images instead.

Build your own Google Drive and Google Gemini Chat Model integration

Create custom Google Drive and Google Gemini Chat Model workflows by choosing triggers and actions. Nodes come with global operations and settings, as well as app-specific parameters that can be configured. You can also use the HTTP Request node to query data from any app or service with a REST API.
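For example, the Google Drive REST API call behind a typical HTTP Request node setup looks roughly like this minimal JavaScript sketch. In n8n you would configure the same URL and query parameters in the node and attach a Google OAuth2 credential instead of handling the token yourself; accessToken here is a placeholder:

    // Minimal sketch of a Google Drive v3 files.list request.
    // 'accessToken' must be a valid OAuth2 token with a Drive scope.
    const params = new URLSearchParams({
      q: "mimeType='application/pdf'",  // example filter: only PDF files
      fields: 'files(id,name)',         // trim the response to id and name
    });
    const res = await fetch(`https://www.googleapis.com/drive/v3/files?${params}`, {
      headers: { Authorization: `Bearer ${accessToken}` },
    });
    const { files } = await res.json();
    console.log(files);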

Google Drive supported actions

File
  • Copy: Create a copy of an existing file
  • Create From Text: Create a file from a provided text
  • Delete: Permanently delete a file
  • Download: Download a file
  • Move: Move a file to another folder
  • Share: Add sharing permissions to a file
  • Update: Update a file
  • Upload: Upload an existing file to Google Drive

File/Folder
  • Search: Search or list files and folders

Folder
  • Create: Create a folder
  • Delete: Permanently delete a folder
  • Share: Add sharing permissions to a folder

Shared Drive
  • Create: Create a shared drive
  • Delete: Permanently delete a shared drive
  • Get: Get a shared drive
  • Get Many: Get the list of shared drives
  • Update: Update a shared drive

FAQs

  • Can Google Drive connect with Google Gemini Chat Model?

  • Can I use Google Drive’s API with n8n?

  • Can I use Google Gemini Chat Model’s API with n8n?

  • Is n8n secure for integrating Google Drive and Google Gemini Chat Model?

  • How do I get started with the Google Drive and Google Gemini Chat Model integration in n8n?

Need help setting up your Google Drive and Google Gemini Chat Model integration?

Discover our community's latest recommendations and join the discussions about the Google Drive and Google Gemini Chat Model integration.

Looking to integrate Google Drive and Google Gemini Chat Model in your company?

Over 3000 companies switch to n8n every single week

Why use n8n to integrate Google Drive with Google Gemini Chat Model

Build complex workflows, really fast

Handle branching, merging and iteration easily.
Pause your workflow to wait for external events.

Code when you need it, UI when you don't

Simple debugging

Your data is displayed alongside your settings, making edge cases easy to track down.

Use templates to get started fast

Use 1000+ workflow templates available from our core team and our community.

Reuse your work

Copy and paste, easily import and export workflows.

Implement complex processes faster with n8n
