Google Gemini Chat Model and Information Extractor integration

Save yourself the work of writing custom integrations for Google Gemini Chat Model and Information Extractor and use n8n instead. Build adaptable and scalable AI and LangChain workflows that work with your technology stack, all within a building experience you will love.

How to connect Google Gemini Chat Model and Information Extractor

  • Step 1: Set up n8n

  • Step 2: Create a new workflow to connect Google Gemini Chat Model and Information Extractor

  • Step 3: Add the first step

  • Step 4: Add the Google Gemini Chat Model node

  • Step 5: Authenticate Google Gemini Chat Model

  • Step 6: Add the Information Extractor node

  • Step 7: Authenticate Information Extractor

  • Step 8: Configure Google Gemini Chat Model and Information Extractor nodes

  • Step 9: Connect Google Gemini Chat Model and Information Extractor

  • Step 10: Customize your Google Gemini Chat Model and Information Extractor integration

  • Step 11: Save and activate workflow

  • Step 12: Test the workflow

Transcribing Bank Statements To Markdown Using Gemini Vision AI

This n8n workflow demonstrates an approach to parsing bank statement PDFs with multimodal LLMs as an alternative to traditional OCR. This allows for much more accurate data extraction from the document, especially when it comes to tables and complex layouts.

Multimodal parsing is better than traditional OCR because:

  • It reduces complexity and overhead by avoiding the need to preprocess the document into text format such as markdown before passing to the LLM.
  • It handles non-standard PDF formats which may produce garbled output via traditional OCR text conversion.
  • It's orders of magnitude cheaper than premium OCR models that still require post-processing cleanup and formatting. LLMs can format to any schema or language you desire!

How it works

You can use the example bank statement created specifically for this workflow here: https://drive.google.com/file/d/1wS9U7MQDthj57CvEcqG_Llkr-ek6RqGA/view?usp=sharing

  • A PDF bank statement is imported via Google Drive. For this demo, I've created a mock bank statement which includes complex table layouts of 5 columns. Typically, OCR will be unable to align the columns correctly and will mistake some deposits for withdrawals.
  • Because multimodal LLMs do not accept PDFs directly, we'll have to convert the PDF to a series of images. We can achieve this by using a tool such as Stirling PDF. Stirling PDF is self-hostable, which is handy for sensitive data such as bank statements.
  • Stirling PDF will return our PDF as a series of JPGs (one for each page) in a zipped file. We can use n8n's decompress node to extract the images and ensure they are ordered by using the Sort node.
  • Next, we'll resize each page using the Edit Image node to ensure the right balance between resolution limits and processing speed.
  • Each resized page image is then passed into the Basic LLM node which will use our multimodal LLM of choice - Gemini 1.5 Pro. In the LLM node's options, we'll add a "user message" of type binary (data) which is how we add our image data as an input.
  • Our prompt will instruct the multimodal LLM to transcribe each page to markdown. Note that you do not need to do this - you can just ask for data points to extract directly! Our goal for this template is to demonstrate the LLM's ability to accurately read the page.
  • Finally, with our markdown version of all pages, we can pass this to another LLM node to extract required data such as deposit line items.
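Outside n8n, the page-handling steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the template's nodes are implemented: the helper names (`page_number`, `sort_pages`, `fit_within`, `build_page_request`), the file names and the 1024-pixel limit are all hypothetical, and the request body is assumed to follow the shape of the Gemini `generateContent` REST API (a text part plus an `inline_data` image part). Nothing is sent over the network.

```python
import base64
import re


def page_number(filename: str) -> int:
    """Return the last integer in a file name, or 0 if it has none."""
    numbers = re.findall(r"\d+", filename)
    return int(numbers[-1]) if numbers else 0


def sort_pages(filenames: list[str]) -> list[str]:
    """Order page images numerically so that page 10 comes after page 2."""
    return sorted(filenames, key=page_number)


def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale a page down so its longer side is at most max_side pixels.

    The aspect ratio is preserved and small pages are left untouched.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def build_page_request(image_bytes: bytes, prompt: str) -> dict:
    """Build a request body with one text part and one inline JPEG part.

    Assumption: this mirrors the Gemini generateContent REST body; in n8n
    the Basic LLM node and the chat model node build the request for you.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {"inline_data": {"mime_type": "image/jpeg", "data": encoded}},
                ]
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical file names, as they might come out of the zip archive.
    pages = sort_pages(["page-10.jpg", "page-2.jpg", "page-1.jpg"])
    print(pages)
    # An A4 page scanned at 300 dpi, reduced to fit a 1024-pixel limit.
    print(fit_within(2480, 3508, 1024))
    body = build_page_request(b"<jpeg bytes>", "Transcribe this page to markdown.")
    print(list(body["contents"][0]["parts"][1]["inline_data"]))
```

Sorting on the page number rather than the raw file name matters because a plain string sort would put "page-10" before "page-2".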

Requirements

  • Google Gemini API for Multimodal LLM.
  • Google Drive access for document storage.
  • Stirling PDF instance for PDF to image conversion.

Customising the workflow

  • At the time of writing, Gemini 1.5 Pro is the most accurate at parsing text documents for a relatively low cost. If you are not using Google Gemini, however, you can switch to other multimodal LLMs such as OpenAI GPT or Anthropic Claude.

  • If you don't need the markdown, simply asking what to extract directly in the LLM's prompt is also acceptable and would save a few extra steps.

  • Not parsing any bank statements any time soon? This template also works for invoices, inventory lists, contracts, legal documents, etc.
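If you skip the markdown step and ask for data points directly, it helps to pin down the expected output and check the reply before using it. The sketch below is one hedged way to do that in Python. The field names (`date`, `description`, `amount`), the prompt wording and both helper functions are hypothetical choices for illustration, not part of the template or of the Information Extractor node.

```python
import json

# Hypothetical fields for one deposit line item.
DEPOSIT_FIELDS = ("date", "description", "amount")


def build_extraction_prompt(fields: tuple[str, ...] = DEPOSIT_FIELDS) -> str:
    """Build a prompt that asks for deposits as a JSON array only."""
    keys = ", ".join(fields)
    return (
        "Extract every deposit from this bank statement page. "
        f"Reply with only a JSON array of objects with the keys: {keys}."
    )


def parse_deposits(reply_text: str) -> list[dict]:
    """Parse and validate the model's reply.

    Raises ValueError if the reply is not a JSON array of objects or if
    a row lacks one of the expected keys. Amounts are converted to float.
    """
    rows = json.loads(reply_text)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array of deposits")
    deposits = []
    for row in rows:
        if not isinstance(row, dict):
            raise ValueError("each deposit must be a JSON object")
        missing = [key for key in DEPOSIT_FIELDS if key not in row]
        if missing:
            raise ValueError(f"deposit is missing keys: {missing}")
        deposits.append(
            {
                "date": str(row["date"]),
                "description": str(row["description"]),
                "amount": float(row["amount"]),
            }
        )
    return deposits


if __name__ == "__main__":
    print(build_extraction_prompt())
    # A made-up reply, standing in for what the model might return.
    sample_reply = '[{"date": "2024-01-05", "description": "Salary", "amount": "1500.00"}]'
    print(parse_deposits(sample_reply))
```

Validating the reply this way catches a missing column early, which is useful when the same prompt is reused for invoices or other document types.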

Nodes used in this workflow

Popular Google Gemini Chat Model and Information Extractor workflows

Google Gemini Chat Model node
Sort node
Google Drive node
and 5 more nodes

Build your own Google Gemini Chat Model and Information Extractor integration

Create custom Google Gemini Chat Model and Information Extractor workflows by choosing triggers and actions. Nodes come with global operations and settings, as well as app-specific parameters that can be configured. You can also use the HTTP Request node to query data from any app or service with a REST API.

Google Gemini Chat Model and Information Extractor integration details

FAQs

  • Can Google Gemini Chat Model connect with Information Extractor?

  • Can I use Google Gemini Chat Model’s API with n8n?

  • Can I use Information Extractor’s API with n8n?

  • Is n8n secure for integrating Google Gemini Chat Model and Information Extractor?

  • How to get started with Google Gemini Chat Model and Information Extractor integration in n8n.io?

Looking to integrate Google Gemini Chat Model and Information Extractor in your company?

Over 3000 companies switch to n8n every single week

Why use n8n to integrate Google Gemini Chat Model with Information Extractor

Build complex workflows, really fast

Handle branching, merging and iteration easily.
Pause your workflow to wait for external events.

Code when you need it, UI when you don't

Simple debugging

Your data is displayed alongside your settings, making edge cases easy to track down.

Use templates to get started fast

Use 1000+ workflow templates available from our core team and our community.

Reuse your work

Copy and paste, easily import and export workflows.

Implement complex processes faster with n8n
