Compare Local Ollama Vision Models for Image Analysis using Google Docs
Process images using locally hosted Ollama Vision Models to extract detailed descriptions, contextual insights, and structured data. Save results directly to Google Docs for efficient collaboration.
Who is this for?
This workflow is ideal for developers, data analysts, marketers and AI enthusiasts who need to process and analyze images using locally hosted Ollama Vision Language Models. It’s particularly useful for tasks requiring detailed image descriptions, contextual analysis, and structured data extraction.
What problem is this workflow solving? / Use Case
The workflow solves the challenge of extracting meaningful insights from images in exhaustive detail, such as identifying objects, analyzing spatial relationships, extracting textual elements, and providing contextual information. This is especially helpful for applications in real estate, marketing, engineering, and research.
What this workflow does
This workflow:
- Downloads an image file from Google Drive.
- Processes the image using multiple Ollama Vision Models (e.g., Granite3.2-Vision, Gemma3, Llama3.2-Vision).
- Generates detailed markdown-based descriptions of the image.
- Saves the output to a Google Docs file for easy sharing and further analysis.
Setup
- Ensure you have access to a local instance of Ollama. https://ollama.com/
- Pull the Ollama vision models.
- Configure your Google Drive and Google Docs credentials in n8n.
- Provide the image file ID from Google Drive in the designated node.
- Update the list of Ollama vision models
- Test the workflow by clicking ‘Test Workflow’ to trigger the process.
How to customize this workflow to your needs
- Replace the image source with another provider if needed (e.g., AWS S3 or Dropbox).
- Modify the prompts in the "General Image Prompt" node to suit specific analysis requirements.
- Add additional nodes for post-processing or integrating results into other platforms like Slack or HubSpot.
Key Features:
- Detailed Image Analysis: Extracts comprehensive details about objects, spatial relationships, text elements, and contextual settings.
- Multi-Model Support: Utilizes multiple vision models dynamically for optimal performance.
- Markdown Output: Formats results in markdown for easy readability and documentation.
- Google Drive Integration: Seamlessly downloads images and saves results to Google Docs.