Prompt-based Object Detection with Gemini 2.0

Created by Jimleuk

Last update a year ago

How it works

An image is downloaded via the HTTP node and an "Edit Image" node is used to extract the file's width and height.
The image is then sent to the Gemini 2.0 API, which returns the bounding-box coordinates of the requested subjects. In this demo, we've asked the AI to identify all bunnies.
The coordinates are then rescaled to the original image's width and height so that they align correctly with it.
Finally, to visually check the accuracy of the object detection, we use the "Edit Image" node to draw the bounding boxes onto the original image.
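The steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the template's actual implementation: the image is assumed to be a PNG, the model name `gemini-2.0-flash-exp` and the prompt wording are hypothetical, and the boxes are assumed to follow the convention Google documents for Gemini object detection, `[ymin, xmin, ymax, xmax]` normalised to a 0 to 1000 grid. The helper names (`png_size`, `build_request`, `parse_boxes`, `rescale`) are hypothetical. No network call is made; the sketch only builds the `generateContent` request body and parses a sample reply.

```python
from __future__ import annotations

import base64
import json
import struct

# Hypothetical model name and prompt; the template targets an experimental Gemini 2.0 release.
MODEL = "gemini-2.0-flash-exp"
ENDPOINT = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"
PROMPT = (
    "Detect all bunnies in the image. Return a JSON list where each item has a "
    "'label' and a 'box_2d' given as [ymin, xmin, ymax, xmax] normalised to 0-1000."
)


def png_size(data: bytes) -> tuple[int, int]:
    """Read width and height from a PNG's IHDR chunk (the role of the "Edit Image" node here)."""
    if data[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("not a PNG file")
    # IHDR is always the first chunk; width and height are big-endian uint32 values
    # located after the 8-byte signature, 4-byte chunk length and 4-byte chunk type.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def build_request(image: bytes, prompt: str) -> dict:
    """Build a generateContent request body carrying the prompt and the inline image."""
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {
                    "mime_type": "image/png",
                    "data": base64.b64encode(image).decode("ascii"),
                }},
            ]
        }],
        # Ask for a JSON reply so the text part can be parsed directly.
        "generationConfig": {"response_mime_type": "application/json"},
    }


def parse_boxes(response: dict) -> list[dict]:
    """Extract the JSON list of detections from the first candidate's text part."""
    text = response["candidates"][0]["content"]["parts"][0]["text"]
    return json.loads(text)


def rescale(box_2d: list[int], width: int, height: int) -> dict:
    """Map a 0-1000 normalised [ymin, xmin, ymax, xmax] box onto pixel coordinates."""
    ymin, xmin, ymax, xmax = box_2d
    # Multiply before dividing so the intermediate values stay integers.
    return {
        "x0": round(xmin * width / 1000),
        "y0": round(ymin * height / 1000),
        "x1": round(xmax * width / 1000),
        "y1": round(ymax * height / 1000),
    }


if __name__ == "__main__":
    # A fake PNG header describing a 640x480 image (enough for png_size).
    fake_png = (b"\x89PNG\r\n\x1a\n" + struct.pack(">I", 13) + b"IHDR"
                + struct.pack(">II", 640, 480) + b"\x08\x02\x00\x00\x00")
    width, height = png_size(fake_png)
    body = build_request(fake_png, PROMPT)  # would be POSTed to ENDPOINT with an API key
    # A sample reply shaped like a generateContent response.
    sample = {"candidates": [{"content": {"parts": [
        {"text": '[{"label": "bunny", "box_2d": [100, 250, 600, 750]}]'}]}}]}
    for det in parse_boxes(sample):
        print(det["label"], rescale(det["box_2d"], width, height))
```

Rescaling is needed because the model reports positions on a fixed grid regardless of the image's real size; the resulting pixel corners are what a drawing step such as the "Edit Image" node would use to outline each box.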

The possible uses are really up to your imagination! Perhaps a form of grounding for evidence-based workflows, or a more advanced form of image search, could be built.

This template is just a demonstration of an experimental version of Gemini 2.0. It is recommended to wait for Gemini 2.0 to leave this experimental stage before using it in production.