Vector database (Qdrant) as a data analysis tool
Works with images; embeddings come from Voyage AI.
KNN (k nearest neighbours) classification
- The first pipeline uploads the (lands) dataset to a Qdrant collection.
- This is the KNN classifier tool, which takes any image as input and classifies it based on queries to the Qdrant (lands) collection.
To recreate it
You'll have to upload the lands dataset from Kaggle to your own Google Cloud Storage bucket and re-create the APIs/connections to Qdrant Cloud (a Free Tier cluster is enough), the Voyage AI API, and Google Cloud Storage.
In general, the pipelines are adaptable to any dataset of images.
[This workflow] KNN classification tool
This tool takes an image URL and returns the class of the object in the image.
- An image URL is received via the Execute Workflow Trigger and sent to the Voyage AI Multimodal Embeddings API to fetch its embedding.
- The image's embedding vector is then used to query Qdrant, returning a set of X similar images with pre-labeled classes.
- Majority voting is done for classes of neighbouring images.
- If the majority vote ends in a tie, a loop increases the number of neighbours to retrieve and repeats the query until the tie is resolved.
- When the loop finally resolves, the identified class is returned to the calling workflow.
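The voting and tie-breaking steps above can be sketched in Python roughly as follows. This is only an illustration of the logic, not the workflow's actual nodes: `fetch_neighbour_labels` is a hypothetical callable standing in for the Qdrant query (it should return the class labels of the k nearest images), and `start_k`, `step`, and `max_k` are assumed values, since the workflow description only speaks of "X" neighbours.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def majority_vote(labels: Sequence[str]) -> Optional[str]:
    """Return the most common label, or None if first place is tied (or empty)."""
    top_two = Counter(labels).most_common(2)
    if not top_two:
        return None
    if len(top_two) == 2 and top_two[0][1] == top_two[1][1]:
        return None  # tie between the two most frequent classes
    return top_two[0][0]


def classify(
    fetch_neighbour_labels: Callable[[int], Sequence[str]],  # hypothetical Qdrant lookup
    start_k: int = 10,  # assumed initial number of neighbours
    step: int = 5,      # assumed increment applied after each tie
    max_k: int = 100,   # assumed upper bound so the loop always terminates
) -> Optional[str]:
    """Retrieve more neighbours until the majority vote is no longer tied."""
    k = start_k
    while k <= max_k:
        winner = majority_vote(fetch_neighbour_labels(k))
        if winner is not None:
            return winner
        k += step
    return None  # tie never resolved within max_k neighbours


if __name__ == "__main__":
    # Fake neighbour list ordered by similarity, used in place of a real query.
    ranked = ["forest", "river", "forest", "river", "forest"]
    print(classify(lambda k: ranked[:k], start_k=2, step=1, max_k=5))
```

Passing the lookup in as a callable keeps the voting logic independent of the vector store, so the same loop can be exercised with a plain list, as in the usage example.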