Vector database (Qdrant) as a data analysis tool
This example works with images and uses Voyage AI as the embedding model.
For anomaly detection
- The first pipeline uploads the (crops) dataset to a Qdrant collection.
- The second pipeline sets up cluster (class) centres and cluster (class) threshold scores in this Qdrant collection.
- The third part is the anomaly detection tool, which takes any image as input and relies on all the preparatory work done with the Qdrant (crops) collection.
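As a rough illustration of what "cluster centres" and "threshold scores" could mean, the sketch below takes the embedding vectors of one crop class, uses their mean as the class centre, and sets the threshold to the lowest cosine similarity between any class member and that centre. This is only one possible choice, assumed here for illustration; the second pipeline may define centres and thresholds differently, and the names `cosine_similarity` and `centre_and_threshold` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def centre_and_threshold(
    vectors: Sequence[Sequence[float]],
) -> Tuple[List[float], float]:
    """Return (centre, threshold) for the embeddings of one class.

    Assumption for this sketch: the centre is the element-wise mean and the
    threshold is the similarity of the least similar class member.
    """
    if not vectors:
        raise ValueError("at least one vector is required")
    dim = len(vectors[0])
    centre = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    threshold = min(cosine_similarity(v, centre) for v in vectors)
    return centre, threshold
```

With this choice, every image already in the class scores at or above its own class threshold, so only images further from the centre than any known member fall below it.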
To recreate it
You'll have to upload the crops dataset from Kaggle to your own Google Cloud Storage bucket and re-create the APIs/connections to Qdrant Cloud (you can use a Free Tier cluster), the Voyage AI API, and Google Cloud Storage.
In general, the pipelines can be adapted to any image dataset.
[This workflow] Anomaly Detection Tool
This tool can be used directly to detect anomalous (crop) images.
It takes any image URL as input and returns a text message saying whether what the image depicts is anomalous with respect to the crop dataset stored in Qdrant.
- An image URL is received via the Execute Workflow Trigger and is used to generate embedding vectors with the Voyage AI Embeddings API.
- The returned vectors are used to query the Qdrant collection; the resulting scores are compared against the threshold score of each image class (crop type) to determine whether the given crop is known.
- If the image scores lower than all thresholds, then the image is considered an anomaly for the dataset.
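The decision rule in the last step can be sketched in a few lines of Python. The sketch assumes the Qdrant query has already produced one score per crop class and that a higher score means a closer match; the function names, class names, and message wording are hypothetical and not taken from the workflow.

```python
from typing import Dict, List


def matching_classes(
    scores: Dict[str, float], thresholds: Dict[str, float]
) -> List[str]:
    """Return the classes whose threshold the image's score reaches."""
    return [
        name
        for name, score in scores.items()
        if name in thresholds and score >= thresholds[name]
    ]


def describe_image(
    scores: Dict[str, float], thresholds: Dict[str, float]
) -> str:
    """Build the text message: anomalous only if no class threshold is met."""
    matches = matching_classes(scores, thresholds)
    if not matches:
        return "The image is anomalous to the crop dataset."
    return "The image looks similar to: " + ", ".join(matches)
```

For example, with hypothetical thresholds `{"tomato": 0.75, "wheat": 0.70}`, an image scoring `{"tomato": 0.81, "wheat": 0.42}` is treated as a known tomato crop, while one scoring `{"tomato": 0.30, "wheat": 0.42}` falls below every threshold and is reported as an anomaly.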