This pipeline is the first part of "Hybrid Search with Qdrant & n8n, Legal AI".
The second part, "Hybrid Search with Qdrant & n8n, Legal AI: Retrieval", covers retrieval and simple evaluation.
This pipeline transforms a Q&A legal corpus from Hugging Face (isaacus) into vector representations and indexes them to Qdrant, providing the foundation for running Hybrid Search, combining:
After running this pipeline, you will have a Qdrant collection with your legal dataset ready for hybrid retrieval on BM25 and dense embeddings: either mxbai-embed-large-v1 or text-embedding-3-small.
This pipeline equips you with two approaches for generating dense vectors: