Evaluate Hybrid Search on Legal Dataset
This is the second part of "Hybrid Search with Qdrant & n8n, Legal AI."
The first part, "Indexing," covers preparing and uploading the dataset to Qdrant.
Overview
This pipeline demonstrates how to perform Hybrid Search on a Qdrant collection using questions and text chunks (containing answers) from the
LegalQAEval dataset (isaacus).
On a small subset of questions, it shows:
- How to set up hybrid retrieval in Qdrant with:
- BM25-based keyword retrieval;
- mxbai-embed-large-v1 semantic retrieval;
- Reciprocal Rank Fusion (RRF), a simple zero-shot fusion of the two searches;
- How to run a basic evaluation:
- Calculate hits@1 — the percentage of evaluation questions where the top-1 retrieved text chunk contains the correct answer
After running this pipeline, you will have a quality estimate of a simple hybrid retrieval setup.
From there, you can reuse Qdrant’s Query Points node to build a legal RAG chatbot.
Embedding Inference
- By default, this pipeline uses Qdrant Cloud Inference to convert questions to embeddings.
- You can also use an external embedding provider (e.g. OpenAI).
- In that case, minimally update the pipeline, similar to the adjustments showed in Part 1: Indexing.
Prerequisites
- Completed Part 1 pipeline, "Hybrid Search with Qdrant & n8n, Legal AI: Indexing", and the collection created in it;
- All the requirements of Part 1 pipeline;
Hybrid Search
The example here is a basic hybrid query. You can extend/enhance it with:
- Reranking strategies;
- Different fusion techniques;
- Score boosting based on metadata;
- ...
More details: Hybrid Queries in Qdrant.
P.S.