This repository demonstrates how to approximate Learning-to-Rank (LTR) behavior using Score Boosting formulas inside your vector database, specifically with Qdrant.
Full article: Learning-to-Rank Felt Too Complex. So I Tried Something Else.
Vector search is powerful for semantic relevance, but "most similar" doesn't always mean "most relevant." Traditional Learning-to-Rank (LTR) systems involve complex feature pipelines and model serving infrastructure.
This project shows a simpler way: using Qdrant's Score Boosting to combine:
- Semantic Score: Raw vector similarity.
- Payload Signals: Category, brand, and gender affinity.
- Dynamic Decay: Boosting results from sellers near the user (Gaussian Decay).
Final Score Formula:
`final_score = $score + 0.30*(category_match) + 0.20*(brand_match) + 0.20*(gender_match) + gauss_decay(distance)`
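As a rough reference, the combined score can be reproduced in plain Python. This is a sketch only: the Gaussian decay parameterization (value 1.0 at the seller's location, 0.5 at `scale_km`) and the 50 km scale are illustrative assumptions, not Qdrant's exact internals.

```python
import math

def gauss_decay(distance_km: float, scale_km: float = 50.0, midpoint: float = 0.5) -> float:
    """Gaussian decay: 1.0 at distance 0, `midpoint` at `scale_km`.
    (One common parameterization, assumed here for illustration.)"""
    return math.exp(math.log(midpoint) * (distance_km / scale_km) ** 2)

def final_score(similarity: float, category_match: bool, brand_match: bool,
                gender_match: bool, distance_km: float) -> float:
    # $score + 0.30*category + 0.20*brand + 0.20*gender + gauss_decay(distance)
    return (similarity
            + 0.30 * category_match
            + 0.20 * brand_match
            + 0.20 * gender_match
            + gauss_decay(distance_km))

# A fully matching item at the user's location gains 0.70 from payload
# signals plus 1.0 from the decay term over its raw similarity:
print(round(final_score(0.887, True, True, True, 0.0), 3))  # 2.587
```

Note how a raw similarity of 0.887 becomes 2.587 for a full match near the user, which is exactly the kind of jump visible in the before/after comparison below.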
Prerequisites:

- Docker (to run Qdrant)
- Python 3.9+
Start Qdrant and install the dependencies:

```
docker run -p 6333:6333 qdrant/qdrant:v1.17.0
pip install qdrant-client sentence-transformers
```

Then open and run `score_boosting_demo.ipynb`. It will:
- Load 200 synthetic products from `data.jsonl`.
- Embed product titles using IBM's `granite-embedding-small-english-r2`.
- Create a Qdrant collection with Payload Indexes for performance.
- Run a side-by-side comparison between Plain Search and Personalized Search.
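In Qdrant's Query API, the personalized search maps onto a `formula` query over prefetched candidates, roughly like the sketch below. The payload field names (`category`, `brand`, `gender`, `location`), the user's preference values, and the decay `scale` are illustrative assumptions; boolean conditions evaluate to 1.0 or 0.0 inside the formula.

```json
{
  "prefetch": { "query": "<embedding of the search text>", "limit": 50 },
  "query": {
    "formula": {
      "sum": [
        "$score",
        { "mult": [0.30, { "key": "category", "match": { "value": "shoes" } }] },
        { "mult": [0.20, { "key": "brand", "match": { "any": ["Nike", "Adidas"] } }] },
        { "mult": [0.20, { "key": "gender", "match": { "value": "women" } }] },
        { "gauss_decay": {
            "x": { "geo_distance": { "origin": { "lat": 41.01, "lon": 28.95 }, "to": "location" } },
            "scale": 50000
        } }
      ]
    }
  },
  "limit": 10
}
```

The notebook builds the equivalent query with `qdrant-client`; the JSON form is shown here only to make the structure of the formula explicit.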
For a query "running shoes" and a user who prefers Nike/Adidas, Women's products, and is located in Istanbul:
Standard search returns semantically close items, but ignores user identity:
1. Adidas NMD_R1 Unisex (0.887)
2. Puma Women's Better Foam (0.885)
3. New Balance Men's Hierro (0.879)
With Score Boosting applied, results are re-ranked to match the user's profile:
1. Adidas NMD_R1 Unisex (2.587) - Matches Brand, Category, Near User
2. Adidas Women's Cloudfoam (2.575) - Matches Gender, Brand, Category, Near User
3. Nike Air Zoom Pegasus 40 (2.571) - Matches Gender, Brand, Category, Near User
- LinkedIn: linkedin.com/in/gururaser
- GitHub: github.com/gururaser
- Medium: medium.com/@gururaser