Review-Based Recommender Systems: Modular optimization to improve ranking while reducing computational and time costs
This repository contains the official implementation of CARU (Cross-Attention Review-to-User) and the TIL (Triplet Importance Learning) framework, as presented in the Master's Thesis at Politecnico di Bari (2024/2025).
This project introduces a modular ranking-based approach designed to overcome the limitations of traditional error-based models (MSE) and standard BPR. It focuses on computational efficiency and ranking quality using two main components:
- CARU (Cross-Attention Review-to-User): A lightweight encoder implementing the Contextualized BPR (cBPR) paradigm. It utilizes Multi-Head Cross-Attention to enrich user and item embeddings with semantic information from reviews (or a learnable null token).
- TIL (Triplet Importance Learning): An auxiliary module trained via Bilevel Optimization. It dynamically learns importance weights ($w_{uij}$) for training triplets to mitigate noise and improve the generalization of the target recommender system (see the objective sketched below).

Reference: Wu, H. (n.d.). A Triplet Importance Learning Framework for Recommendation with Implicit Feedback.
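One plausible reading of how these weights enter training (an assumption based on the standard BPR formulation, not a transcription of the thesis): TIL scales each BPR term by its learned importance.

$$
\mathcal{L}_{\text{cBPR}} = -\sum_{(u,i,j)} w_{uij} \, \log \sigma\!\left(\hat{x}_{ui} - \hat{x}_{uj}\right)
$$

where $\hat{x}_{ui}$ is the score for the observed (positive) item, $\hat{x}_{uj}$ the score for the sampled negative, and $w_{uij}$ the per-triplet weight produced by TIL through bilevel optimization.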
- `main_NEI.py`: The entry point for training, grid search, and testing. It uses Google Fire for CLI interaction.
- `CARU.py`: Contains the `CrossAttentionReview2User` encoder architecture.
- `TIL.py`: Contains the `TIL` module for generating triplet importance weights.
- `context_BPR.py`: The orchestrator model that combines encoders and computes the cBPR scores.
- `dataset.py`: Handles data loading for the Amazon 5-core datasets.
- `prepare.py`: Script for converting JSON datasets into optimized binary memory-mapped files.
- Python 3.11+
- PyTorch (CUDA recommended)
- Pandas, NumPy, Scikit-learn, Tqdm, Fire
The experiments are conducted on the Amazon 5-core datasets. To replicate the study, the data must be pre-processed following a strict pipeline to ensure compatibility with the prepare.py script.
To ensure fair comparison with baselines, you must first split the dataset using the native splitting function provided by the DIRECT framework (MIT License).
- Reference: JacksonWuxs/DIRECT
- Output: This will generate Train/Validation/Test split files (JSON Lines).
The model requires semantic information from reviews. You must enrich the split files by adding review embeddings generated by OpenAI text-embedding-3-small (1536-dim).
Crucial Formatting Requirement:
The prepare.py script is optimized to read binary data directly. You must format the embedding vector as a Base64 encoded string within the JSON object.
- Key Name: `"tokens"`
- Value Format: Base64-encoded string of the float32 array.
Handling Missing Reviews (Null Token Strategy): If a review is missing or unavailable, you must insert the Base64 representation of a 1536-dimensional vector of zeros (float32).
- Visual Check: This string will appear essentially as a long sequence of "A" characters.
- Model Logic: When the data loader encounters this zero-vector, the model detects the absence of context and automatically substitutes a specific, learnable "null token" for the missing review information (a minimal sketch follows).
Tip: You can generate the null token string in Python using:
```python
import base64
import numpy as np

null_token = base64.b64encode(np.zeros(1536, dtype=np.float32).tobytes()).decode('utf-8')
```
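For the enrichment step itself, a minimal sketch along these lines produces records in the format shown below. The file names and the `encode_review` helper are placeholders, and the embedding call assumes the current `openai` Python SDK; adapt it to your own pipeline.

```python
import base64
import json

import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
DIM = 1536
NULL_TOKEN = base64.b64encode(np.zeros(DIM, dtype=np.float32).tobytes()).decode("utf-8")

def encode_review(text: str) -> str:
    """Return the Base64 float32 embedding of a review, or the null token if it is empty."""
    if not text or not text.strip():
        return NULL_TOKEN
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    vec = np.asarray(resp.data[0].embedding, dtype=np.float32)
    return base64.b64encode(vec.tobytes()).decode("utf-8")

with open("train_split.json", encoding="utf-8") as fin, \
        open("train_enriched.json", "w", encoding="utf-8") as fout:
    for line in fin:
        obj = json.loads(line)
        obj["tokens"] = encode_review(obj.get("reviewText", ""))
        fout.write(json.dumps(obj) + "\n")
```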
Example of expected JSON line structure:
```json
{
  "reviewerID": "A1HK2FQW6KXQB2",
  "asin": "097293751X",
  "overall": 5.0,
  "reviewText": "Perfect for new parents. We were able to keep...",
  "tokens": "KRmRPNlvnjtvdq+8718KPJ..."
}
```

Example of a Missing Review (Null Token Case):
In this scenario, reviewText is empty (or ignored), and tokens contains the zero-vector.
```json
{
  "reviewerID": "A1J1GBJREDREP",
  "asin": "B00563XRYM",
  "overall": 1.0,
  "reviewText": "",
  "tokens": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA..."
}
```

Why Base64? As seen in `prepare.py`, the loader uses `np.frombuffer(base64.b64decode(...))` to parse the binary data directly. Storing 1536 floats as plain-text JSON lists would drastically increase file size and parsing time.
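To sanity-check an enriched file, you can reproduce the decoding side described above (the file name is a placeholder):

```python
import base64
import json

import numpy as np

with open("train_enriched.json", encoding="utf-8") as f:
    obj = json.loads(f.readline())

vec = np.frombuffer(base64.b64decode(obj["tokens"]), dtype=np.float32)
assert vec.shape == (1536,), vec.shape  # every record must decode to a 1536-dim float32 vector
```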
Clone the repository and install the required dependencies:
```bash
git clone https://github.com/PeppeJerry/Cross-Attention-Review-to-User-CARU-.git CARU_TIL
cd CARU_TIL
pip install -r requirements.txt
```