
Context Convergence Improves Answering Inferential Questions

🧠 Why do LLMs struggle with inferential questions? Because not all context is equally helpful: some sentences guide reasoning, while others just add noise.

💡 What if we could measure how useful a sentence is for reasoning? This work shows that convergence, i.e. how much a sentence narrows down the set of possible answers, plays a key role in improving inferential QA.

🌟 Overview

Large Language Models (LLMs) are powerful, but they still struggle with inferential questions 🤔 (those whose answers must be reasoned out rather than found directly in the text).

💡 In this project, we introduce convergence, a signal that measures how well a sentence (hint) narrows down the set of possible answers.

🔍 What we show:

  • ✅ High-convergence sentences → better QA performance
  • 📊 Convergence beats cosine similarity for passage selection
  • 🧠 Ordering sentences by convergence → even better results
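As a rough sketch of the intuition (not the paper's exact definition), convergence can be pictured as the drop in uncertainty over a candidate-answer set once a hint re-weights the candidates. The `compat` scorer below is a hypothetical placeholder for any hint-to-candidate compatibility measure:

```python
import math


def entropy(weights):
    """Shannon entropy of a normalized weight distribution."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)


def convergence(hint, candidates, compat):
    """Toy convergence score: entropy drop over the candidate answers
    after re-weighting them by the hint.  compat(hint, candidate)
    returns a nonnegative compatibility weight (hypothetical helper)."""
    prior = [1.0] * len(candidates)                    # uniform before the hint
    posterior = [compat(hint, c) for c in candidates]  # re-weighted by the hint
    return entropy(prior) - entropy(posterior)
```

A hint compatible with only one candidate yields the maximum score (log2 of the candidate count), while a hint that fits every candidate equally yields zero.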

πŸ—‚οΈ Repository Structure

β”œβ”€β”€ dataset                                  # Data preparation and evaluation utilities
β”‚   β”œβ”€β”€ compute_similarities.py              # Computes cosine similarity scores
β”‚   β”œβ”€β”€ dataset_final.tar.gz                 # Ready-to-use final dataset for experiments
β”‚   β”œβ”€β”€ make.sh                              # Rebuilds the dataset pipeline from scratch
β”‚   β”œβ”€β”€ make_dataset.py                      # Creates dataset with convergence annotations
β”‚   β”œβ”€β”€ merge.py                             # Merges intermediate outputs into final dataset
β”‚   β”œβ”€β”€ qa.py                                # Runs QA evaluation pipeline
β”‚
└── experiments                              # Experiment scripts used in the paper
    β”œβ”€β”€ convergence_vs_cosine.py             # Compares convergence vs cosine similarity
    β”œβ”€β”€ order.py                             # Tests effect of sentence ordering

📦 Dataset

📁 A preprocessed dataset is included:

```
dataset/dataset_final.tar.gz
```

  • ✅ Recommended: use this archive directly
  • ⚠️ Optional: rebuild it from scratch if needed

🧩 What’s inside the dataset?

The dataset is derived from hint-based QA data (TriviaHG) and is designed for inferential question answering. Unlike standard QA datasets, the answer must be inferred by combining hints, not extracted from a single sentence.
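For illustration, a record in such a dataset might look like the following. This shape is hypothetical; the actual schema inside `dataset_final.tar.gz` may differ:

```python
# Hypothetical record: no single hint states the answer outright, but
# combining the hints lets a model infer it.  Field names and scores
# are illustrative, not the real schema of dataset_final.tar.gz.
record = {
    "question": "Which planet is known as the Red Planet?",
    "hints": [
        {"text": "It is the fourth planet from the Sun.", "convergence": 0.8},
        {"text": "Its surface is rich in iron oxide.", "convergence": 0.5},
    ],
    "answer": "Mars",
}
```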

📊 Convergence in the dataset

Each hint has a convergence score, measuring how well it narrows down the candidate answers:

  • 🟢 High → strongly filters incorrect answers
  • 🟡 Medium → partially informative
  • 🔴 Low → weak or ambiguous
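A minimal sketch of how raw scores could map onto these three bands, assuming scores normalized to [0, 1]. The 0.33/0.66 thresholds are invented for illustration and are not the values used to build the released dataset:

```python
def convergence_bucket(score, high=0.66, medium=0.33):
    """Map a convergence score in [0, 1] to one of three bands.
    The threshold values here are illustrative assumptions."""
    if score >= high:
        return "high"    # strongly filters incorrect answers
    if score >= medium:
        return "medium"  # partially informative
    return "low"         # weak or ambiguous
```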

🧪 Running Experiments

⚖️ Convergence vs Cosine

```shell
python experiments/convergence_vs_cosine.py
```
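For context, the cosine-similarity baseline this comparison is run against can be sketched as a simple bag-of-words cosine. This is illustrative only; `compute_similarities.py` may use embedding vectors instead:

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Bag-of-words cosine similarity between two sentences, a common
    passage-selection baseline (illustrative sketch)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[word] for word, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

High lexical overlap yields a high score even when the overlapping words do nothing to narrow down the answer, which is why cosine similarity can mis-rank hints.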

🔒 Sentence Ordering

```shell
python experiments/order.py
```

πŸ” Reproducibility

You can reproduce the paper in two ways:

✅ Option A: Reproduce using the provided dataset

This is the easiest and recommended way.

Step 1: Get the code

Run the following commands:

```shell
git clone https://github.com/DataScienceUIBK/Context-Convergence-Inferential-QA.git
cd Context-Convergence-Inferential-QA
pip install termcolor
```

You do not need to recreate the dataset for the experiments.

Step 2: Run the experiments

```shell
python experiments/convergence_vs_cosine.py
python experiments/order.py
```

This reproduces the main experimental setup using the prepared data.

⚠️ Option B: Rebuild the dataset from scratch

Use this only if you want to regenerate the dataset yourself.

Step 1: Complete the setup

Before rebuilding anything, make sure HintEval is installed correctly: 👉 https://hinteval.readthedocs.io/

Then:

```shell
git clone https://github.com/DataScienceUIBK/Context-Convergence-Inferential-QA.git
cd Context-Convergence-Inferential-QA
pip install -r requirements.txt
```

Step 2: Go to the dataset directory

```shell
cd dataset
```

Step 3: Run the full dataset pipeline

```shell
bash make.sh
```

This rebuilds the dataset step by step using the scripts in dataset/.

Step 4: Return to the repository root

```shell
cd ..
```

Step 5: Run the experiments on the rebuilt data

```shell
python experiments/convergence_vs_cosine.py
python experiments/order.py
```

📌 Notes for exact reproduction

  • Use the provided dataset if you want the closest match to the reported results.

  • Rebuilding the dataset is mainly for transparency and regeneration.

  • Make sure HintEval is installed correctly before rebuilding.

  • The experiments in this repository correspond to the two main studies in the paper:

    • convergence vs cosine similarity
    • sentence ordering by convergence

🧠 Key Findings

  • 🟢 Convergence is a strong relevance signal
  • 📈 High-convergence passages → better accuracy
  • ❌ Cosine similarity is not reliable
  • 🔁 Ordering by convergence improves performance
  • 🧭 LLMs prioritize earlier information
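The last two findings can be combined into a simple sketch: sort hints by convergence before concatenating them into the prompt context, so the most informative hints come first. This illustrates the idea behind `experiments/order.py`, not its actual implementation:

```python
def build_context(hints, scores):
    """Place high-convergence hints first, since LLMs tend to weight
    earlier context more heavily (sketch of the ordering idea)."""
    ranked = sorted(zip(hints, scores), key=lambda pair: pair[1], reverse=True)
    return "\n".join(hint for hint, _ in ranked)
```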

📚 Citation

📜 License

MIT License; see LICENSE.
