🧬 Automatic Helicobacter pylori Diagnosis

📌 Project Overview

Helicobacter pylori (H. pylori) is a bacterium strongly associated with gastritis and gastric cancer. Its diagnosis from immunohistochemistry (IHC) slides is traditionally performed by pathologists through manual visual inspection, a process that is time-consuming and prone to subjectivity, especially in low-density cases.

This project explores deep learning–based automated systems to assist in the detection of H. pylori from histopathology images. The goal is to provide a robust patient-level diagnosis by aggregating patch-level predictions extracted from whole-slide images.

Two different approaches are implemented and compared:

- System I: Anomaly Detection using Autoencoders
- System II: Patient-level Classification using a Gated Attention Mechanism

🎯 Challenge Objective

- Detect H. pylori presence from IHC whole-slide images
- Perform patch-level analysis and aggregate results into a patient-level diagnosis
- Compare different deep learning architectures and design choices
- Evaluate robustness using cross-validation and a holdout test set

🧠 Methodology

The proposed pipeline consists of three main stages:

1. Patch Extraction (Preprocessed)
   The dataset provides 256×256 image patches extracted from tissue borders, where H. pylori typically appears.
2. Patch-level Analysis
   - Anomaly detection via reconstruction error (System I)
   - Feature embedding and attention-based aggregation (System II)
3. Patient-level Diagnosis
   - Patch predictions are aggregated to produce a final binary diagnosis per patient
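The final aggregation stage can be sketched as follows. This is a minimal illustration, not the project's actual rule: it assumes each patch already has a scalar score, and that a patient is called positive when the fraction of flagged patches exceeds a cutoff. The function name `diagnose_patient` and both threshold parameters are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def diagnose_patient(
    patch_scores: Sequence[float],
    patch_threshold: float,
    patient_threshold: float,
) -> tuple[bool, float]:
    """Aggregate patch-level scores into one binary diagnosis.

    A patch is flagged when its score exceeds `patch_threshold`; the patient
    is called positive when the fraction of flagged patches exceeds
    `patient_threshold`. Both thresholds are hypothetical parameters.
    """
    if not patch_scores:
        raise ValueError("patient has no patches")
    flagged = sum(score > patch_threshold for score in patch_scores)
    fraction = flagged / len(patch_scores)
    return fraction > patient_threshold, fraction


# Example: two of five patches are flagged, so the flagged fraction is 0.4.
scores = [0.1, 0.9, 0.2, 0.8, 0.05]
positive, fraction = diagnose_patient(
    scores, patch_threshold=0.5, patient_threshold=0.3
)
```

Using a fraction rather than a raw count keeps the decision comparable across patients with different numbers of patches.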

🔍 System I: Anomaly Detection via Autoencoders

System I formulates H. pylori detection as an anomaly detection problem.

Key Ideas

- Models are trained only on patches from H. pylori–negative patients
- The model learns the distribution of normal tissue
- Patches containing bacteria produce high reconstruction error
- Patch predictions are aggregated to diagnose each patient

Models

- Convolutional Autoencoder (AE)
- Variational Autoencoder (VAE)

Reconstruction Error Metrics

Three reconstruction error definitions were evaluated:

- Mean Squared Error (MSE)
- Mean Absolute Error on the Red Channel
- HSV-based Red Pixel Reconstruction Error (selected)

The HSV-based metric explicitly captures biologically relevant red staining and achieved the best performance across all experiments.
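The three error definitions could look roughly like the sketch below. The exact formulas used by the project are not given here, so everything in this block is an assumption: images are taken to be float RGB arrays in [0, 1] with shape (height, width, 3), "red" is defined by hypothetical hue, saturation and value cutoffs, and the HSV-based error is taken as the drop in the fraction of red pixels between the original patch and its reconstruction.

```python
import numpy as np


def mse(original, reconstruction):
    """Mean squared error over all pixels and channels."""
    return float(np.mean((original - reconstruction) ** 2))


def red_channel_mae(original, reconstruction):
    """Mean absolute error restricted to the red (first) RGB channel."""
    return float(np.mean(np.abs(original[..., 0] - reconstruction[..., 0])))


def rgb_to_hsv(rgb):
    """Convert float RGB in [0, 1] to (hue in degrees, saturation, value)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    maxc = rgb.max(axis=-1)
    delta = maxc - rgb.min(axis=-1)
    safe = np.where(delta > 0, delta, 1.0)  # avoid division by zero on gray pixels
    hue = np.where(
        maxc == r,
        ((g - b) / safe) % 6.0,
        np.where(maxc == g, (b - r) / safe + 2.0, (r - g) / safe + 4.0),
    )
    hue = np.where(delta > 0, hue * 60.0, 0.0)
    sat = np.where(maxc > 0, delta / np.where(maxc > 0, maxc, 1.0), 0.0)
    return hue, sat, maxc


def red_fraction(rgb, hue_margin=20.0, min_sat=0.5, min_val=0.3):
    """Fraction of pixels whose hue lies near 0 degrees (hypothetical cutoffs)."""
    hue, sat, val = rgb_to_hsv(rgb)
    near_red = (hue <= hue_margin) | (hue >= 360.0 - hue_margin)
    is_red = near_red & (sat >= min_sat) & (val >= min_val)
    return float(is_red.mean())


def hsv_red_error(original, reconstruction):
    """Assumed definition: red pixels lost by the reconstruction."""
    return red_fraction(original) - red_fraction(reconstruction)
```

The intuition is that an autoencoder trained only on negative tissue tends not to reproduce red-stained pixels, so a patch that loses red content in reconstruction is a candidate anomaly.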

👁️ System II: Attention-Based Patient Classification

System II treats each patient as a bag of image patches and learns a single patient representation.

Pipeline

1. Extract patch embeddings using the encoder from System I
2. Apply a Gated Attention Mechanism to weight informative patches
3. Aggregate patches into a patient-level representation
4. Perform binary classification

This approach allows the model to focus on the most relevant regions while suppressing background noise.
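The pooling step can be illustrated with a small NumPy sketch. It assumes the gated attention formulation commonly used in attention-based multiple instance learning, where each patch weight is a softmax over `w · (tanh(V h) * sigmoid(U h))`. The matrices `V`, `U` and the vector `w` would be learned in practice; here they are random placeholders, and all shapes and names are hypothetical.

```python
import numpy as np


def gated_attention_pool(H, V, U, w):
    """Pool patch embeddings into one patient vector.

    H: (n_patches, d) patch embeddings
    V, U: (d, hidden) projection matrices for the tanh and sigmoid branches
    w: (hidden,) scoring vector
    Returns the pooled (d,) representation and the (n_patches,) weights.
    """
    gate = np.tanh(H @ V) * (1.0 / (1.0 + np.exp(-(H @ U))))
    logits = gate @ w
    logits = logits - logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum()  # softmax over the patches of one patient
    return weights @ H, weights


# Example with random placeholder parameters: 5 patches, 8-dim embeddings.
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))
V = rng.normal(size=(8, 4))
U = rng.normal(size=(8, 4))
w = rng.normal(size=4)
patient_vector, attention = gated_attention_pool(H, V, U, w)
```

The attention weights sum to one per patient, so they can also be inspected to see which patches drove the prediction.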

📊 Dataset

This project uses the Quiron dataset, a collection of 245 whole-slide immunohistochemistry images of gastric tissue, each corresponding to a different patient.

Data Distribution

For binary classification, the diagnostic labels of the 245 patients are grouped as follows:

- Negative Group (117 patients): Labeled as NEGATIVA.
- Positive Group (128 patients): Consolidates the BAIXA (Low) and ALTA (High) categories.
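The label consolidation above amounts to a small lookup. The sketch below assumes the labels arrive as strings; the names `LABEL_TO_BINARY` and `to_binary_label` are hypothetical and not taken from the repository.

```python
# Density labels as described above, mapped to a binary target.
LABEL_TO_BINARY = {"NEGATIVA": 0, "BAIXA": 1, "ALTA": 1}


def to_binary_label(density: str) -> int:
    """Map a density label to 0 (negative) or 1 (positive)."""
    key = density.strip().upper()
    if key not in LABEL_TO_BINARY:
        raise ValueError(f"unknown density label: {density!r}")
    return LABEL_TO_BINARY[key]


# Example: low- and high-density cases both count as positive.
labels = [to_binary_label(x) for x in ["NEGATIVA", "BAIXA", "ALTA"]]
```

Raising on unknown labels, rather than defaulting to negative, avoids silently mislabeling a patient if the metadata contains an unexpected value.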

Image Preparation and Validation

To optimize the data for computational analysis, the following preprocessing steps were applied:

- Patch Generation: WSIs were tiled into 256×256 pixel patches, taken primarily from the tissue borders.
- Expert Annotation: A subset of the patches was manually labeled by specialists.
- Experimental Setup: This annotated subset is reserved for:
  - Cross-validation
  - Model validation
  - Selection of classification thresholds
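As an illustration of threshold selection on the annotated subset, the sketch below picks the cutoff that maximizes Youden's J statistic (true positive rate minus false positive rate). The criterion actually used by the project is not stated here, so Youden's J is an assumption, and `select_threshold` is a hypothetical helper.

```python
def select_threshold(scores, labels):
    """Return (threshold, J) maximizing Youden's J = TPR - FPR.

    scores: per-patch anomaly scores (higher means more suspicious)
    labels: matching ground-truth labels, 1 for positive and 0 for negative
    A patch is predicted positive when its score is >= the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores)):
        tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
        fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
        j = tp / positives - fp / negatives
        if j > best_j:  # strict comparison keeps the lowest threshold on ties
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Example: perfectly separable scores.
threshold, j = select_threshold(
    [0.1, 0.2, 0.3, 0.7, 0.8, 0.9], [0, 0, 0, 1, 1, 1]
)
```

In a cross-validation setting the threshold would be chosen on the training folds only and then applied unchanged to the held-out fold.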

📚 References

Cano, P., Caravaca, Á., Gil, D., & Musulen, E.
Diagnosis of Helicobacter pylori using autoencoders for the detection of anomalous staining patterns in immunohistochemistry images.
arXiv preprint, 2023.
