Helicobacter pylori (H. pylori) is a bacterium strongly associated with gastritis and gastric cancer. Detecting it in immunohistochemistry (IHC) slides is traditionally done by pathologists through manual visual inspection, a process that is time-consuming and prone to subjectivity, especially in low-density cases.
This project explores deep learning–based automated systems to assist in the detection of H. pylori from histopathology images. The goal is to provide a robust patient-level diagnosis by aggregating patch-level predictions extracted from whole-slide images.
Two different approaches are implemented and compared:
- System I: Anomaly Detection using Autoencoders
- System II: Patient-level Classification using a Gated Attention Mechanism
The project's objectives are to:
- Detect the presence of H. pylori in IHC whole-slide images
- Perform patch-level analysis and aggregate the results into a patient-level diagnosis
- Compare different deep learning architectures and design choices
- Evaluate robustness using cross-validation and a holdout test set
The proposed pipeline consists of three main stages:
- Patch Extraction (preprocessed): the dataset provides 256×256 image patches extracted from tissue borders, where H. pylori typically appears.
- Patch-level Analysis:
  - Anomaly detection via reconstruction error (System I)
  - Feature embedding and attention-based aggregation (System II)
- Patient-level Diagnosis:
  - Patch predictions are aggregated to produce a final binary diagnosis per patient
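The patches ship pre-extracted with the dataset, but the tiling step can be sketched for readers who want to reproduce it on new slides. This is a minimal sketch assuming non-overlapping 256×256 tiles and a hypothetical brightness-based tissue filter (`min_tissue` and the intensity cutoff of 220 are illustrative choices); it does not implement the border-focused selection used to build the dataset.

```python
import numpy as np


def extract_patches(wsi: np.ndarray, patch_size: int = 256, min_tissue: float = 0.5):
    """Split an RGB image region (H, W, 3), uint8, into non-overlapping square patches.

    Patches whose tissue fraction is below `min_tissue` are discarded. "Tissue" is
    approximated as any pixel darker than near-white background; this heuristic is
    hypothetical, not the dataset's actual extraction rule.
    """
    h, w = wsi.shape[:2]
    patches, coords = [], []
    # Step through the image in patch-sized strides; partial tiles at the edges are dropped.
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patch = wsi[y:y + patch_size, x:x + patch_size]
            # Brightfield background is close to white, so dark pixels count as tissue.
            tissue = (patch.mean(axis=-1) < 220).mean()
            if tissue >= min_tissue:
                patches.append(patch)
                coords.append((y, x))
    return patches, coords


# Usage: a synthetic 600x800 "slide" whose top-left 256x512 region is tissue.
wsi = np.full((600, 800, 3), 255, dtype=np.uint8)
wsi[:256, :512] = 120
patches, coords = extract_patches(wsi)
print(coords)  # [(0, 0), (0, 256)]
```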
System I formulates H. pylori detection as an anomaly detection problem.
- Models are trained only on patches from H. pylori–negative patients
- The model learns the distribution of normal tissue
- Patches containing bacteria produce high reconstruction error
- Patch predictions are aggregated to diagnose each patient
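One simple way to turn patch-level anomaly scores into a patient diagnosis is to count the fraction of anomalous patches. The sketch below assumes two thresholds (hypothetical names `patch_threshold` and `patient_threshold`), one deciding whether a single patch is anomalous and one deciding how many anomalous patches make a patient positive; both would be tuned on annotated validation data. It is an illustration, not necessarily the exact aggregation rule used in this project.

```python
import numpy as np


def diagnose_patient(patch_errors, patch_threshold, patient_threshold):
    """Return (is_positive, fraction_flagged) for one patient.

    patch_errors: reconstruction errors for all patches of one patient.
    patch_threshold: error above which a patch is considered anomalous.
    patient_threshold: fraction of anomalous patches above which the
        patient is diagnosed positive.
    """
    errors = np.asarray(patch_errors, dtype=float)
    if errors.size == 0:
        raise ValueError("patient has no patches")
    flagged = errors > patch_threshold          # per-patch anomaly decision
    fraction = float(flagged.mean())            # share of anomalous patches
    return fraction > patient_threshold, fraction


print(diagnose_patient([0.01, 0.02, 0.4, 0.5], 0.1, 0.25))  # (True, 0.5)
```

Using a fraction rather than a raw count keeps the rule comparable across patients whose slides yield different numbers of patches.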
Two autoencoder architectures are used:
- Convolutional Autoencoder (AE)
- Variational Autoencoder (VAE)
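The VAE differs from the plain AE in that the encoder outputs a Gaussian (mean and log-variance) per patch, a latent code is sampled from it, and the training loss adds a KL-divergence term that pulls that Gaussian towards a standard normal. The NumPy sketch below shows only those loss-side computations (no network or training loop); the squared-error reconstruction term and the weighting factor `beta` are assumptions for illustration.

```python
import numpy as np


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dims, per sample."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=-1)


def vae_loss(x, x_recon, mu, logvar, beta=1.0):
    """Batch mean of (summed squared reconstruction error + beta * KL)."""
    recon = np.sum((x - x_recon) ** 2, axis=tuple(range(1, x.ndim)))
    return float(np.mean(recon + beta * kl_to_standard_normal(mu, logvar)))


# Usage: a latent that already matches N(0, I) contributes no KL penalty.
print(kl_to_standard_normal(np.zeros((1, 8)), np.zeros((1, 8))))  # [0.]
```

Sampling through `reparameterize` keeps the randomness in `eps`, so gradients can flow to `mu` and `logvar` when the same computation is written in an autodiff framework.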
Three reconstruction error definitions were evaluated:
- Mean Squared Error (MSE)
- Mean Absolute Error on the Red Channel
- HSV-based Red Pixel Reconstruction Error (selected)
The HSV-based metric explicitly captures biologically relevant red staining and achieved the best performance across all experiments.
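The three error definitions can be sketched for a single patch `x` and its reconstruction `x_rec`, both float RGB arrays in [0, 1] with shape (H, W, 3). The exact HSV thresholds and normalization used in this project are not stated here, so the red-pixel rule below (hue within ±20° of pure red, saturation of at least 0.3, error equal to the fraction of pixels that are red in the input but not in the reconstruction) is an assumption chosen for illustration.

```python
import numpy as np


def mse_error(x, x_rec):
    """Mean squared error over all pixels and channels."""
    return float(np.mean((x - x_rec) ** 2))


def red_mae_error(x, x_rec):
    """Mean absolute error restricted to the red channel (channel 0 in RGB)."""
    return float(np.mean(np.abs(x[..., 0] - x_rec[..., 0])))


def red_pixel_mask(img, hue_tol_deg=20.0, min_sat=0.3):
    """Boolean mask of pixels whose HSV hue lies within +/- hue_tol_deg of pure red.

    For hue_tol_deg < 60, a hue near 0 degrees requires red to be the largest
    channel, in which case hue = 60 * (g - b) / (max - min); the hue test
    therefore reduces to |g - b| <= hue_tol_deg / 60 * (max - min).
    """
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    maxc = img.max(axis=-1)
    delta = maxc - img.min(axis=-1)
    # HSV saturation: (max - min) / max, defined as 0 for black pixels.
    sat = np.where(maxc > 0, delta / np.maximum(maxc, 1e-12), 0.0)
    is_red = (r >= maxc) & (delta > 0) & (np.abs(g - b) <= hue_tol_deg / 60.0 * delta)
    return is_red & (sat >= min_sat)


def hsv_red_error(x, x_rec, **kwargs):
    """Fraction of pixels that are red in the input but not in the reconstruction."""
    lost = red_pixel_mask(x, **kwargs) & ~red_pixel_mask(x_rec, **kwargs)
    return float(lost.mean())


# Usage: a grey patch with a 2x2 red spot that the "autoencoder" fails to reproduce.
x = np.full((4, 4, 3), 0.8)
x[:2, :2] = [0.9, 0.1, 0.1]
x_rec = np.full((4, 4, 3), 0.8)
print(hsv_red_error(x, x_rec))  # 0.25
```

Because an autoencoder trained only on negative tissue tends not to reproduce red staining, a red-specific score reacts to exactly the pixels of interest, whereas MSE also accumulates error from ordinary texture.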
System II treats each patient as a bag of image patches and learns a single patient representation.
- Extract patch embeddings using the encoder from System I
- Apply a Gated Attention Mechanism to weight informative patches
- Aggregate patches into a patient-level representation
- Perform binary classification
This approach allows the model to focus on the most relevant regions while suppressing background noise.
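The common gated-attention formulation from multiple-instance learning scores each patch embedding $h_k$ as $a_k = \mathrm{softmax}_k\big(w^\top(\tanh(V h_k) \odot \sigma(U h_k))\big)$ and pools the bag as $z = \sum_k a_k h_k$. The NumPy sketch below implements that forward pass plus a logistic head; the parameter names (`V`, `U`, `w`, `clf_w`, `clf_b`) and the random inputs standing in for System I encoder embeddings are assumptions, and training is omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_attention_pool(H, V, U, w):
    """Gated attention pooling over a bag of patch embeddings.

    H: (K, D) embeddings of the K patches of one patient.
    V, U: (L, D) projection matrices; w: (L,) attention vector.
    Returns the (D,) patient representation and the (K,) attention weights.
    """
    gate = np.tanh(H @ V.T) * sigmoid(H @ U.T)   # (K, L): tanh branch gated by sigmoid branch
    scores = gate @ w                            # (K,) unnormalized attention scores
    scores = scores - scores.max()               # shift for numerical stability
    a = np.exp(scores) / np.exp(scores).sum()    # softmax over the patches of the bag
    return a @ H, a


def predict_patient(H, V, U, w, clf_w, clf_b):
    """Probability that the patient is positive (logistic head on the pooled vector)."""
    z, a = gated_attention_pool(H, V, U, w)
    return float(sigmoid(z @ clf_w + clf_b)), a


# Usage with random stand-ins for 50 patch embeddings of size 64.
rng = np.random.default_rng(0)
K, D, L = 50, 64, 32
H = rng.standard_normal((K, D))
V, U = rng.standard_normal((L, D)) * 0.1, rng.standard_normal((L, D)) * 0.1
w = rng.standard_normal(L)
p, attention = predict_patient(H, V, U, w, rng.standard_normal(D), 0.0)
```

The attention weights `attention` also indicate which patches drove the decision, which is useful for showing pathologists where to look.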
This project uses the Quiron dataset, a collection of 245 immunohistochemistry whole-slide images (WSIs) of gastric tissue, one per patient. For binary classification, the diagnostic labels are grouped as follows:
- Negative Group (117 patients): labeled `NEGATIVA`.
- Positive Group (128 patients): merges the `BAIXA` (low) and `ALTA` (high) categories.
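The grouping above amounts to a fixed label mapping. A minimal sketch, assuming the labels arrive as the strings `NEGATIVA`, `BAIXA` and `ALTA` (the mapping name and function below are hypothetical):

```python
# Hypothetical mapping from the dataset's density labels to binary targets.
LABEL_TO_TARGET = {"NEGATIVA": 0, "BAIXA": 1, "ALTA": 1}


def binarize(labels):
    """Map per-patient density labels to 0 (negative) or 1 (positive)."""
    unknown = set(labels) - set(LABEL_TO_TARGET)
    if unknown:
        # Fail loudly rather than silently dropping patients with unexpected labels.
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return [LABEL_TO_TARGET[label] for label in labels]


print(binarize(["NEGATIVA", "ALTA", "BAIXA"]))  # [0, 1, 1]
```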
To prepare the data for computational analysis, the following preprocessing steps were applied:
- Patch Generation: WSIs were tiled into 256×256 pixel patches, focusing primarily on the tissue's marginal areas.
- Expert Annotation: a subset of the patches was manually labeled by specialists.
- Experimental Setup: this annotated subset is reserved for critical pipeline stages, including:
  - Cross-validation
  - Model validation
  - Selection of classification thresholds
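Two details matter when using the annotated subset this way: folds should be split by patient so that patches from one slide never appear in both training and validation, and the classification threshold needs an explicit selection criterion. The sketch below assumes patient-grouped folds and Youden's J statistic (TPR minus FPR) as that criterion; both are illustrative choices rather than confirmed details of this project, and the function names are hypothetical.

```python
import numpy as np


def patient_folds(patient_ids, k=5, seed=0):
    """Assign whole patients to k folds so no patient spans train and validation."""
    ids = np.unique(patient_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(ids)                                   # random patient order
    fold_of = {pid: i % k for i, pid in enumerate(ids)}
    return np.array([fold_of[p] for p in patient_ids])  # fold index per patch


def youden_threshold(scores, labels):
    """Pick the threshold maximizing Youden's J = TPR - FPR.

    A sample is predicted positive when its score >= threshold.
    labels must contain both classes (1 = positive).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    n_pos, n_neg = labels.sum(), (~labels).sum()
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative samples")
    best_t, best_j = None, -np.inf
    # Every observed score is a candidate threshold.
    for t in np.unique(scores):
        pred = scores >= t
        tpr = (pred & labels).sum() / n_pos
        fpr = (pred & ~labels).sum() / n_neg
        if tpr - fpr > best_j:
            best_t, best_j = float(t), float(tpr - fpr)
    return best_t, best_j


print(youden_threshold([0.1, 0.2, 0.3, 0.7, 0.8, 0.9], [0, 0, 0, 1, 1, 1]))  # (0.7, 1.0)
```

In a cross-validation loop, the threshold would be chosen on the training folds' scores and then applied unchanged to the held-out fold, so the reported metrics are not biased by tuning on the evaluation data.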
Cano, P., Caravaca, Á., Gil, D., & Musulen, E. Diagnosis of Helicobacter pylori using autoencoders for the detection of anomalous staining patterns in immunohistochemistry images. arXiv preprint, 2023.
