This repository contains the code developed for my master's thesis in IT and Cognition at the University of Copenhagen, which builds a Named Entity Recognition (NER) and Relation Extraction (RE) pipeline for biomedical abstracts using the GutBrainIE 2025 dataset (https://hereditary.dei.unipd.it/challenges/gutbrainie/2025/).
```
.
├── notebooks
│   ├── BiLSTM_CRF_baseline.ipynb
│   ├── ner-statistics.ipynb
│   └── dataset-wordclouds-gutbrainie.ipynb
├── requirements.txt
├── run_on_ucloud.sh
└── scripts
    ├── config.py
    ├── datasets.py
    ├── evaluation_gutbrainie2025.py
    ├── get_predictions_NER.py
    ├── get_predictions_RE.py
    ├── id_rsa
    ├── models.py
    ├── optuna_NER.py
    ├── optuna_RE.py
    ├── optuna_RE_ternary.py
    ├── train_evaluate_NER_BERT_models.py
    ├── train_evaluate_RE_BERT_models.py
    ├── train_evaluate_ternary_RE_BERT_models.py
    └── utils.py
```
Before running any scripts, install the dependencies from requirements.txt and adjust all folder paths. To obtain the optimal hyperparameters, run optuna_NER.py, optuna_RE.py (binary classification RE), or optuna_RE_ternary.py (multiclass RE). To fine-tune the models and run the full pipeline, first generate Named Entity predictions with train_evaluate_NER_BERT_models.py, then run either train_evaluate_RE_BERT_models.py (binary classification RE) or train_evaluate_ternary_RE_BERT_models.py (multiclass RE). Evaluation is conducted within these scripts, based on the official evaluation script of the GutBrainIE 2025 challenge.
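The authoritative metrics come from evaluation_gutbrainie2025.py; purely as an illustration of the general idea, strict span-level micro-averaged precision/recall/F1 can be sketched as below. The representation of entities as `(start, end, label)` tuples keyed by document ID is an assumption for this sketch, not the challenge's actual data format.

```python
def micro_prf(gold, pred):
    """Strict span-level micro precision/recall/F1 (illustrative sketch).

    gold, pred: dicts mapping a document ID to a set of
    (start, end, label) tuples; a prediction counts as a true
    positive only if span boundaries and label match exactly.
    """
    tp = fp = fn = 0
    for doc in gold.keys() | pred.keys():
        g = gold.get(doc, set())
        p = pred.get(doc, set())
        tp += len(g & p)   # exact matches
        fp += len(p - g)   # spurious predictions
        fn += len(g - p)   # missed gold entities
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```

Micro-averaging pools counts over all documents before computing the metrics, so frequent entity types dominate the score.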
The notebooks folder contains the notebook for running the BiLSTM-CRF tagger (https://github.com/reascr/BiLSTM-CRF-NER-Baseline, forked from https://github.com/glample/tagger). It also includes notebooks to compute the Wilcoxon signed-rank test for statistical significance testing, to generate word clouds for specific entity labels in the dataset, and to determine the optimal inference threshold for binary RE.
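The threshold-selection notebook itself is not reproduced here; a minimal sketch of the underlying idea is to sweep a grid of decision thresholds over dev-set probabilities and keep the one that maximizes F1. Function names and the grid are illustrative, not the notebook's actual code.

```python
def f1_at_threshold(probs, labels, t):
    """F1 for binary RE when predicting positive iff prob >= t."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < t and y == 1)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def best_threshold(probs, labels, grid=None):
    """Return the grid threshold with the highest dev-set F1
    (ties broken in favor of the lowest threshold)."""
    grid = grid or [i / 100 for i in range(1, 100)]
    return max(grid, key=lambda t: f1_at_threshold(probs, labels, t))
```

Selecting the threshold on a held-out development set (never the test set) keeps the final evaluation unbiased.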