Björnstad Syndrome — BCS1L Gene VUS Pathogenicity Classifier

⚠️ Academic Project: This is a computational biology course project (ENS 210: Computational Biology) completed as part of the Sabancı University curriculum.


📋 Overview

This project investigates Variants of Unknown Significance (VUS) in the BCS1L gene associated with Björnstad Syndrome — an extremely rare autosomal recessive mitochondrial disorder. Using evolutionary conservation analysis across 1000 homologous sequences, we developed a Python-based algorithm to classify each VUS as pathogenic or benign.

🧬 What is Björnstad Syndrome?

Björnstad Syndrome (BJS) is an extremely rare autosomal recessive mitochondrial disorder first described by Professor Roar Björnstad in 1965. It is caused by missense mutations in the BCS1L gene on chromosome 2q34–36, which encodes a mitochondrial chaperone protein responsible for assembling respiratory chain Complex III.

Mutations in BCS1L impair Complex III assembly and increase production of reactive oxygen species (ROS). Hair follicles and inner ear cells are particularly sensitive to mitochondrial dysfunction, which leads to the two hallmark features of the syndrome:

  • 🦻 Sensorineural hearing loss — typically congenital and bilateral
  • 💇 Pili torti — twisted, brittle hair shafts, often leading to alopecia

🔬 Methodology

Step 1: Sequence Retrieval

  • Retrieved BCS1L protein sequence (isoform a) from ClinVar and gnomAD
  • Performed BLASTp searches returning 100, 500, and 1000 homologous sequences
  • Selected 1000-hit BLASTp results for optimal species diversity and E-value reliability

Step 2: Sequence Alignment & Phylogenetic Analysis

  • Aligned 1000 homologous sequences using MUSCLE algorithm in MEGA11
  • Constructed Neighbor-Joining (NJ) phylogenetic tree for evolutionary analysis
  • Rerooted the tree using FigTree for improved interpretation

Step 3: VUS Identification

  • Retrieved 34 VUS positions from ClinVar and gnomAD databases
  • Mapped each VUS to aligned sequence positions using a custom Python algorithm
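
The mapping step can be illustrated with a minimal sketch. It assumes that the query (human BCS1L) appears as one row of the alignment and that gaps are written as "-"; the function name `residue_to_column` is hypothetical and is not taken from the repository's scripts.

```python
def residue_to_column(aligned_query: str, residue_pos: int, gap: str = "-") -> int:
    """Return the 0-based alignment column that holds a 1-based residue position."""
    if residue_pos < 1:
        raise ValueError("residue positions are 1-based")
    seen = 0  # number of query residues counted so far
    for col, char in enumerate(aligned_query):
        if char != gap:
            seen += 1
            if seen == residue_pos:
                return col
    raise ValueError(f"query has only {seen} residues, position {residue_pos} requested")


# Toy aligned query (hypothetical); its ungapped sequence is "MPLSD".
aligned = "MP--LS-D"
print(residue_to_column(aligned, 3))  # column of the third residue, 'L'
```

Counting only non-gap characters keeps the variant's protein coordinate independent of how many gaps the alignment inserted upstream of it.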

Step 4: Conservation Score Calculation

For each amino acid position, the algorithm:

  1. Reads the 1000-hit FASTA alignment file
  2. Counts the frequency of each amino acid at every position
  3. Computes a conservation score for each amino acid: CS = count / total number of sequences
  4. Records the most conserved amino acid and its score (CS1) and the second most conserved amino acid and its score (CS2) for each position
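
The four steps above can be sketched as follows. This is an illustration under stated assumptions, not the code in conservation_score.py: the helper names `read_fasta` and `conservation_scores` are hypothetical, all aligned sequences are assumed to have the same length, and gap characters are counted like any other symbol (the project may treat gaps differently).

```python
from collections import Counter


def read_fasta(lines):
    """Parse FASTA text lines into a list of sequences (headers are dropped)."""
    sequences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current:
                sequences.append("".join(current))
                current = []
        else:
            current.append(line)
    if current:
        sequences.append("".join(current))
    return sequences


def conservation_scores(sequences):
    """For each column, return the two most frequent symbols and their scores."""
    total = len(sequences)
    result = []
    for column in zip(*sequences):  # one tuple of symbols per alignment column
        ranked = Counter(column).most_common(2)
        aa1, n1 = ranked[0]
        # Assumption: a fully conserved column has no second amino acid.
        aa2, n2 = ranked[1] if len(ranked) > 1 else (None, 0)
        result.append({"aa1": aa1, "cs1": n1 / total, "aa2": aa2, "cs2": n2 / total})
    return result


# Toy alignment of four sequences (hypothetical data).
toy = ">q\nMKA\n>h1\nMKA\n>h2\nMRA\n>h3\nMKG\n"
for position, row in enumerate(conservation_scores(read_fasta(toy.splitlines())), start=1):
    print(position, row)
```

For a real run, the lines would come from the alignment file, for example `read_fasta(open(path))`, where `path` points to the 1000-hit FASTA alignment.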

Step 5: Pathogenicity Classification

Three thresholds (t1=0.9, t2=0.7, t3=0.1) determine variant classification:

If variant == most conserved amino acid:
    CS1 > t1  →  Benign
    CS1 > t2  →  Benign
    CS1 < t2  →  Pathogenic

If variant ≠ most conserved amino acid:
    CS1 > t1  →  Pathogenic
    CS2 == variant and CS2 > t2  →  Benign
    CS2 == variant and CS2 > t3  →  Pathogenic
    CS2 == variant and CS2 < t3  →  Benign
    Otherwise  →  Pathogenic
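
The rules above can be written as a small Python function. This is a sketch, not the code in pathogenicity_classifier.py: the name `classify` and its arguments are hypothetical. The rules do not say what happens when a score equals a threshold exactly, so the sketch assumes that equality is handled like the "<" case.

```python
T1, T2, T3 = 0.9, 0.7, 0.1  # thresholds from the rules above


def classify(variant, aa1, cs1, aa2, cs2, t1=T1, t2=T2, t3=T3):
    """Classify a variant amino acid from the two most conserved residues.

    aa1/cs1: most conserved amino acid and its score.
    aa2/cs2: second most conserved amino acid and its score.
    """
    if variant == aa1:
        # CS1 > t1 and CS1 > t2 both give Benign, so a single test is enough.
        # Assumption: CS1 == t2 is treated like CS1 < t2.
        return "Benign" if cs1 > t2 else "Pathogenic"
    if cs1 > t1:
        return "Pathogenic"
    if variant == aa2:
        if cs2 > t2:
            return "Benign"
        if cs2 > t3:
            return "Pathogenic"
        # Assumption: CS2 == t3 is treated like CS2 < t3.
        return "Benign"
    return "Pathogenic"


print(classify("L", "L", 0.95, "I", 0.03))  # variant matches a highly conserved residue
print(classify("I", "L", 0.60, "I", 0.30))  # variant is the second most common residue
```

Because CS1 and CS2 are fractions of the same column and CS1 ≥ CS2, CS2 can never exceed 0.5. With t2 = 0.7 the `cs2 > t2` branch is therefore unreachable; the sketch keeps it only to mirror the rule list.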

📊 Results

Out of 34 VUS positions analyzed from gnomAD and ClinVar:

| Classification | Count |
|---|---|
| ✅ Benign | 8 |
| ⚠️ Pathogenic | 26 |

Validation: 10 pre-classified variants from ClinVar were used to verify the algorithm; 8 of the 10 (80%) were classified correctly.


🖼️ Figures

E-value Analysis

[Figures: E-value analysis for the 100-hit, 500-hit, and 1000-hit BLASTp results]

Sequence Alignment & Phylogenetic Tree

[Figure: 1000-hit MUSCLE sequence alignment]

[Figure: Neighbor-Joining phylogenetic tree rerooted in FigTree]

Conservation Analysis

[Figure: Amino acid conservation scores across BCS1L positions]

[Figure: Conservation scores with 0.5 threshold for region comparison]

VUS Classification Results

[Figure: Pathogenic and benign VUS positions across the BCS1L gene]


🛠️ Tools & Technologies

| Tool | Purpose |
|---|---|
| BLASTp | Homologous sequence search |
| MEGA11 | Multiple sequence alignment (MUSCLE) & phylogenetic tree construction |
| FigTree | Phylogenetic tree visualization & rerooting |
| gnomAD / ClinVar | VUS data retrieval |
| Python | Conservation score algorithm & pathogenicity classification |
| RStudio | Visualization of conservation scores and E-values |

🚀 How to Use

Prerequisites

  • Python 3.x
  • MEGA11
  • FigTree
  • R / RStudio

Running the Classification Algorithm

# Clone the repository
git clone https://github.com/gizemdogafiliz/Bjornstad-Syndrome-BCS1L-Gene-VUS-Pathogenicity-Classifier.git

# Navigate to the directory
cd Bjornstad-Syndrome-BCS1L-Gene-VUS-Pathogenicity-Classifier

# Run conservation score calculation
python conservation_score.py

# Run pathogenicity classifier
python pathogenicity_classifier.py

Output

  • conservation_scores.tsv — Conservation scores for each amino acid position
  • pathogenic_or_benign.tsv — Pathogenicity classification for each VUS

📚 References

  1. Björnstad, R. (1965). Pili torti and sensory-neural loss of hearing. Proceedings of the 17th Meeting of the Northern Dermatological Society.
  2. Hinson, J. T., et al. (2007). Missense mutations in the BCS1L gene as a cause of the Björnstad syndrome. New England Journal of Medicine, 356(8), 809–819.
  3. Bénit, P., Lebon, S., & Rustin, P. (2008). Respiratory-chain diseases related to complex III deficiency. Biochimica et Biophysica Acta.
  4. Calvo, S. E., & Mootha, V. K. (2010). The mitochondrial proteome and human disease. Annual Review of Genomics and Human Genetics, 11, 25–44.
  5. Richards, S., et al. (2015). Standards and guidelines for the interpretation of sequence variants. Genetics in Medicine, 17(5), 405–424.
  6. Tamura, K., Nei, M., & Kumar, S. (2004). Prospects for inferring very large phylogenies by using the neighbor-joining method. PNAS.
  7. Kinene, T., et al. (2016). Rooting trees, methods for. Encyclopedia of Evolutionary Biology.

👥 Team

Gizem Doğa Filiz, Yıldız Zeynep Şensan, Alpay Emir Aktan, Zeynep Tuana Anıç


Course: ENS 210 — Computational Biology
Institution: Sabancı University
Instructor: Asst. Prof. Dr. Ogün Adebali
