Skip to content

Add As1 support#1257

Merged
tristanpwdennis merged 23 commits intomalariagen:masterfrom
tristanpwdennis:add-As1
Apr 15, 2026
Merged

Add As1 support#1257
tristanpwdennis merged 23 commits intomalariagen:masterfrom
tristanpwdennis:add-As1

Conversation

@tristanpwdennis
Copy link
Copy Markdown
Collaborator

See #706

@tristanpwdennis
Copy link
Copy Markdown
Collaborator Author

TODO: stage and release data, then write tests

@adilraza99
Copy link
Copy Markdown
Contributor

Hi @tristanpwdennis, the As1 integration seems to follow the existing patterns well.

While going through tests/anoph/test_snp_data.py, I noticed that the amin1 import and its amin1_sim_api fixture appear to have been replaced by the As1 equivalents rather than added alongside them. The case_amin1_sim test still references amin1_sim_api, so I was wondering if this might be contributing to one of the CI failures.

In some of the other test files, it looks like the existing species fixtures are kept and new ones are added alongside them. Would that be the intended approach here as well?

@tristanpwdennis
Copy link
Copy Markdown
Collaborator Author

Good spot - thanks @adilraza99, I will fix these

@tristanpwdennis
Copy link
Copy Markdown
Collaborator Author

@jonbrenas @ahernank - testing has hit an issue. As discussed we didn't generate contamination QC data for the samples. The tests fail as they are expecting comtam_pc and contam_llc columns.

Rather than restructure the tests, I've added dummy data to the columns in the fixture. My rationale for this is that in the future, we will be able to include contamination data in sample set QC. Let me know if there are any issues around this.

@jonbrenas
Copy link
Copy Markdown
Collaborator

Hi @tristanpwdennis, yes, adding dummy data is the correct way to go. To be fair, all the data in the fixture is dummy data.

@tristanpwdennis
Copy link
Copy Markdown
Collaborator Author

Looks like tests are passing (for now)

Comment thread malariagen_data/as1.py Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.85%. Comparing base (c2c4f73) to head (c9e8202).
⚠️ Report is 17 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1257      +/-   ##
==========================================
- Coverage   88.92%   88.85%   -0.07%     
==========================================
  Files          55       55              
  Lines        6257     6273      +16     
==========================================
+ Hits         5564     5574      +10     
- Misses        693      699       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tristanpwdennis
Copy link
Copy Markdown
Collaborator Author

Hmm. Unable to reproduce this offline. Will investigate.

@adilraza99
Copy link
Copy Markdown
Contributor

Hi @tristanpwdennis,

It looks like the CI failure in test_allele_frequencies_with_dict_cohorts[adir1_sim] might be happening when aa_allele_frequencies() returns an empty DataFrame (i.e. no amino acid change SNPs for the selected transcript). In that case, the result eventually reaches np.isnan, which then raises the TypeError.

I noticed a few other test blocks in the same file already guard this case, for example L437, L572, L653.

It seems this block (around L709–L716) does not have that check yet. Maybe it would make sense to handle it the same way here?

@tristanpwdennis
Copy link
Copy Markdown
Collaborator Author

tristanpwdennis commented Apr 7, 2026

@adilraza99 - thank you! I think this issue may be random and unrelated to As1 class - probably something to do with the annotations in the dirus assembly but we should open a separate issue about this and possibly implement the checks you suggest.

Copy link
Copy Markdown
Collaborator

@jonbrenas jonbrenas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that the fixture needs the curation or the cohorts for all the sample sets, only the 4 (which is already a lot, Ag has 3) that are actually listed.

Also only 1365-VO-DJ-ADBI-VMF00318 has surveillance flags (and the file is empty) which sounds like a missed opportunity to test that they work fine.

The rest seems correct. @tristanpwdennis, did you test every function in the rst? Because the functions are tested on simulated data, some functions that have their input from GCS directly fail even when all tests pass.

@tristanpwdennis
Copy link
Copy Markdown
Collaborator Author

Thanks @jonbrenas!

I've had a tidy of the the fixture data - deleting the ghost surv flags (see sep discussion). I also added a flag allowing the dip clustering tests to be performed on classes without cnv data (eg as1, adir1, amin1). Think the .rst is covered now.

Copy link
Copy Markdown
Collaborator

@jonbrenas jonbrenas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest looks OK. Hopefully, the tests would catch something going horribly wrong.

Comment thread README.md Outdated
@@ -1,89 +1,15 @@
# `malariagen_data` - analyse MalariaGEN data from Python
# Curation metadata
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened here?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea but it's been banished now

Copy link
Copy Markdown
Collaborator

@jonbrenas jonbrenas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a few copy/paste issues left, I think.

Comment thread malariagen_data/as1.py Outdated
Access data from Google Cloud Storage (default):

>>> import malariagen_data
>>> adir1 = malariagen_data.As1()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adir1 should be as1. Same on the next line.

@tristanpwdennis
Copy link
Copy Markdown
Collaborator Author

Is there anything outstanding left on this @jonbrenas and @ahernank ?

Copy link
Copy Markdown
Collaborator

@ahernank ahernank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tristanpwdennis! Happy to merge.

@tristanpwdennis tristanpwdennis dismissed jonbrenas’s stale review April 15, 2026 22:07

Have resolved, thanks

@tristanpwdennis tristanpwdennis merged commit 92a4366 into malariagen:master Apr 15, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants