Hello mentors,
While exploring the malariagen-data-python API in preparation for a GSoC 2026 proposal, I noticed a parameter inconsistency across several public methods.
Problem
sample_metadata() accepts a sample_indices parameter that allows users to select specific samples by their index position in the dataset. However, four other public methods — count_samples(), plot_samples_interactive_map(), plot_samples_bar() and plot_sample_location_geo() — all call sample_metadata() internally but do not expose sample_indices to the caller. This means a user who wants to work with a specific subset of samples selected by index can do so with sample_metadata() but cannot use the same selection with any of these four methods.
Evidence this is an oversight
wgs_data_catalog() is another method that calls sample_metadata() internally and it already correctly exposes and passes through sample_indices. This confirms the pattern exists and the four methods above were simply missed.
User impact
Users cannot count or plot a sample subset selected by index, even though the underlying sample_metadata() fully supports it. There is no workaround within the existing API.
Proposed fix
Add sample_indices as an optional parameter to the signatures of all four affected methods, with a default value of None. Then pass it through to the internal sample_metadata() call in each method, exactly the same way wgs_data_catalog() already does.
I'd be happy to work on a PR for this if it seems useful to contribute as part of my GSoC proposal.
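To illustrate the pass-through I have in mind, here is a minimal toy sketch. It is not the library's actual code: ToyApi, its records, and the "country" field are hypothetical stand-ins, and the real methods take many more parameters. Only the names sample_metadata(), count_samples() and sample_indices come from the API described above.

```python
from typing import Optional, Sequence


class ToyApi:
    """Hypothetical stand-in for the real API class (not malariagen_data code)."""

    def __init__(self, records):
        self._records = list(records)

    def sample_metadata(self, sample_indices: Optional[Sequence[int]] = None):
        # Select rows by position when indices are given, else return all rows.
        if sample_indices is None:
            return list(self._records)
        return [self._records[i] for i in sample_indices]

    def count_samples(self, sample_indices: Optional[Sequence[int]] = None):
        # Proposed change: accept sample_indices (default None) and forward
        # it unchanged to the internal sample_metadata() call.
        rows = self.sample_metadata(sample_indices=sample_indices)
        counts = {}
        for row in rows:
            counts[row["country"]] = counts.get(row["country"], 0) + 1
        return counts


api = ToyApi(
    [
        {"sample_id": "S0", "country": "Ghana"},
        {"sample_id": "S1", "country": "Kenya"},
        {"sample_id": "S2", "country": "Ghana"},
    ]
)

# With the parameter forwarded, the same index selection works in both methods.
subset_counts = api.count_samples(sample_indices=[0, 2])
all_counts = api.count_samples()
```

Because the default is None, existing calls that omit sample_indices would behave as before; only callers who opt in get the subset.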
Best regards,
noir4201