sample_indices parameter missing from multiple methods that internally call sample_metadata #1126

@noir4201

Hello mentors,
While exploring the malariagen-data-python API in preparation for a GSoC 2026 proposal, I noticed a parameter inconsistency across several public methods.
Problem
sample_metadata() accepts a sample_indices parameter that allows users to select specific samples by their index position in the dataset. However, four other public methods — count_samples(), plot_samples_interactive_map(), plot_samples_bar() and plot_sample_location_geo() — all call sample_metadata() internally but do not expose sample_indices to the caller. This means a user who wants to work with a specific subset of samples selected by index can do so with sample_metadata() but cannot use the same selection with any of these four methods.
Evidence this is an oversight
wgs_data_catalog() is another method that calls sample_metadata() internally and it already correctly exposes and passes through sample_indices. This confirms the pattern exists and the four methods above were simply missed.
User impact
Users cannot count or plot a subset of samples selected by index, even though the underlying sample_metadata() fully supports it. There is no workaround within the existing API.
Proposed fix
Add sample_indices as an optional parameter, with a default value of None, to the signatures of all four affected methods. Then pass it through to the internal sample_metadata() call in each method, exactly as wgs_data_catalog() already does.
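
To illustrate the pass-through pattern, here is a minimal sketch using a toy class, not the real malariagen_data code. The class name `ToySampleApi`, the record layout, and the dict returned by `count_samples()` are all assumptions made up for this example; the real methods take more parameters and return different types. Only the shape of the change (accept `sample_indices=None`, forward it to `sample_metadata()`) is what I am proposing.

```python
from typing import Dict, List, Optional, Sequence


class ToySampleApi:
    """Hypothetical stand-in for the API class; not the real implementation."""

    def __init__(self, records: List[dict]):
        self._records = list(records)

    def sample_metadata(
        self, sample_indices: Optional[Sequence[int]] = None
    ) -> List[dict]:
        # None means "all samples"; otherwise select by position.
        if sample_indices is None:
            return list(self._records)
        return [self._records[i] for i in sample_indices]

    def count_samples(
        self, sample_indices: Optional[Sequence[int]] = None
    ) -> Dict[str, int]:
        # The proposed change: accept sample_indices and forward it unchanged
        # to the internal sample_metadata() call.
        counts: Dict[str, int] = {}
        for rec in self.sample_metadata(sample_indices=sample_indices):
            counts[rec["country"]] = counts.get(rec["country"], 0) + 1
        return counts


# Made-up records, for illustration only.
api = ToySampleApi(
    [
        {"sample_id": "S0", "country": "Ghana"},
        {"sample_id": "S1", "country": "Ghana"},
        {"sample_id": "S2", "country": "Kenya"},
    ]
)

all_counts = api.count_samples()
subset_counts = api.count_samples(sample_indices=[0, 2])
```

Because the default is None, existing calls that do not pass sample_indices would behave as before in this sketch.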

I'd be happy to work on a PR for this if it seems useful to contribute as part of my GSoC proposal.
Best regards,
noir4201
