Commit 9bf82a9

feat: warn when cohort downsampling occurs (fixes #912)
When n_samples > max_cohort_size, the dataset is randomly downsampled without notification. This adds a UserWarning explaining the original and new sample counts, and how to disable downsampling. No change to logic, defaults, or returned data.
1 parent a3d4463 commit 9bf82a9

1 file changed

Lines changed: 7 additions & 0 deletions

malariagen_data/anoph/snp_data.py

@@ -1,3 +1,4 @@
+import warnings
 from functools import lru_cache
 from typing import Any, Dict, List, Optional, Tuple, Union
 
@@ -1253,6 +1254,12 @@ def _snp_calls(
         if max_cohort_size is not None:
             n_samples = ds.sizes["samples"]
             if n_samples > max_cohort_size:
+                warnings.warn(
+                    f"Cohort downsampled from {n_samples} to {max_cohort_size} "
+                    "samples. Set max_cohort_size=None to disable downsampling.",
+                    UserWarning,
+                    stacklevel=2,
+                )
                 rng = np.random.default_rng(seed=random_seed)
                 loc_downsample = rng.choice(
                     n_samples, size=max_cohort_size, replace=False
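
For reviewers who want to see the new behaviour in isolation, here is a minimal sketch of the warn-then-downsample flow and of how a caller might capture or silence the warning. The helper name `downsample_indices` and its default seed are hypothetical and not part of the library, and the standard-library `random.Random.sample` stands in for the `rng.choice(..., replace=False)` call in the diff so the snippet runs without NumPy.

```python
import random
import warnings


def downsample_indices(n_samples, max_cohort_size, random_seed=42):
    """Hypothetical helper mirroring the warn-then-downsample flow in the diff."""
    if max_cohort_size is None or n_samples <= max_cohort_size:
        # No downsampling happens, so no warning is emitted.
        return list(range(n_samples))
    warnings.warn(
        f"Cohort downsampled from {n_samples} to {max_cohort_size} "
        "samples. Set max_cohort_size=None to disable downsampling.",
        UserWarning,
        stacklevel=2,
    )
    # Stand-in for rng.choice(n_samples, size=max_cohort_size, replace=False):
    # draw max_cohort_size distinct indices with a seeded generator.
    rng = random.Random(random_seed)
    return rng.sample(range(n_samples), max_cohort_size)


# Capture the warning to inspect its category and message.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    picked = downsample_indices(100, 10)

# Silence the warning locally when downsampling is intentional.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    quiet = downsample_indices(100, 10)
```

Because the seed is fixed, both calls select the same indices; only the warning differs. With `stacklevel=2`, Python attributes the warning to the caller of the function that calls `warnings.warn`, which in this sketch is the module-level call site.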
