Summary
The PLINK export methods currently generate output filenames from only a subset of their input parameters. As a result, different export requests can resolve to the same filename prefix even when they produce different data.
This affects:
- biallelic_snps_to_plink()
- biallelic_snps_ld_pruned_to_plink()
Problem
At the moment, the filename prefix is built from parameters such as region-level SNP filtering, but it excludes several parameters that also affect the exported dataset contents, including:
- sample_sets
- sample_query
- sample_query_options
- sample_indices
- site_mask
- random_seed
Because these parameters are not reflected in the filename, two calls that differ only in these inputs (for example, in their sample selection) can write to the same .bed/.bim/.fam prefix.
Example collision
Two calls such as:
api.biallelic_snps_to_plink(
    output_dir="/plink",
    region="2L",
    n_snps=1000,
    sample_sets=["set_a"],
    random_seed=42,
)
and
api.biallelic_snps_to_plink(
    output_dir="/plink",
    region="2L",
    n_snps=1000,
    sample_sets=["set_b"],
    random_seed=42,
)
can currently generate the same output prefix, despite exporting different sample cohorts.
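The collision can be reproduced in isolation with a small stand-in. This is only a sketch: naive_prefix is a hypothetical helper, not the library's actual code, and the assumption that the prefix is built from region and n_snps alone is for illustration.

```python
import os


def naive_prefix(output_dir, region, n_snps, **ignored):
    # Hypothetical stand-in for the current behaviour: only region and
    # n_snps reach the filename; every other parameter is dropped.
    return os.path.join(output_dir, f"{region}.{n_snps}")


prefix_a = naive_prefix(
    "/plink", region="2L", n_snps=1000, sample_sets=["set_a"], random_seed=42
)
prefix_b = naive_prefix(
    "/plink", region="2L", n_snps=1000, sample_sets=["set_b"], random_seed=42
)

# Different cohorts, identical prefix: the second export would overwrite the first.
collides = prefix_a == prefix_b
```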
Why this is a problem
- Different exports can silently overwrite each other.
- Cached/exported file paths are not a stable identifier of dataset contents.
- The risk increases when users run multiple exports into the same output_dir.
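One possible way to make the prefix reflect every input is to append a short digest of all content-affecting parameters. The sketch below is an assumption about how that could look, not the project's implementation; hashed_prefix and the 12-character digest length are hypothetical choices.

```python
import hashlib
import json
import os


def hashed_prefix(output_dir, **params):
    # Serialize with sorted keys so keyword order does not change the digest.
    # default=str is a fallback for values JSON cannot encode directly.
    payload = json.dumps(params, sort_keys=True, default=str)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    # Keep the region in the name for readability; the digest disambiguates.
    return os.path.join(output_dir, f"{params.get('region')}.{digest}")


prefix_a = hashed_prefix(
    "/plink", region="2L", n_snps=1000, sample_sets=["set_a"], random_seed=42
)
prefix_b = hashed_prefix(
    "/plink", region="2L", n_snps=1000, sample_sets=["set_b"], random_seed=42
)
prefix_a_again = hashed_prefix(
    "/plink", random_seed=42, sample_sets=["set_a"], n_snps=1000, region="2L"
)
```

With this scheme, the two calls from the example collision resolve to different prefixes, while repeating a call with the same parameters resolves to the same prefix regardless of keyword order.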