Preserve sample_query_options during cohort query normalisation

### Summary
`_setup_cohort_queries()` still drops `sample_query_options` when it re-applies cohort queries during cohort-size validation, so parameterised sample filters like `country in @countries_list` fail in any API that relies on shared cohort normalisation.

### Why this matters
This is the same correctness family as #995 / #999, but at a deeper layer: the recent caller-level forwarding fixes in #996 and #1001 do not cover the helper that actually rebuilds and rechecks cohort queries.

As a result, higher-level analyses that accept `sample_query_options` still break when given valid pandas query context, such as:
- `local_dict`
- `global_dict`
- `resolvers`

For multi-cohort analyses, this makes query behaviour depend on which code path runs, even though the public signature advertises `sample_query_options` support.

### Reproduction
Using the simulated-data APIs locally, both of these currently raise `pandas.errors.UndefinedVariableError: local variable 'countries_list' is not defined`:

```python
countries_list = ["Angola", "Mali"]

api.pairwise_average_fst(
    region="2L",
    cohorts="country",
    sample_query="country in @countries_list",
    sample_query_options={"local_dict": {"countries_list": countries_list}},
    min_cohort_size=1,
)

api.plot_h12_gwss_multi_overlay(
    contig="2L",
    cohorts="country",
    window_size=200,
    sample_query="country in @countries_list",
    sample_query_options={"local_dict": {"countries_list": countries_list}},
    min_cohort_size=1,
    show=False,
)
```

The failure happens inside `malariagen_data/anoph/sample_metadata.py` when `_setup_cohort_queries()` loops over derived cohort queries and calls:

```python
self.sample_metadata(sample_sets=sample_sets, sample_query=cohort_query)
```

without passing `sample_query_options`.
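The mechanism can be shown with pandas alone. The sketch below is not the library's code: `df_samples`, `query_dropping_options` and `query_forwarding_options` are hypothetical stand-ins. It relies on the fact that `DataFrame.query()` resolves `@name` references from the frame that calls it unless a `local_dict` is supplied, so a helper that drops the options cannot see the caller's variable.

```python
import pandas as pd

# Toy stand-in for sample metadata; column names are illustrative.
df_samples = pd.DataFrame(
    {
        "sample_id": ["S1", "S2", "S3"],
        "country": ["Angola", "Mali", "Kenya"],
    }
)


def query_dropping_options(df, query):
    # Hypothetical helper that forgets the caller's query options.
    return df.query(query, engine="python")


def query_forwarding_options(df, query, options):
    # Hypothetical helper that forwards the caller's query options.
    return df.query(query, engine="python", **options)


def run_demo():
    # Defined only in this function's scope, so the helpers cannot see it.
    countries_list = ["Angola", "Mali"]
    query = "country in @countries_list"
    options = {"local_dict": {"countries_list": countries_list}}

    try:
        query_dropping_options(df_samples, query)
        dropped_ok = True
    except NameError:
        # pandas' UndefinedVariableError subclasses NameError.
        dropped_ok = False

    kept = query_forwarding_options(df_samples, query, options)
    return dropped_ok, kept["sample_id"].tolist()
```

In this sketch the first helper fails to resolve `@countries_list`, while the second resolves it through `local_dict`, which is the behaviour the cohort-size validation calls would need.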

### Additional mismatch in the same query contract
Once `sample_query_options` is forwarded into the shared helper, `sample_metadata()` also needs to treat `engine="python"` as a default rather than always passing it as a separate keyword argument. The current implementation documents `engine` as a supported query option, but paths that now correctly forward `sample_query_options={"engine": "python"}` can then raise:

```python
TypeError: DataFrame.query() got multiple values for keyword argument 'engine'
```
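One way to avoid the duplicate keyword is to merge defaults and caller options into a single mapping before calling `DataFrame.query()`. The following is a minimal sketch under that assumption; `DEFAULT_QUERY_OPTIONS` and `normalise_query_options` are hypothetical names, not existing library functions.

```python
import pandas as pd

# Assumed default, matching the engine described in this issue.
DEFAULT_QUERY_OPTIONS = {"engine": "python"}


def normalise_query_options(sample_query_options=None):
    # Hypothetical helper: caller-supplied keys override the defaults,
    # so `engine` appears exactly once in the merged mapping.
    return {**DEFAULT_QUERY_OPTIONS, **(sample_query_options or {})}


df_samples = pd.DataFrame({"country": ["Angola", "Mali", "Kenya"]})
caller_options = {"engine": "python"}

# Forcing the keyword and also unpacking it reproduces the TypeError.
try:
    df_samples.query("country == 'Mali'", engine="python", **caller_options)
    duplicate_error = None
except TypeError as exc:
    duplicate_error = str(exc)

# Merging first means `engine` is passed only once.
df_mali = df_samples.query(
    "country == 'Mali'", **normalise_query_options(caller_options)
)
```

Merging into one dict keeps the default in a single place and lets an explicit `engine` win without special-casing it at each call site.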

### Proposed fix direction
1. Pass `sample_query_options` through the cohort-size validation calls inside `_setup_cohort_queries()`.
2. Normalise query options in one shared place so `engine="python"` remains the default, but an explicitly supplied engine does not become a duplicate keyword.
3. Add regression tests at both the metadata layer and at least one multi-cohort public API surface (e.g. H12/Fst) using `local_dict`.
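As a rough illustration of how the three steps fit together, here is a toy sketch. `ToyMetadata`, its method bodies and the way cohort queries are composed are assumptions made for the example and do not reproduce the real `sample_metadata()` or `_setup_cohort_queries()` implementations.

```python
import pandas as pd


class ToyMetadata:
    """Toy stand-in for the API class; not the library's implementation."""

    def __init__(self, df_samples):
        self._df_samples = df_samples

    def sample_metadata(self, sample_query=None, sample_query_options=None):
        if sample_query is None:
            return self._df_samples
        # Step 2: default the engine without duplicating an explicit one.
        options = {"engine": "python", **(sample_query_options or {})}
        return self._df_samples.query(sample_query, **options)

    def setup_cohort_queries(
        self, cohorts, sample_query, sample_query_options, min_cohort_size
    ):
        df = self.sample_metadata(sample_query, sample_query_options)
        cohort_queries = {}
        for value in sorted(df[cohorts].unique()):
            cohort_query = f"({sample_query}) and ({cohorts} == '{value}')"
            # Step 1: forward the options when re-checking cohort size.
            n = len(self.sample_metadata(cohort_query, sample_query_options))
            if n >= min_cohort_size:
                cohort_queries[value] = cohort_query
        return cohort_queries


def test_cohort_queries_keep_local_dict():
    # Step 3: regression-style check using `local_dict`.
    df_samples = pd.DataFrame({"country": ["Angola", "Mali", "Mali", "Kenya"]})
    api = ToyMetadata(df_samples)
    options = {"local_dict": {"countries_list": ["Angola", "Mali"]}}
    queries = api.setup_cohort_queries(
        cohorts="country",
        sample_query="country in @countries_list",
        sample_query_options=options,
        min_cohort_size=1,
    )
    assert sorted(queries) == ["Angola", "Mali"]
```

A real regression test would exercise the public multi-cohort APIs against simulated data rather than a toy class, but the shape of the assertion would be similar.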

### Impacted files
- `malariagen_data/anoph/sample_metadata.py`
- `malariagen_data/anoph/base.py`

This looks like a good small-but-deep correctness fix because one shared abstraction affects multiple analysis entry points.

