Cloud Storage Access (GCS/S3) Has No Retry, Backoff, or Timeout Configuration #1303

@khushthecoder

Description

All remote data access (GCS via gcsfs, S3 via s3fs, and HTTP via fsspec) relies entirely on the default timeout/retry behavior of the underlying filesystem libraries. The codebase has zero explicit timeout, retry, or backoff configuration — confirmed by searching the entire malariagen_data/ directory for these terms.

How to Reproduce

import malariagen_data
ag3 = malariagen_data.Ag3()

# On a slow/unreliable network (e.g., field station in sub-Saharan Africa):
# This will hang indefinitely if GCS is unreachable, with no timeout:
ds = ag3.snp_calls(region="3R", sample_sets="3.0")

# No retry on transient 503/429 errors from GCS:
# A single failed request causes the entire operation to fail.

Why It Is Important

  • MalariaGEN's primary user base includes researchers in malaria-endemic regions (sub-Saharan Africa, Southeast Asia) where internet connectivity can be unreliable.
  • The library accesses multi-GB zarr stores over the network; a single transient error (GCS 503, network timeout, DNS failure) causes the entire operation to fail with a cryptic OSError.
  • gcsfs and s3fs expose retry and timeout settings (the exact parameter names differ between the two backends), but malariagen_data never sets any of them.
  • The base.py constructor at lines 100-105 already silently swallows network errors from ipinfo; the same "hope the network works" pattern pervades the entire data access layer.
  • Setting retry and timeout options (for example, 3 retries and a 60-second timeout) where the filesystems are initialized in util.py would cover all downstream operations.
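
To make the last point concrete, here is a rough sketch of what per-backend options could look like. The helper name build_storage_options and its defaults are hypothetical and not part of malariagen_data. The s3fs branch relies on s3fs forwarding config_kwargs to botocore's Config, which accepts connect_timeout, read_timeout, and retries. The gcsfs branch assumes GCSFileSystem accepts a requests_timeout argument, which should be checked against the installed gcsfs version.

```python
def build_storage_options(protocol, timeout=60.0, max_attempts=3):
    """Return keyword arguments intended for fsspec.filesystem(protocol, **options).

    Hypothetical helper: the name and defaults are illustrative only.
    """
    if protocol in ("gs", "gcs"):
        # Assumption: gcsfs.GCSFileSystem takes `requests_timeout` (seconds).
        # Verify against the gcsfs version in use.
        return {"requests_timeout": timeout}
    if protocol == "s3":
        # s3fs passes `config_kwargs` through to botocore.config.Config.
        return {
            "config_kwargs": {
                "connect_timeout": timeout,
                "read_timeout": timeout,
                "retries": {"max_attempts": max_attempts, "mode": "standard"},
            }
        }
    # Other protocols: leave the library defaults untouched.
    return {}


if __name__ == "__main__":
    for proto in ("gs", "s3", "file"):
        print(proto, build_storage_options(proto))
```

The resulting dictionary would then be passed along the lines of fsspec.filesystem("s3", **build_storage_options("s3")), at whichever point util.py creates the filesystem.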

Expected Impact After Resolution

  • Transient network errors are automatically retried with exponential backoff.
  • Operations have configurable timeouts (with sensible defaults).
  • Users on unreliable connections get clear timeout errors instead of indefinite hangs.
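
For illustration, the retry behaviour described above could be sketched as a small wrapper with exponential backoff. Everything here is an assumption rather than existing malariagen_data code: the name retry_with_backoff, the choice of OSError as the "transient" error type, and the default delays.

```python
import random
import time

# Assumption: transient network failures surface as OSError (TimeoutError is a subclass).
TRANSIENT_ERRORS = (OSError,)


def retry_with_backoff(func, *, max_attempts=3, base_delay=1.0,
                       max_delay=30.0, sleep=time.sleep):
    """Call func(); on a transient error, wait and retry with exponential backoff.

    The delay doubles after each failed attempt (base_delay, 2*base_delay, ...),
    is capped at max_delay, and gets up to 10% random jitter added.
    The last error is re-raised once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TRANSIENT_ERRORS:
            if attempt == max_attempts:
                raise
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 10))


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_read():
        # Fails twice, then succeeds, to imitate a transient 503.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise OSError("simulated 503")
        return "data"

    print(retry_with_backoff(flaky_read, base_delay=0.01))
```

A call such as retry_with_backoff(lambda: ag3.snp_calls(region="3R", sample_sets="3.0")) would only protect the requests made inside that call. Data that is loaded lazily afterwards would need the retry at the filesystem level instead.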
