dask.config.set() at Module Import Time in ag3.py Silently Modifies Global Dask Configuration #1308

@khushthecoder

Description:

ag3.py line 10 executes the following statement at module import time:

dask.config.set(**{"array.slicing.split_native_chunks": False})

This runs the moment any code imports malariagen_data, since ag3.py is loaded during package initialization. It silently disables a Dask performance optimization globally, affecting all Dask operations in the entire Python process — not just those belonging to malariagen_data.


Steps to Reproduce

import dask
print(dask.config.get("array.slicing.split_native_chunks"))  # True (default)
 
import malariagen_data  # Side effect triggered here!
 
print(dask.config.get("array.slicing.split_native_chunks"))  # False — globally changed
 
# Now ANY dask array operation in this process uses non-split behavior,
# even operations completely unrelated to malariagen_data:
import dask.array as da
x = da.random.random((10000, 10000), chunks=(1000, 1000))
result = x[::2]  # Uses non-split slicing due to the global config change

Root Cause

The call to dask.config.set() is made at module scope (top-level of ag3.py), rather than inside a context manager scoped to the specific operations that require it. As a result, importing malariagen_data is sufficient to permanently alter the Dask configuration for the entire Python session.


Why This Is a Problem

1. Unexpected Side Effects

Importing a library should not modify the global configuration of other libraries. Doing so violates the principle of least surprise and breaks the common expectation that importing a package has no effect on code outside that package.

2. Performance Impact

split_native_chunks was introduced to improve performance for certain slicing operations. Disabling it globally may degrade performance for non-malariagen_data Dask workloads running in the same session (e.g., xarray, pangeo, custom pipelines).

3. Difficult to Debug

Researchers combining malariagen_data with other Dask-based tools will observe changed Dask behaviour with no visible indication of the cause. The configuration change is silent and persistent, and it does not show up in any traceback or log.
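
One way to track down this kind of silent change is to snapshot the Dask configuration before the suspect import and diff it afterwards. The sketch below is a hypothetical diagnostic, not part of malariagen_data or Dask: flatten and config_diff are illustrative helper names, and it assumes dask.config.config is the global nested dict holding the live configuration. The with block stands in for the import so the example does not depend on malariagen_data being installed.

```python
import copy

import dask


def flatten(d, prefix=""):
    """Flatten nested config dicts into {"a.b.c": value} (illustrative helper)."""
    out = {}
    for k, v in d.items():
        path = f"{prefix}.{k}" if prefix else str(k)
        if isinstance(v, dict):
            out.update(flatten(v, path))
        else:
            out[path] = v
    return out


def config_diff(before, after):
    """Return {key: (old, new)} for every key whose value differs."""
    missing = object()
    keys = sorted(set(before) | set(after))
    return {
        k: (before.get(k), after.get(k))
        for k in keys
        if before.get(k, missing) != after.get(k, missing)
    }


# Snapshot the live configuration before the suspect import.
snapshot = flatten(copy.deepcopy(dask.config.config))

# Stand-in for `import malariagen_data`: apply the same override temporarily.
with dask.config.set({"array.slicing.split_native_chunks": False}):
    changed = config_diff(snapshot, flatten(dask.config.config))

print(changed)  # lists the key(s) touched by the override
```

In a real session, the with block would be replaced by the import under suspicion, with the diff taken immediately after it.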


Expected Behavior

The dask.config.set() override should be scoped only to the specific operations that require it, using the context manager form:

with dask.config.set(**{"array.slicing.split_native_chunks": False}):
    # perform the specific operation that needs this config
    ...

This ensures the configuration is restored to its previous value immediately after the operation completes, with no lasting effect on the rest of the process.
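
The restore-on-exit behaviour can be checked directly. This is a minimal sketch that assumes only the key name quoted in this issue; it passes the override as a dict, which dask.config.set accepts as an alternative to the ** form used in ag3.py. Lookups outside the block use a default so they cannot raise if the key is unset.

```python
import dask

KEY = "array.slicing.split_native_chunks"  # key name taken from this issue

# Value before the override (None if the key is not set at all).
before = dask.config.get(KEY, default=None)

with dask.config.set({KEY: False}):
    # The override is visible only while the block is active.
    inside = dask.config.get(KEY)

# On exit, dask restores whatever was there before.
after = dask.config.get(KEY, default=None)

print(inside)           # False
print(after == before)  # True
```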


Proposed Fix

Before (current behaviour — ag3.py, line 10):

# Module-level — runs on import, permanently changes global config
dask.config.set(**{"array.slicing.split_native_chunks": False})

After (proposed fix — scoped to specific methods):

# Inside the method(s) that need this behaviour
def some_method(self, *args, **kwargs):
    with dask.config.set(**{"array.slicing.split_native_chunks": False}):
        # operation that requires this config
        ...
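
If several methods need the override, a small decorator could avoid repeating the with block. This is a sketch under assumptions, not the project's actual fix: without_split_native_chunks and read_setting are hypothetical names, and read_setting is only a stand-in for a real method body.

```python
import functools

import dask

KEY = "array.slicing.split_native_chunks"  # key name taken from this issue


def without_split_native_chunks(func):
    """Hypothetical helper: run func with the override active, then restore."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with dask.config.set({KEY: False}):
            return func(*args, **kwargs)

    return wrapper


@without_split_native_chunks
def read_setting():
    # Stand-in for a method body that needs the override.
    return dask.config.get(KEY)


before = dask.config.get(KEY, default=None)
print(read_setting())                                 # False
print(dask.config.get(KEY, default=None) == before)   # True
```

One caveat with any scoped form: the override applies only to code that runs inside the block. If a method returns a lazy Dask array and the caller slices it later, that later slicing would see the caller's configuration, so the maintainers would need to confirm which operations actually depend on the setting.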

Expected Impact After Resolution

  • Importing malariagen_data will no longer modify the global Dask configuration.
  • The split_native_chunks override will be active only during the specific operations that require it.
  • Third-party Dask-based tools used alongside malariagen_data (e.g., xarray, pangeo) will no longer have their performance characteristics silently altered.
  • The fix is fully backward-compatible — no changes to public API or user-facing behaviour are required.
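
A regression test could guard the first point above. The helper below is a hypothetical sketch (import_changes_dask_config is not an existing function in either library). It has to run in a fresh interpreter, because importing a module that is already loaded does not re-execute its top-level code.

```python
import importlib

import dask

KEY = "array.slicing.split_native_chunks"  # key name taken from this issue


def import_changes_dask_config(module_name, key=KEY):
    """Hypothetical check: True if importing module_name alters the given key."""
    before = dask.config.get(key, default=None)
    importlib.import_module(module_name)
    after = dask.config.get(key, default=None)
    return before != after


# Intended use in the project's test suite (assumes malariagen_data is installed):
#     assert not import_changes_dask_config("malariagen_data")

# Demonstration with a standard-library module that does not touch Dask.
print(import_changes_dask_config("json"))  # False
```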

Affected File

File   | Line | Code
ag3.py | 10   | dask.config.set(**{"array.slicing.split_native_chunks": False})
