Skip to content

Probe attaching: do we really need a dense representation? #4553

@h-mayorquin

Description

@h-mayorquin

@samuelgarcia

I've been thinking more about whether SpikeInterface needs a dense channel-aligned representation. The two arguments I've heard for it are ergonomics and making slicing and aggregation easier. Let me go through them.

For channel-position indexing

The idea is that a dense channel-aligned contact_vector lets get_channel_locations resolve locations by channel position with one numpy index. Here is how it looks today on main (baserecordingsnippets.py#L348-L366):

def get_channel_locations(self, channel_ids=None, axes: str = "xy") -> np.ndarray:
    if channel_ids is None:
        channel_ids = self.get_channel_ids()
    channel_indices = self.ids_to_indices(channel_ids)
    contact_vector = self.get_property("contact_vector")
    if contact_vector is not None:
        ndim = len(axes)
        all_positions = np.zeros((contact_vector.size, ndim), dtype="float64")
        for i, dim in enumerate(axes):
            all_positions[:, i] = contact_vector[dim]
        positions = all_positions[channel_indices]
        return positions
    else:
        locations = self.get_property("location")
        if locations is None:
            raise Exception("There are no channel locations")
        locations = np.asarray(locations)[channel_indices]
        return select_axes(locations, axes)

This makes sense and looks compact, but this is the only place in SpikeInterface where a dense channel-aligned representation is used for numpy-like indexing by channel position. Bringing the full dense-array machinery into ProbeGroup (as probeinterface#420 proposes) feels like too much infrastructure for one function.

Reviewing the code base, I found that every other reader of contact_vector is work the format creates for itself. get_probegroup rebuilds the ProbeGroup via from_numpy only because the dense form is what gets stored; storing the ProbeGroup object directly (as #4548 does) makes that read disappear. The other big user is slicing and aggregation, discussed below, which is also bookkeeping the format creates for itself.

And if we ever do want a dense view, probeinterface#425 already gives us a way. With it, get_channel_locations looks like this:

def get_channel_locations(self, channel_ids=None, axes: str = "xy") -> np.ndarray:
    if channel_ids is None:
        channel_ids = self.get_channel_ids()
    channel_indices = self.ids_to_indices(channel_ids)
    if self.has_probe():
        contact_vector = self._probegroup._build_contact_vector()
        ndim = len(axes)
        all_positions = np.zeros((contact_vector.size, ndim), dtype="float64")
        for i, dim in enumerate(axes):
            all_positions[:, i] = contact_vector[dim]
        return all_positions[channel_indices]
    locations = self.get_property("location")
    if locations is None:
        raise Exception("There are no channel locations")
    locations = np.asarray(locations)[channel_indices]
    return select_axes(locations, axes)

Which is equivalent and can be used wherever the need for such a slicing pattern arises.

And the id-keyed property-mapping version in #4548 is not that complicated either.

For making slicing and aggregation operations easier

The other cases where I thought dense would help are slicing and aggregation, but as I started looking I discovered several bugs there: #4549, #4545, #4546, #4547. There are even ones we thought we had solved but we haven't: the planar_contour is still lost when splitting and aggregating channels.

The bugs come from three things combining:

  • Separating the data (contact_vector) from the probe metadata (annotations).
  • Using positional integer indices (probe_index, device_channel_indices) both as internal structure and as identifiers — which means they collide the moment you combine two probegroups or two recordings, and become stale the moment you slice one.
  • Throwing away the mapping by resetting device_channel_indices = np.arange(n) everywhere.

Those are problems of provenance, and the current dense channel-ordered representation makes them worse: it destroys device_channel_indices on every reshape and doubles down on identifying mappings by index rather than by id. The proposal in #4548 keeps the original Probe untouched and carries the channel-to-contact mapping as an id-keyed record on the recording, which gets us simpler code in both reshape sites:

Slicing

Current on main (channelslice.py#L64-L68):

# change the wiring of the probe
contact_vector = self.get_property("contact_vector")
if contact_vector is not None:
    contact_vector["device_channel_indices"] = np.arange(len(channel_ids), dtype="int64")
    self.set_property("contact_vector", contact_vector)

My proposal in #4548 (channelslice.py#L64-L67):

# inherit the probegroup by reference; the `wiring` per-channel property
# rode through copy_metadata above with the filtered channel_ids
if parent_recording.has_probe():
    self._probegroup = parent_recording._probegroup

Aggregation

Current on main (channelsaggregationrecording.py#L92-L117):

for prop_name, prop_values in property_dict.items():
    if prop_name == "contact_vector":
        # remap device channel indices correctly
        prop_values["device_channel_indices"] = np.arange(self.get_num_channels())
    self.set_property(key=prop_name, values=prop_values)

# if locations are present, check that they are all different!
if "location" in self.get_property_keys():
    location_tuple = [tuple(loc) for loc in self.get_property("location")]
    assert len(set(location_tuple)) == self.get_num_channels(), (
        "Locations are not unique! " "Cannot aggregate recordings!"
    )

planar_contour_keys = [
    key for recording in recording_list for key in recording.get_annotation_keys() if "planar_contour" in key
]
if len(planar_contour_keys) > 0:
    if all(
        k == planar_contour_keys[0] for k in planar_contour_keys
    ):  # we add the 'planar_contour' annotations only if there is a unique one in the recording_list
        planar_contour_key = planar_contour_keys[0]
        collect_planar_contours = []
        for rec in recording_list:
            collect_planar_contours.append(rec.get_annotation(planar_contour_key))
        if all(np.array_equal(arr, collect_planar_contours[0]) for arr in collect_planar_contours):
            self.set_annotation(planar_contour_key, collect_planar_contours[0])

My proposal in #4548 (channelsaggregationrecording.py#L97-L120):

if all(rec.has_probe() for rec in recording_list):
    first_pg = recording_list[0]._probegroup
    if all(rec._probegroup is first_pg for rec in recording_list):
        combined_probegroup = first_pg
    else:
        combined_probegroup = ProbeGroup()
        for rec in recording_list:
            for probe in rec._probegroup.probes:
                # workaround for probeinterface #421; becomes `probe.copy()` once fixed
                probe_copy = Probe.from_dict(probe.to_dict(array_as_list=False))
                # required to pass probeinterface's cross-probe dci uniqueness check;
                # dci is a local index, the canonical wiring lives on the recording's
                # `wiring` property, so provenance is preserved
                probe_copy.set_device_channel_indices(
                    np.full(probe_copy.get_contact_count(), -1, dtype="int64")
                )
                combined_probegroup.add_probe(probe_copy)
    self._probegroup = combined_probegroup

Given this, I'm unsure why we'd want a general dense representation. If we need comfortable slicing, #425 covers it. If a hot loop ever needs dense later, a local cache at that call site would do. Probes are not large and building a dense representation on demand is cheap either way.

Is there any other use of a dense representation that I am missing?

Related: #4397

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions