
Resolute Raccoon stemcell does not work on a Mac under Rosetta #577

@mkocher

Description


Resolute stemcells, which are currently under development, don't work under vz/Rosetta. I'm documenting why in this issue in case the details ever change.

An ARM warden stemcell might be nice in the future, though it would mostly be useful for BOSH development, since it would be incompatible with many releases.

AI Summary of the Issue

Attempting to run or test an x86_64 Ubuntu Resolute BOSH stemcell under a Rosetta 2-backed Linux VM results in fatal ENOSYS (Function not implemented) errors during standard process supervision and container execution.

This prevents monit from properly restarting and blocks the BOSH Process Manager (BPM) from starting jobs.

Root Cause Analysis

The failure comes from the combination of two things: Apple's Rosetta 2 system call translation is incomplete, and modern systemd has aggressively removed its legacy cgroup code paths.

  1. Systemd v256 Strict Syscall Requirements: Ubuntu 26.04 ships with systemd v256 or later. Starting with this release, the systemd maintainers removed legacy fallback mechanisms for managing cgroups (such as reading cgroup.procs and calling plain kill() on each PID). Systemd now strictly requires the modern process file descriptor syscalls, specifically pidfd_open and pidfd_send_signal, to eliminate PID recycling race conditions.
  2. The Rosetta 2 Emulation Gap: Apple's Rosetta 2 acts as a translation layer between the x86_64 container userland and the arm64 Linux kernel running in the vz VM. Rosetta 2 must explicitly map and translate each system call between the two architectures, and it currently does not implement translations for the pidfd syscall family.
  3. The ENOSYS Wall: When systemd or runc issues a pidfd syscall, Rosetta 2 intercepts it, has no translation to pass to the ARM kernel, and immediately fails the call with ENOSYS (Function not implemented). Because systemd v256+ no longer falls back to a legacy path on this error, it propagates the error and aborts the operation.
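To check whether a particular VM has this gap, a small probe can call both syscalls against its own process and report the resulting errno. The following is a minimal Python sketch, assuming Python 3.9+ on Linux (where os.pidfd_open and signal.pidfd_send_signal exist); probe_pidfd is a hypothetical helper, not part of BOSH or systemd. It only exercises Rosetta if it is run with an x86_64 interpreter (for example inside an amd64 container); an arm64 interpreter would reach the kernel directly.

```python
import errno
import os
import signal


def probe_pidfd() -> dict[str, str]:
    """Try the pidfd syscalls on the current process and report the outcome.

    Each value is "ok", an errno name such as "ENOSYS", "unavailable" (this
    Python build has no binding) or "skipped" (pidfd_open failed first).
    """
    if not hasattr(os, "pidfd_open"):
        return {"pidfd_open": "unavailable", "pidfd_send_signal": "unavailable"}

    results: dict[str, str] = {}
    try:
        pidfd = os.pidfd_open(os.getpid())
    except OSError as exc:
        results["pidfd_open"] = errno.errorcode.get(exc.errno, str(exc.errno))
        results["pidfd_send_signal"] = "skipped"
        return results
    results["pidfd_open"] = "ok"

    try:
        if not hasattr(signal, "pidfd_send_signal"):
            results["pidfd_send_signal"] = "unavailable"
        else:
            # Signal 0 runs the kernel's existence/permission checks
            # without actually delivering a signal to ourselves.
            signal.pidfd_send_signal(pidfd, 0)
            results["pidfd_send_signal"] = "ok"
    except OSError as exc:
        results["pidfd_send_signal"] = errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        os.close(pidfd)
    return results


if __name__ == "__main__":
    print(probe_pidfd())
```

On a native kernel both entries should be "ok". If the analysis above is correct, the same script run by an x86_64 interpreter under Rosetta would report "ENOSYS" for pidfd_open.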

Observed Symptoms

This missing syscall mapping manifests in two primary ways during stemcell/BOSH operations:

1. systemctl kill Failures
When scripts attempt to kill or restart services, systemctl kill relies on pidfd_send_signal to signal the processes in the unit's cgroup. It fails with:

Failed to kill unit monit.service: Failed to send signal SIGTERM to processes in unit cgroup '/system.slice/monit.service': Function not implemented

2. BPM / runc Failures
When BPM attempts to start a job, runc asks systemd to create a transient .scope unit and passes it an array of PIDs. Systemd attempts to pin those PIDs using pidfd_open. It fails with:

level=error msg="runc run failed: unable to start container process: unable to apply cgroup configuration: unable to start unit \"runc-bpm-hello-world.scope\" (...): Failed to set unit properties: Function not implemented"

Conclusion & Workarounds

This is a fundamental architectural limitation of using Rosetta 2 to emulate modern x86_64 Linux userlands. Older Ubuntu versions (like 24.04 Noble) work under Rosetta only because their older systemd (v255) still retains legacy fallbacks that mask the emulator's missing syscalls.

Workarounds:

  • Run integration tests and stemcell builds on native x86_64 hardware.
  • If testing on Apple Silicon is strictly required, switch the virtualization engine to full QEMU CPU emulation instead of Rosetta 2 (e.g., colima start --arch x86_64 --vm-type qemu). This runs a genuine x86_64 kernel, so the complete x86_64 syscall API is available, at the cost of significant performance degradation.

Note: Opened for tracking and documentation purposes regarding Resolute stemcell development environments.

Metadata

Assignees: No one assigned
Type: No type
Project status: Waiting for Changes | Open for Contribution
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests