Why my Colima LaunchAgent refused to load: a Sequoia provenance story

The setup

I run a headless Intel i9 Mac at home as a private compute workhorse. It hosts Coolify (via Colima), Linux CI runners, and a few personal services. The machine is accessed only over SSH and Tailscale. No keyboard, no monitor in active use.

I had a backlog item flagged from the day I built the thing: validate Colima autostart after a real reboot. It was the one I was least comfortable deferring: if launchd autostart fails while nobody is at the machine, there is no remote recovery path.

This week I rebooted, and Colima did not come up.

What launchd told me

$ colima status
FATA[0001] colima is not running

$ launchctl list | grep -i colima
(nothing)

The plist files were on disk in ~/Library/LaunchAgents/, but launchd had never loaded them. So I tried to bootstrap them by hand:

$ launchctl bootstrap gui/501 ~/Library/LaunchAgents/com.user.colima.plist
Bootstrap failed: 125: Domain does not support specified action

That one I understood. I was SSH’d in, so the GUI domain was not reachable. Switched to the user domain:

$ launchctl bootstrap user/501 ~/Library/LaunchAgents/com.user.colima.plist
Bootstrap failed: 5: Input/output error
Try re-running the command as root for richer errors.

Running it under sudo gave the same error 5 with no extra detail. The plist passed plutil -lint, permissions were the standard 0644, and ownership was correct, so this was not a syntax or permissions problem.

The clue I almost missed

In a wide ls -la I noticed an @ after the permission string:

-rw-r--r--@ 1 sachin staff 730 29 Apr 19:58 com.user.colima.plist

The @ means extended attributes. xattr -l showed:

com.apple.provenance:

com.apple.provenance is the provenance-tracking extended attribute that recent macOS releases, Sequoia included, attach to files written by certain processes. Editors that save via their own write paths (Cursor, VS Code, anything Electron-based with sandboxed filesystem APIs) tag files with it.

I tried to clear it:

$ xattr -c com.user.colima.plist
$ xattr com.user.colima.plist
com.apple.provenance

It came back. Tried sudo xattr -c. Same. Tried xattr -d com.apple.provenance. Same.

On Sequoia, ~/Library/LaunchAgents/ is a SIP-protected directory. macOS auto-attaches com.apple.provenance to any file that lands there, and the attribute cannot be stripped by user-space tools. Even files created with a cat > file <<EOF heredoc inside that directory pick it up the moment they appear on disk.

That is what was failing the bootstrap. launchd refuses provenance-tagged plists in protected locations as a tampering defense, and surfaces it as the unhelpful error 5.
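Checking one file at a time with xattr gets tedious once there are several agents. A minimal sketch that scans a directory in one pass might look like this. It assumes the macOS xattr tool, which prints one attribute name per line when given a single file; the function name scan_provenance is hypothetical. It only detects the attribute, since by the account above it cannot be removed anyway.

```sh
# Hypothetical helper: report plists in a directory that carry the
# com.apple.provenance extended attribute. Detection only.
scan_provenance() {
  sp_dir=${1:-"$HOME/Library/LaunchAgents"}
  sp_status=1                          # 1 = no tagged files found
  for sp_file in "$sp_dir"/*.plist; do
    [ -e "$sp_file" ] || continue      # an unmatched glob stays literal; skip it
    # With a single file argument, macOS xattr prints one attribute name per line.
    if xattr "$sp_file" 2>/dev/null | grep -qx 'com.apple.provenance'; then
      printf 'provenance: %s\n' "$sp_file"
      sp_status=0
    fi
  done
  return "$sp_status"
}
```

Running scan_provenance with no argument checks ~/Library/LaunchAgents; it prints each tagged file and exits 0 if it found any, 1 otherwise, so it can gate a script.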

The fix

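The permission that lets a process write to that directory without tripping the defense is Full Disk Access. My SSH session’s shell did not have it. Once I granted FDA (for SSH logins, System Settings exposes this as “Allow full disk access for remote users” under Remote Login), the same launchctl bootstrap call worked on the first try.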

$ launchctl bootstrap user/501 ~/Library/LaunchAgents/com.user.colima.plist
$ launchctl bootstrap user/501 ~/Library/LaunchAgents/com.chaos.colima-tunnel.plist
$ launchctl list | grep -E 'colima|chaos'
801     0       com.chaos.colima-tunnel
-       0       com.user.colima

Reboot. Colima up. Tunnel up. Coolify recovered. Done.
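Reading launchctl list by eye works, but in a post-reboot check script it helps to turn those three columns into a verdict. A sketch, with the function name agent_state being hypothetical: the columns are PID, last exit status, and label, and a PID of - means the job is loaded but not running. For a one-shot job such as com.user.colima, which presumably runs colima start and exits once the VM is up, "loaded, not running (last exit 0)" is the healthy state.

```sh
# Report the state of one launchd job from `launchctl list` output on stdin.
# Exits 1 if the label is not in the list at all.
agent_state() {
  awk -v label="$1" '
    $3 == label {
      if ($1 == "-") printf "loaded, not running (last exit %s)\n", $2
      else           printf "running (pid %s)\n", $1
      found = 1
    }
    END { if (!found) { print "not loaded"; exit 1 } }
  '
}
```

Usage: launchctl list | agent_state com.chaos.colima-tunnel.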

The real story is older

The provenance error is the surface symptom. The interesting part is why I was fighting launchd at all.

When I set up Colima a couple of weeks ago, I picked Apple’s Virtualization.framework (vz) over QEMU. The reasoning at the time, captured in the decision record, was straightforward: vz is faster, well-integrated, and virtiofs mounts beat the default 9p performance significantly. For my workload — Coolify, future Linux CI runners, the occasional ad-hoc container — vz looked like the better default.

Here is what I did not appreciate then: vz requires a GUI session. It is not a server-style hypervisor that can run as root in a daemon context. It hooks into macOS’s user session for graphics, virtualization device handoff, and other framework dependencies. Try to start a vz-backed Colima from a LaunchDaemon and it will not work.

QEMU has no such constraint. With QEMU, I could have written one LaunchDaemon, set RunAtLoad=true, and walked away. Boot to login screen, Colima starts as root regardless, no auto-login required, no GUI session needed. That would have been the end of the autostart story before it began.
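For comparison, the QEMU path could look roughly like the following sketch: a shell function that writes a LaunchDaemon plist starting a QEMU-backed Colima at boot. The label com.example.colima, the function name, and the /usr/local/bin paths (Homebrew’s prefix on Intel Macs) are assumptions. PATH is set explicitly because launchd jobs get a minimal default PATH without /usr/local/bin, and Colima needs to find its helper binaries. Treat it as a starting point: Colima keeps its state under the running user’s home directory, so a real daemon may also need a UserName key or other environment settings.

```sh
# Hypothetical: write a LaunchDaemon plist for a QEMU-backed Colima.
# $1 = output path, $2 = colima binary (defaults to the Intel Homebrew path).
write_qemu_daemon_plist() {
  wp_out=$1
  wp_bin=${2:-/usr/local/bin/colima}
  cat > "$wp_out" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.example.colima</string>
  <key>ProgramArguments</key>
  <array>
    <string>$wp_bin</string>
    <string>start</string>
    <string>--vm-type</string>
    <string>qemu</string>
  </array>
  <key>EnvironmentVariables</key>
  <dict>
    <key>PATH</key>
    <string>/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin</string>
  </dict>
  <key>RunAtLoad</key>
  <true/>
</dict>
</plist>
EOF
}
```

The file would go in /Library/LaunchDaemons/, owned by root:wheel with mode 0644, and be loaded with sudo launchctl bootstrap system /Library/LaunchDaemons/com.example.colima.plist. No GUI session or auto-login is involved at any step.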

By choosing vz, I committed myself to:

Auto-login enabled (so a GUI session exists at boot)
LaunchAgent (not LaunchDaemon) for Colima itself
Therefore plists in ~/Library/LaunchAgents/
Therefore the provenance attack surface that bit me this week

None of this was visible at decision time. The vz vs QEMU comparison reads like “vz is faster, done.” The cascading consequences only show up on the day you reboot.

When I would still pick vz

I still think vz was the right call for this specific machine, for these specific workloads. Performance matters here — Coolify is doing real work, the CI runner pool is going to be doing real work, and the I/O delta between vz and QEMU at scale is meaningful.

But I would phrase the decision differently now. The tradeoff is not “vz is faster than QEMU.” The tradeoff is:

vz: faster runtime, requires GUI session, complicates lifecycle (LaunchAgent + auto-login + provenance), harder to recover from remotely
QEMU: slightly slower, runs as a daemon, dead simple lifecycle, recoverable from anywhere

For a workhorse you reboot rarely and access remotely, I would still take vz. For a CI-only box where performance is fine either way and uptime matters more than throughput, QEMU’s daemon-friendliness is probably the better answer. Worth thinking about case by case rather than reflexively picking the faster one.

Three things to remember

If you take nothing else from this:

vz means LaunchAgent means GUI session means auto-login. This chain is invisible until it breaks. If you are picking a hypervisor for a headless Mac, ask yourself whether the perf delta justifies the lifecycle complexity.
com.apple.provenance and SIP-protected directories interact with launchd in ways that surface as opaque error 5. If you see a “5: Input/output error” from launchctl bootstrap, check xattr before you check anything else. The fix is Full Disk Access for the calling process.
gui/<uid> versus user/<uid> matters when you are SSH’d in. Bootstrap fails with error 125 on gui from an SSH session because the GUI domain is not reachable. Use user/<uid> instead, or bootstrap from a Terminal session at the actual machine.
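The second and third points can be folded into a small bootstrap helper. A sketch, with hypothetical function names: it tries gui/<uid> first, falls back to user/<uid> only on error 125, and for anything else prints what to check next. It parses the “Bootstrap failed: N: message” line that launchctl prints rather than relying on its exit status.

```sh
# Extract N from launchctl's "Bootstrap failed: N: message" on stdin.
bootstrap_error_code() {
  sed -n 's/^Bootstrap failed: \([0-9][0-9]*\):.*/\1/p' | head -n 1
}

# Map an error code to the next thing worth checking.
bootstrap_advice() {
  case "$1" in
    125) echo "domain not reachable from this session (typical of gui/ over SSH)" ;;
    5)   echo "I/O error: check xattr for com.apple.provenance and Full Disk Access" ;;
    "")  echo "no bootstrap error reported" ;;
    *)   echo "unrecognised bootstrap error $1" ;;
  esac
}

# Try the GUI domain, then the user domain; only error 125 triggers the fallback.
bootstrap_agent() {
  ba_plist=$1
  ba_uid=$(id -u)
  for ba_domain in "gui/$ba_uid" "user/$ba_uid"; do
    if ba_err=$(launchctl bootstrap "$ba_domain" "$ba_plist" 2>&1); then
      echo "bootstrapped into $ba_domain"
      return 0
    fi
    ba_code=$(printf '%s\n' "$ba_err" | bootstrap_error_code)
    echo "$ba_domain: $(bootstrap_advice "$ba_code")" >&2
    [ "$ba_code" = 125 ] || return 1
  done
  return 1
}
```

Usage: bootstrap_agent ~/Library/LaunchAgents/com.user.colima.plist. Falling back only on 125 is deliberate: an error 5 in the GUI domain points at the plist or permissions, and retrying in another domain would just hide it.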

What I am changing

The backlog row is now closed. Two follow-ups landed in the tracker:

The Colima → tunnel race is real but currently benign (the tunnel’s KeepAlive=true plus ThrottleInterval=5 retries cheaply enough that it always settles). Worth fixing properly with a wrapper script that polls colima status first, but not urgent.
The session also surfaced a Cloudflare API token sitting in plaintext in a now-deprecated Caddyfile and duplicated in ~/.zshrc. Token revoked. New backlog row to define a proper secrets management pattern across all my machines, since .zshrc is the wrong home for credentials.
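For the Colima → tunnel race, the wrapper could be as small as the following sketch. It assumes colima status exits non-zero while the VM is down, which the FATA message from colima status suggests; wait_for_colima and its default timeout and interval are hypothetical. If it gives up, the tunnel’s KeepAlive=true means launchd simply runs it again after ThrottleInterval.

```sh
# Hypothetical wrapper: poll `colima status` until it succeeds or time runs out.
wait_for_colima() {
  wc_timeout=${1:-120}    # seconds to wait before giving up
  wc_interval=${2:-5}     # seconds between polls
  wc_waited=0
  until colima status >/dev/null 2>&1; do
    if [ "$wc_waited" -ge "$wc_timeout" ]; then
      echo "colima not running after ${wc_timeout}s" >&2
      return 1
    fi
    sleep "$wc_interval"
    wc_waited=$((wc_waited + wc_interval))
  done
  return 0
}
```

The tunnel agent’s ProgramArguments would then run a script ending in wait_for_colima 120 5 && exec <tunnel command>, so the tunnel only starts once Colima answers.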

The reboot test itself, the thing the backlog row was actually about, passed cleanly. Cold start, no intervention, full stack up. That part feels good.