Shipping a macOS agent app without mystery crashes.

Studio kept dying with what looked like an OS memory fault. No stack trace into our code, no JavaScript error, just a dead app. I spent real time suspecting the React layer because that is where all the recent diffs were, and that instinct was completely wrong.

The crash that was not in our code

Studio is a Tauri app, but the product depends on a bundled sidecar runtime, a native executable the app spawns at launch. During development we routinely launched Studio from a build folder on an external SSD. macOS can invalidate executable mappings when the backing volume is rebuilt, moved, or force-unmounted. The app shell survives that. The sidecar does not: when the mapped binary disappears underneath a running process, the failure presents as a memory fault, not as anything a frontend log would explain.

That diagnosis changed the shape of the fix. There was no React rewrite that could help. The runtime had to stop executing from a volume we did not control.

One day, ten commits

The stabilization landed as a burst on 2026-05-13, ten commits in one day. The anchor was a 672-line change to src-tauri/src/lib.rs titled “stabilize macos runtime launch” (commit 8e6e35d). The same day brought a dedicated runtime supervisor, 228 lines (commit 7295132), and managed replacement of orphaned runtimes, another 330 lines (commit abf8446).

I want to be honest about what those numbers mean. Over 1200 lines of Rust in a day is not elegance, it is triage. The elegance came later, in the hardening wave. But the triage established the architecture that survived: materialize, supervise, attach or replace.

Materialize, then spawn

On launch, Studio resolves the bundled binary and packaged resources, then copies them into a stable local cache:

~/Library/Application Support/cv.memoire.studio/runtime/<fingerprint>/

The sidecar is spawned from that cache, never from the bundle location, with MEMOIRE_PACKAGE_ROOT pointed at the cached package root. When the bundle changes, the fingerprint changes and the cache refreshes. Development from an external drive now behaves exactly like an installed app, because by the time the process starts, the install location is irrelevant.
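
A minimal sketch of the materialize-then-spawn idea, using only the Rust standard library. It is not the code in lib.rs: the function names, the content-hash fingerprint, and the single-file copy are all assumptions made for illustration (the real flow also copies packaged resources). Only the MEMOIRE_PACKAGE_ROOT variable comes from the description above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Hypothetical fingerprint: a hash of the bundled binary's bytes.
/// DefaultHasher's algorithm is unspecified and may change between Rust
/// releases, so a real cache key would want a pinned hash; this only
/// shows the shape of the idea.
fn fingerprint(bytes: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Copy `bundled` into `<cache_root>/<fingerprint>/` if it is not already
/// there, and return the cached path.
fn materialize(bundled: &Path, cache_root: &Path) -> io::Result<PathBuf> {
    let name = bundled.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "bundled path has no file name")
    })?;
    let dir = cache_root.join(fingerprint(&fs::read(bundled)?));
    let target = dir.join(name);
    if !target.exists() {
        fs::create_dir_all(&dir)?;
        // Copy under a temporary name, then rename, so an interrupted copy
        // never leaves something that looks like a ready binary.
        let partial = dir.join(".partial");
        fs::copy(bundled, &partial)?; // fs::copy also carries over permission bits
        fs::rename(&partial, &target)?;
    }
    Ok(target)
}

/// Build (but do not spawn) the sidecar command from the cached copy,
/// pointing MEMOIRE_PACKAGE_ROOT at the cached package root.
fn sidecar_command(cached_binary: &Path, package_root: &Path) -> Command {
    let mut cmd = Command::new(cached_binary);
    cmd.env("MEMOIRE_PACKAGE_ROOT", package_root);
    cmd
}
```

Because the directory name is derived from the binary's contents in this sketch, an unchanged bundle resolves to the same cached path on every launch, and a changed bundle lands in a fresh directory without touching the one a running sidecar may still be mapped from.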

This flow kept getting more literal over the following weeks. On 2026-05-26 the spawn was pinned so the working directory itself is the cached package (commits a4b21ad, 8633be6), and on 2026-06-06 materialization moved fully inside the package root (commit fd2f45f). Today lib.rs is 3518 lines, with an explicit materialize_runtime_in_cache flow and runtime_cache_is_ready checks. The file is large because the operating system has that many ways to interrupt a process, and we stopped pretending otherwise.

Owning a process before killing it

Port 8765 can already be occupied when Studio starts. The lazy fix is to kill whatever holds the port. That is also how a desktop app destroys an unrelated process on someone else’s machine.

So the supervisor writes a managed pid file containing the pid, a token, the cache path, the workspace, and the start time. On launch or restart, Studio replaces only a runtime it can prove it owns through that file. Otherwise it attaches to the existing runtime or reports the conflict honestly. Orphans get adopted or replaced with evidence, never guessed at.
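
The ownership rule can be sketched as a pure decision function. Everything named here is hypothetical: the struct layout mirrors the pid file fields listed above, but the proof of ownership used (the recorded pid matches the port holder and the token matches ours) is an assumption, as is the `holder_is_runtime` probe. The actual supervisor may check more than this.

```rust
/// Hypothetical in-memory form of the managed pid file.
#[derive(Debug, Clone, PartialEq)]
struct ManagedPidFile {
    pid: u32,
    token: String,
    cache_path: String,
    workspace: String,
    started_at_unix: u64,
}

/// What to do about the port at launch or restart.
#[derive(Debug, PartialEq)]
enum PortAction {
    /// Nothing holds the port: start a fresh runtime.
    Spawn,
    /// The holder is provably ours: safe to replace this pid.
    Replace(u32),
    /// The holder is a runtime we cannot prove we own: attach, do not kill.
    Attach,
    /// The holder is something else entirely: report it, do not kill.
    ReportConflict,
}

fn decide(
    port_holder: Option<u32>,        // pid currently bound to the port, if any
    record: Option<&ManagedPidFile>, // contents of the pid file, if it exists
    our_token: &str,                 // token this Studio instance expects
    holder_is_runtime: bool,         // assumed probe: does the holder answer like a runtime?
) -> PortAction {
    let holder_pid = match port_holder {
        None => return PortAction::Spawn,
        Some(pid) => pid,
    };
    // Assumed ownership test: recorded pid and token must both match.
    let owned = record.map_or(false, |r| r.pid == holder_pid && r.token == our_token);
    if owned {
        PortAction::Replace(holder_pid)
    } else if holder_is_runtime {
        PortAction::Attach
    } else {
        PortAction::ReportConflict
    }
}
```

The point of the shape is that no branch kills a process without a matching record: a stale pid file whose pid has been reused by an unrelated process fails the token-and-pid check and falls through to attach or report.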

Shutdown got the same discipline. Studio sends a normal termination signal first, waits briefly, and force-kills only if needed. One specific deadlock taught us to never hold the runtime lock while waiting on the child process: with the lock held, quit and restart could wedge each other permanently. Boring shutdown is engineered, not assumed.
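
A sketch of that shutdown order, assuming a Unix host. The `Runtime` struct, the 25 ms poll interval, and the use of the system `kill` utility to deliver SIGTERM are illustrative choices, not a description of the supervisor's internals. The detail that matters is that the lock is held only long enough to take the child out of the shared state, and is released before any waiting starts.

```rust
use std::io;
use std::process::{Child, Command, ExitStatus};
use std::sync::Mutex;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical shared runtime state guarded by a lock.
struct Runtime {
    child: Option<Child>,
}

/// Terminate politely, wait up to `grace`, then force-kill if needed.
/// Returns None when there was no child to stop.
fn shutdown(runtime: &Mutex<Runtime>, grace: Duration) -> io::Result<Option<ExitStatus>> {
    // The guard is a temporary, so the lock is released at the end of this
    // statement. Everything below runs without holding it.
    let taken = runtime.lock().unwrap().child.take();
    let mut child = match taken {
        None => return Ok(None),
        Some(child) => child,
    };

    // Normal termination signal first (SIGTERM via the `kill` utility).
    // If this fails we still fall through to the force-kill path.
    let _ = Command::new("kill")
        .arg("-TERM")
        .arg(child.id().to_string())
        .status();

    // Poll for exit instead of blocking, so the grace period is bounded.
    let deadline = Instant::now() + grace;
    loop {
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if Instant::now() >= deadline {
            break;
        }
        thread::sleep(Duration::from_millis(25));
    }

    // Grace period expired: force-kill (SIGKILL on Unix) and reap.
    child.kill()?;
    child.wait().map(Some)
}
```

Because the child is moved out of the shared state before the wait begins, a concurrent restart that takes the same lock sees an empty slot immediately instead of blocking behind a quit that is still waiting on the process.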

The long tail nobody tweets about

The mystery crash was fixed in May’s second week. The reliability work was not done for another three weeks, and the long tail is where the durable lessons live. On 2026-05-26 and 2026-05-27 a recovery wave landed: readiness now recovers from cached sidecars (commit 83adab4), wedged starts recover (6d4cc7e), duplicate launches attach to a runtime that is still starting instead of racing it (44ba4e2), stalled relaunches are prevented (33aa006), and the runtime is preserved during workspace recovery (a504ac0).

The same window made the cache trustworthy offline (commits 6926092, c66ba75), added tests for the cache state metadata itself (614ed22), and pinned the runtime to v0.18.4 with startup hardening (commit 3567baf, 2026-05-27). Every one of those commits is a small answer to the same question: what does the app do when the happy path has already failed?

What counts as fixed

The release gate for this work was never “the window opens”. The proof list we hold Studio to:

  • frontend build passes, Rust tests and clippy pass
  • the runtime, spawned from the materialized cache, reports status: running
  • Codex completes a live Studio session, and so does Claude Code
  • quit and restart leave no unmanaged sidecars behind

The lesson I keep from this arc: in an agent app, process lifecycle is product code. Launch, spawn, attach, and quit are features users feel on the first run, and they fail in OS-shaped ways that no amount of UI care can mask. We got a stable app the week we started treating the sidecar with the same seriousness as the interface, and the crash never came back.