Why we bundle every model in the box.

May 10, 2026 Sarvesh Chidambaram View source

Abstract dot matrix motif in raspberry on graphite.

You already pay for inference

You are paying for a Claude plan. Or a ChatGPT plan. Or both. Maybe you run Ollama on your machine. Maybe a teammate has an early OpenCode account.

Every design-AI product on the market wants you to pay them, additionally, for inference you have already paid for once. That is the standard business model: pick a vendor, mark up their tokens, lock the workflow to the markup. The math does not work for you, and the lock-in is the point.

Mémoire calls the model you tell it to. It does not host one. It does not resell one. The CLI, the macOS Studio app, and the MCP server all run on your machine; the only network calls are the AI providers you configure. The harness is local-first, and the inference call is yours.

The trap on our side of the table

Refusing vendor lock-in creates a different temptation, and this one belongs to the builder: claim “works with any model” by listing logos, while only one path is actually load-bearing. Logo-strip neutrality. Most multi-model tools are one good integration wearing seven costumes.

The only honest version of bring-your-own-model is to treat every model path as a first-class engineering target. So that is what the box contains.

What bundling a model actually means

Studio ships eight harness definitions in its manifest: Memoire Native, Claude Code, Codex, OpenCode, Gemini CLI, Ollama, Hermes, and a plain shell. Bundled does not mean a dropdown entry. Each definition carries:

An install probe plus setup steps, so the app can tell you exactly what is missing and hand you the copyable command.
An auth probe, a real command that proves the harness is signed in before the UI claims it is ready.
An output parser for that harness’s stream dialect, because Codex JSONL, Claude stream-JSON, and Hermes plain text agree on nothing.
A working default, tuned per harness: Codex defaults to GPT-5.5 Design, Ollama defaults to llama3.1:8b. A bundled harness should do something useful before you touch a setting.

Visibility is tiered too. Codex and Claude Code ride up front, enabled by default. OpenCode, Gemini, Ollama, Hermes, Memoire Native, and Shell sit one layer back as advanced options. Bundling everything does not mean showing eight tabs at once, and because the tiers live in manifest data, promoting a harness is a data change, not a redesign.

Keys follow policy, not vibes

The part of multi-model support nobody writes about: environment handling. An API-key provider, a local model, and a raw shell should not share one secrets story.

So each harness declares an envPolicy: provider, local-model, safe-inherit, or shell. Provider harnesses get key handling. A local model never sees your keys because it never needs them. The shell inherits a sanitized environment. Bring-your-plan only stays safe when the harness knows what kind of plan it was brought.

Hermes runs the other direction

One detail that captures the stance: the Hermes setup flow includes memi agent install hermes, which installs a Memoire-shipped skill kit into a third-party agent.

Most tools only integrate inward; they consume other people’s models. We also ship outward. If the agent you already use can be made better at design work, it gets the skills. The boundary worth defending is the workspace and the memory, never the model brand.

The HUD is the receipt

If users bring their own paid plans, then this tool spends their money, and hiding that spend would be hostile. Studio’s cost HUD polls usage every 3 seconds and shows tokens, dollars, and prompt-cache hit rate, color-coded: green at fifty percent cache hits and above, amber at twenty, red below.

That last number is the honest one. In agent workloads, cache-hit rate is the gap between a cheap run and an expensive one, and it measures our context management, not your prompting. Putting it in the chrome means that when the harness wastes your plan, the harness wears the red.

The manifesto part

This is OpenCode for designers, and the home page says it with a runtime strip: Claude, Codex, Hermes, OpenCode, Ollama, OpenClaw, Cursor. That section shipped with the site’s first three blog posts (commit 2b45ec1, 2026-05-07) and was rebuilt as a bento grid the very next day, logos rotated algorithmically per nth-of-type for a hand-laid feel. The graphics changed in a day. The stance did not.

Don’t let one vendor decide how you design. Use the model you have. Move when a better one ships. Take your work with you.

The lesson

Vendor neutrality is not a toggle. It is a budget you pay per harness: a probe, a parser, a policy, a default. We pay it eight times over, and the cost HUD shows you what you spend in return.

When a tool claims to support every model, ask to see its equivalent of the output-parser list. If no such list exists, it supports one model and tolerates the rest.

If you want to add leverage, the Notes repo is where it compounds: each Note is a graduated workflow any model can run. The more Notes, the more the harness matters, and the less the vendor does.