Karkhana: A workshop for solo software work

I run three machines, ten or so project ideas in active rotation, and one production app with paying customers. I do this alone.

I call this practice Karkhana — कारखाना, Marathi for workshop. The goal is straightforward: build tools and utilities that are useful to me. Some of these solve problems specific to my workflow; others turn out to have generic use cases and graduate to external users. The production app with paying customers started the same way — something I built for myself that turned out to be useful to others.

What makes this sustainable now, when it wouldn’t have been a year ago, is that AI agents handle the parts that used to need a team. I can scaffold a system, set up infrastructure, and iterate on ideas at a pace that wasn’t realistic before. The freedom to have my workflows exactly the way I want them — not the way a SaaS product decided I should have them — is the whole point.

This post establishes what Karkhana is, the quality standard I hold it to, and the principles that keep it running. Future posts go deep on specific decisions: TLS for tailnet services, three-tier backups, decision records for solo developers, the quirks of Colima on macOS. Each stands alone, but they all live inside this same frame.

The quality baseline

I’ve been writing software professionally for two decades. That shapes how I think about the things I run, even when I’m the only user. I’m pragmatic — I don’t need six nines of availability — but I have a higher baseline for quality and reliability than I’d accept from a hobby project. Things break, and when they break, I waste time. The purpose of automation is utility. If something isn’t useful, I don’t build it or deploy it. But if it is useful, it gets production-grade care, because I depend on it.

The rubric I use for anything that runs in Karkhana:

No data loss, ever. Data I care about is backed up in at least two independent failure domains. A single machine failure, a single provider outage, or a single human error must not result in irreversible data loss. This is non-negotiable.

Full restore within a weekend. If any single machine dies, I should be able to replace it and restore all services and data within a weekend. Not a weekend of frantic work — a weekend of mostly-waiting with a clear runbook. If I can’t, the design isn’t done.

Documented decisions. Every non-obvious choice has a Decision Record explaining what I considered, what I picked, and why the alternatives lost. This is for future-me, who will have forgotten the reasoning and will be tempted to re-litigate.

Tested recovery. Backups are tested quarterly — restore a file, diff it against the live source. A backup that hasn’t been restored is just a story I’m telling myself.

Legible after six months away. I should be able to step away from Karkhana for months, come back, and pick up where I left off without spelunking through commit history or reverse-engineering my own setup. Every component has a reason, every reason is written down, every breakage has a known recovery path.

Replaceable components. No single provider, framework, or machine is irreplaceable. Every component has at least one alternative I could swap in within a day. This isn’t paranoia — it’s how I keep the system honest.

These aren’t aspirational. They’re the minimum bar. Anything that doesn’t meet them either gets fixed or gets removed.

Why now

The work of running infrastructure used to look like this: read a tutorial, copy commands, hit an error the tutorial didn’t anticipate, search for a half-relevant answer, try a few things, eventually fix it, forget what I did, hit the same error months later.

The work of running infrastructure now looks like this: describe what I’m trying to do, an AI agent asks clarifying questions, I make decisions, it generates the commands. When something breaks, I paste the error, it suggests likely causes, I verify. The loop is faster by an order of magnitude.

This isn’t a pitch for AI. It’s a description of what changed. The cost of running real infrastructure as one person dropped from “weekend project” to “afternoon task.” The threshold for self-hosting fell below the threshold for paying SaaS for many things.

That changed the calculus.

The three principles

The quality rubric above sets the floor. These three principles shape the architecture:

Self-hosted where it makes sense. Self-hosting used to be ideological. Now it’s economic. The marginal cost of running my own Gitea, my own Coolify, my own backup pipeline is two hours of setup and a few euros a month of compute. The cost of equivalent SaaS is higher, growing forever, with someone else’s policy changes I have to react to.

That said, some things are genuinely better as services. I pay for Notion because it’s good at what it does and I don’t want to maintain a wiki backend. I use hosted LLM models because running intelligent models locally isn’t economical yet. I pay for email hosting because the risk of managing it myself isn’t worth the savings. Same for the public-facing VPS and cloud storage — the operational risk of self-hosting those is too high for the payoff. The principle isn’t “self-host everything.” It’s “self-host where the math works and the risk is manageable.” For everything else, I pay happily.

Restorable by default. This overlaps with the rubric but is worth calling out as a design principle, not just a checklist item. Every system decision is filtered through the question: “when this breaks, how do I get back?” If the recovery path is unclear or untested, the design isn’t finished.

Keep optionality. Don’t lock a workload to a machine. Don’t lock a project to a domain. Don’t lock yourself into one provider for everything. The systems I admire most aren’t the cleverest; they’re the most replaceable.

What’s in Karkhana

Three machines, by role:

A small VPS at a European host runs the things that must be up 24/7: my Git server, a deployment platform for public apps, and a public wiki. This is the boring layer. Boring is good. It runs the same things every day, has external monitoring, and a one-click rollback I pay a few euros a month for.

A workhorse Mac at home runs the heavy private stuff: a deployment platform for internal apps, CI runners, a development environment. It’s six and a half years old. It will die one day. Everything important on it is replicated, backed up, or documented well enough to rebuild on a new machine within a weekend.

A new Mac mini sits next to it, deliberately under-utilized. It’s my insurance policy. When the old Mac dies, I move workloads to this one and order a replacement. I bought it before I had a use for it because finding and provisioning hardware is the slowest part of recovery. Better to have the machine waiting than to make recovery dependent on a shopping trip.

Three domains, by purpose. One is the active workshop. One is the legacy zone where my public wiki lives. One is the brand for a project that graduated.

Code lives in self-hosted Gitea, mirrored to GitHub for backup only. Notion holds the connective tissue: the project portfolio, the decision records, the runbooks, the backlog. The wiki you’re reading is git-backed and public.

In practice, this took two weekends to set up and now largely maintains itself.

What this is not

This is not a startup. There is no team, no funding, no exit strategy. There are paying customers on one app and probably will be on others, but profitability isn’t the primary driver. The point is that I want to build things, and I want the things I build to work reliably — for me, and for anyone else who finds them useful.

This is also not minimalism. People who write about solo development often advocate for radical simplicity: one server, one stack, one domain, one repo. That works for them. But I have many ideas, and I find them by building. Forcing all my projects into one box would make me build fewer of them. The infrastructure exists so that adding the eleventh project doesn’t break the first ten.

The discipline isn’t simplicity. It’s legibility.

What’s next in this series

Some technical posts coming up:

Three-tier backup for a one-person shop, on what I back up, how, and why three independent failure domains.
Caddy to Tailscale serve, on the quiet TLS win for tailnet-only services on macOS.
Decision Records for solo developers, on why DRs matter more for solo work, not less.
Colima on macOS quirks, on the things I wish I’d known before I started.

Each of these is a piece of Karkhana. Each is also useful on its own if you don’t care about my whole setup and just want the specific lesson.

Karkhana is the practice. The infrastructure is just the floor it stands on.