Three-tier backup for a one-person shop

A backup you’ve never restored is not a backup. It’s a feeling.

This post is about how I back up my home server as a solo developer with paying customers, what I learned setting it up, and the specific design choices I’d recommend to anyone in a similar position. The goal is not enterprise-grade. The goal is: when the Mac dies, I’m back in four hours, and I haven’t lost anything I cared about.

The threat model

What am I actually protecting against?

Hardware death. The Mac running my home server is six years old. It will die. SSDs fail, motherboards fail, and at some point it will refuse to boot.

Accidental destruction. I rm -rf the wrong directory. I push a bad migration. I delete a file I needed.

Provider failure. My VPS host has a multi-day outage. Or bans my account by mistake. Or has a data centre fire.

Compromise. Someone gets shell on a machine and tries to encrypt or delete my data.

These four scenarios have different shapes. A backup design that protects against all four needs more than one tier. Three independent failure domains turned out to be the right number: fewer is risky, and more is overkill for one person.

The architecture

Three tiers, each a different failure domain.

Tier 1: hourly to a cloud sync provider (in my case, Google Drive — I already pay for the storage). Restic encrypts everything client-side and writes to a folder that the Drive client syncs. I get hourly snapshots, deduplicated, encrypted, replicated to Google’s servers as a side effect of the sync. When I rm something by accident, this is what I restore from. The Drive sync gives me passive geo-replication for free.

Tier 2: daily to a separate cloud provider (in my case, a cheap storage box at my VPS host). Restic over SFTP. This is my offsite, daily, with longer retention — 30 daily, 12 weekly, 12 monthly snapshots. When the Mac and Drive both go bad somehow, this is what I restore from.

Tier 3: whole-machine snapshots of the VPS (a managed product from the same provider, €4/month). This protects only the VPS, not the Mac. It exists because the VPS hosts code (Gitea), and rebuilding Gitea by hand would be slow and error-prone.

The three tiers cost me about €8/month in additional spending: €4 for the storage box and €4 for the VPS backup, on top of a Drive plan I was already paying for. For one person with paying customers on at least one app, this is appropriate insurance.

Why three, not two

The instinct is “Drive plus offsite, that’s two, that’s fine.” It isn’t fine, and the reason is subtle.

Drive sync is fast for restoring a file. It’s slow and unreliable for restoring everything. If I lose the Mac and the Drive client uploads were lagging by a few hours, I might have a partial state on Drive that’s inconsistent with itself — half of yesterday’s snapshot, half of today’s. The Drive client is not a transactional backup tool; it’s a sync tool that I’m using as a backup tool. For full restores, I want a snapshot-aware tool talking to a snapshot-aware target.

That’s Tier 2. The storage box is dumb storage, but restic is snapshot-aware, so I get atomic per-snapshot restore. When I’m rebuilding the Mac from zero, this is what I’d actually use.

Tier 3 (VPS whole-machine backup) is independent because the VPS itself has services I don’t want to rebuild — my Git server, my deployment platform. If those go corrupt or get attacked, I’d rather one-click rollback the whole VM than reconstruct from restic. Restic can recover the data; the VPS backup recovers the machine plus the data plus the configuration plus everything I forgot to back up specifically.

Different failure modes, different recovery paths. Hence three tiers.

What I back up

The tricky part of backups isn’t the mechanism. It’s deciding what’s worth backing up.

For my home server, the answer is everything that holds state I can’t recreate:

Application data directories — Coolify configs, deployed app data
Service configurations — /etc/cloudflared/, launchd plists, Colima VM configs
SSH keys
Database dumps from anything inside the VM (taken at backup time rather than copied from the live volume, because a database volume copied while the database is running is often inconsistent and may not restore)

What I don’t back up:

Operating system files — easier to reinstall macOS than to restore it
Docker images — re-pullable from the registry
Source code — already in Git, three places

The principle: back up the things that exist nowhere else, that took human decisions to produce, and that would take real time to rebuild.

For each long-running service, I take a database dump as part of the backup script — pg_dump for Postgres, equivalent for whatever else. I write the dump to a local file, then restic snapshots that file along with everything else. When I restore, I have a consistent dump from the time of backup, not a corrupt half-copy of a live volume.
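The dump-then-snapshot ordering can be sketched roughly as follows. This is a minimal illustration, not the actual script: the database name, dump directory, and repository path are hypothetical, and the command runner is injectable so the ordering can be checked without a real database or repository.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


def dump_then_snapshot(
    repo: str,
    db_name: str,
    dump_dir: Path,
    extra_paths: Sequence[str],
    run: Runner = run_checked,
) -> None:
    """Dump the database to a file first, then snapshot that file with restic.

    If pg_dump fails, the exception propagates and restic never runs.
    """
    dump_file = dump_dir / f"{db_name}.dump"
    # pg_dump produces a consistent dump even while the database is in use.
    run(["pg_dump", "--format=custom", f"--file={dump_file}", db_name])
    # The repository passphrase is expected to come from the environment.
    run(["restic", "-r", repo, "backup", str(dump_dir), *extra_paths])
```

A call such as `dump_then_snapshot("/path/to/repo", "appdb", Path("/var/backups/dumps"), ["/etc/cloudflared"])` would run both steps in order; all of those argument values are placeholders.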

Encryption and keys

Restic encrypts everything client-side with a passphrase. The remote storage — Drive, the storage box — sees only encrypted blobs.

This matters for a few reasons:

If my Drive account is breached, the attacker gets blobs they can’t read. If the storage box host is compromised, same. If the VPS provider gets subpoenaed and hands over my data, they hand over encrypted blobs.

I use a different passphrase per repository. Drive uses one passphrase, storage box uses another. If one passphrase leaks, only one tier is decrypted. Both passphrases live in my password manager.
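One way to keep the two passphrases separate in a script is to build a distinct environment per tier. The sketch below assumes each passphrase has been exported to its own file; the repository locations and file paths are hypothetical. `RESTIC_REPOSITORY` and `RESTIC_PASSWORD_FILE` are environment variables that restic reads.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical repository locations and passphrase files, one pair per tier.
TIERS: Dict[str, Dict[str, str]] = {
    "hourly": {
        "repo": "/Users/me/GoogleDrive/restic-hourly",
        "password_file": "/Users/me/.config/restic/hourly.pass",
    },
    "daily": {
        "repo": "sftp:storagebox:/backups/mac",
        "password_file": "/Users/me/.config/restic/daily.pass",
    },
}


def restic_env(tier: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return an environment that points restic at one tier's repo and passphrase."""
    cfg = TIERS[tier]  # a KeyError on an unknown tier is deliberate
    env = dict(os.environ if base is None else base)
    env["RESTIC_REPOSITORY"] = cfg["repo"]
    env["RESTIC_PASSWORD_FILE"] = cfg["password_file"]
    return env
```

Passing `env=restic_env("daily")` to `subprocess.run` keeps a daily run from ever seeing the hourly passphrase. If you would rather not keep passphrase files on disk, restic also accepts `RESTIC_PASSWORD_COMMAND`, which could call a password manager's command-line tool.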

For the SSH key that authenticates restic to the storage box: I use per-source keys, meaning one key per machine that pushes backups, not one shared key. If the Mac is compromised and an attacker gets the key, they can read or destroy backups from the Mac. They cannot read or destroy backups from any other machine I have configured against the same storage box, because each of those uses its own key, provided the storage side confines each key to its own directory or sub-account (separate keys on one shared login would not isolate anything). Blast radius containment matters even in a one-person setup, because compromise often happens through one specific machine (a phishing click in your browser, a bad dependency in one project), and you want the damage to stay where it started.
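On the client side, a per-source key usually comes down to one entry in `~/.ssh/config` that pins a dedicated key to the backup host. Restic's SFTP backend calls the system `ssh`, so it picks this entry up. The helper below just renders such an entry; the alias, hostname, user, and key path are all hypothetical.

```python
def ssh_config_stanza(alias: str, hostname: str, user: str, key_path: str) -> str:
    """Render an ~/.ssh/config entry that pins one key to one backup target."""
    lines = [
        f"Host {alias}",
        f"    HostName {hostname}",
        f"    User {user}",
        f"    IdentityFile {key_path}",
        # Offer only this key, not every key loaded in the agent.
        "    IdentitiesOnly yes",
    ]
    return "\n".join(lines) + "\n"


print(ssh_config_stanza(
    alias="storagebox",
    hostname="backup.example.com",
    user="mac-backup",
    key_path="~/.ssh/id_ed25519_mac_backup",
))
```

With that entry in place, the restic repository can be addressed as `sftp:storagebox:/some/path`, and each additional machine gets its own key and its own entry.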

The schedule

Backups run on launchd timers (you’d use cron on Linux, systemd timers on a more modern box, anacron if your machine isn’t always on).

Hourly tier: every hour at :15.
Daily tier: once a day at 03:30.
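For launchd, those two schedules map onto `StartCalendarInterval` entries. The sketch below generates the property lists with Python's standard `plistlib`; the job labels and the script path are hypothetical.

```python
import plistlib
from typing import Dict, List


def launchd_job(label: str, program_args: List[str], calendar: Dict[str, int]) -> bytes:
    """Build a launchd property list that runs program_args on a calendar schedule."""
    job = {
        "Label": label,
        "ProgramArguments": program_args,
        # Omitted keys act as wildcards, so {"Minute": 15} fires every hour at :15.
        "StartCalendarInterval": calendar,
    }
    return plistlib.dumps(job)


# Hypothetical labels and script path.
SCRIPT = "/usr/local/bin/backup.sh"
hourly_plist = launchd_job("com.example.backup.hourly", [SCRIPT, "hourly"], {"Minute": 15})
daily_plist = launchd_job("com.example.backup.daily", [SCRIPT, "daily"], {"Hour": 3, "Minute": 30})
```

Each result would be written to its own `.plist` file under `~/Library/LaunchAgents/` and loaded with `launchctl`.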

Retention policies are different per tier:

Hourly tier: keep 24 hourly + 7 daily. Recent granularity, recent history.
Daily tier: keep 30 daily + 12 weekly + 12 monthly. Less granular, longer history.
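In restic terms, these policies correspond to `restic forget` with `--keep-*` flags. A small sketch of how the two policies could be turned into commands, with the repository argument left as a placeholder:

```python
from typing import Dict, List

# Retention per tier, mirroring the numbers above.
RETENTION: Dict[str, Dict[str, int]] = {
    "hourly": {"hourly": 24, "daily": 7},
    "daily": {"daily": 30, "weekly": 12, "monthly": 12},
}


def forget_command(tier: str, repo: str) -> List[str]:
    """Build a `restic forget` invocation that applies one tier's retention policy."""
    cmd = ["restic", "-r", repo, "forget"]
    for period, count in RETENTION[tier].items():
        cmd += [f"--keep-{period}", str(count)]
    # --prune also removes data that no remaining snapshot references.
    cmd.append("--prune")
    return cmd
```

Keeping the numbers in one table makes it harder for the two tiers to drift apart from what the post describes.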

I never set “keep all snapshots forever.” Storage growth is real even with deduplication, and old snapshots from before a major data-shape change are usually less useful than a clean snapshot from after.

The single most important step

Quarterly, I do a test restore. I pick one snapshot from each tier, restore a known file that hasn’t changed since that snapshot to a temp location, and diff it against the live source. If the diff is empty, the backup works. If not, I have a real problem to debug before I actually need it.
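The restore-and-compare step is small enough to script. This sketch assumes POSIX paths and a small file that has not changed since the snapshot; the function names are made up for illustration. It relies on restic recreating the file's original absolute path underneath the `--target` directory.

```python
from pathlib import Path
from typing import List


def restore_command(repo: str, source_file: str, target_dir: str,
                    snapshot: str = "latest") -> List[str]:
    """Build a `restic restore` call that extracts one file into target_dir."""
    return ["restic", "-r", repo, "restore", snapshot,
            "--target", target_dir, "--include", source_file]


def restored_path(source_file: str, target_dir: str) -> Path:
    """Where the restored copy of an absolute source path ends up."""
    return Path(target_dir) / source_file.lstrip("/")


def matches_source(source_file: str, target_dir: str) -> bool:
    """Byte-for-byte comparison of the live file and its restored copy.

    Reads both files fully into memory, which is fine for small config files.
    """
    live = Path(source_file).read_bytes()
    restored = restored_path(source_file, target_dir).read_bytes()
    return live == restored
```

Run the command from `restore_command` once per tier, then call `matches_source`; a `False` result is the signal to start debugging before the backup is needed for real.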

I have this on my calendar four times a year. I encourage anyone reading this who runs production-ish workloads to put it on theirs.

This is the single most important step in the whole post. Backups that have never been restored are stories you tell yourself. The first time you discover yours don’t work should not be the moment you need them.

What this looks like in practice

The mechanical reality, for a sense of scale:

The backup script is about 60 lines of shell. It SSHes into the VM where my containers run, takes database dumps, copies the dumps to a host-side directory, then runs restic backup against either the Drive or storage box repo depending on the tier passed as argument.

A daily backup of my home server pushes about 70 KiB of new data to the storage box, end to end in 10 seconds. Restic deduplicates well, so even after months, total storage size grows slowly. The storage box is 1 TB, costs €4/month, and would take thousands of years to fill at this rate.
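For a sense of how much headroom that is, here is the back-of-the-envelope arithmetic. It assumes a decimal terabyte and a constant 70 KiB of new data per day, and it ignores the initial repository size and restic's own metadata, so treat the result as an order of magnitude only.

```python
KIB = 1024
TB = 10**12  # decimal terabyte, as storage is usually advertised

daily_new_bytes = 70 * KIB
days_to_fill = TB / daily_new_bytes
years_to_fill = days_to_fill / 365.25

print(f"{days_to_fill:,.0f} days, about {years_to_fill:,.0f} years")
```

The answer comes out in the tens of thousands of years, so capacity is not the constraint here; retention policy and the one-off size of the first full snapshot matter far more.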

The hourly Drive backup is even faster because it’s writing to a local folder; the Drive client uploads asynchronously. I never wait on it.

When I tested my first restore, it took six seconds to pull a config file from the latest snapshot and verify it byte-for-byte matched the source. That moment — when diff returned no output — is what convinced me the design was right.

What I’d do differently for someone else

If you’re setting this up for the first time, my recommendations:

Don’t pick exotic tools. Restic is mature, well-documented, and runs on Linux, macOS, Windows, and the BSDs. Borg is a comparable choice; pick one and stick with it. Don’t try to write custom rsync scripts for backup; that path tends to lead to corrupt restores.

Pick three failure domains. They don’t have to be the same as mine. Your three could be: external SSD plus a friend’s NAS plus a cloud bucket. The point is three, not the specific three.

Encrypt client-side. Always. Don’t trust the storage provider to encrypt for you.

Per-source keys. One key per machine pushing to a shared destination. Five minutes more setup, much smaller blast radius.

Test restore on day one. Before you trust the schedule, run a manual backup and a manual restore. Diff the result. Then schedule.

Test restore quarterly forever. This is the one thing people skip and regret.

Keep the script small. Mine is around 60 lines, mostly because I resisted the temptation to add features. The simpler the script, the easier it is to debug when something goes wrong.

The point

The work to set up genuine three-tier backup for a solo project is not enormous — a focused afternoon — and it solves a problem that only gets more expensive the longer you wait. Any project with real users should have backups by week two.

Set them up early. Test them often. Trust them only after a successful restore.

This post is part of the Karkhana series, about running solo software work sustainably with high-leverage AI agents. See also: Decision Records for solo developers.