Caddy to Tailscale serve: the quiet TLS win for tailnet services

#tailscale #caddy #macos #tls #homelab #karkhana

I spent the better part of an evening trying to make Caddy serve a wildcard TLS certificate for tailnet-only services on a headless Mac. I got the cert. I never got TLS to work. Then I deleted Caddy, ran one Tailscale command, and was done in two minutes.

This is a short post about a long debugging session, and the lesson I should have internalized earlier: if a tool already does TLS for you in the context where you need TLS, don’t fight macOS to do it again with a custom domain.

The setup I wanted

A Mac at home, headless, accessible only over my Tailscale tailnet. I wanted to put a few private services behind a clean wildcard subdomain — *.home.example.com — with valid TLS certs. The standard recipe for this is Caddy with the Cloudflare DNS plugin, using DNS-01 challenges to issue a wildcard cert. Caddy reverse-proxies to the actual services running on internal ports.
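
For reference, that recipe usually boils down to a Caddyfile along these lines. Treat it as a sketch under assumptions, not the exact file from that evening: it assumes a Caddy build that includes the Cloudflare DNS module, an API token exported under the name CLOUDFLARE_API_TOKEN (the variable name is my choice for illustration), and a single service on port 8000.

```shell
# Sketch only: writes a hypothetical Caddyfile for the wildcard setup.
# Assumption: Caddy was built with the Cloudflare DNS module, for example
#   xcaddy build --with github.com/caddy-dns/cloudflare
# Assumption: the API token is exported as CLOUDFLARE_API_TOKEN.
CADDYFILE="${CADDYFILE:-./Caddyfile}"

cat > "$CADDYFILE" <<'EOF'
*.home.example.com {
	# DNS-01 challenge through Cloudflare, needed for a wildcard cert.
	tls {
		dns cloudflare {env.CLOUDFLARE_API_TOKEN}
	}

	# One matcher and one handle block per service.
	@coolify host coolify.home.example.com
	handle @coolify {
		reverse_proxy localhost:8000
	}

	# Names that match the wildcard but have no route are dropped.
	handle {
		abort
	}
}
EOF

echo "wrote $CADDYFILE"
```

Each additional service would get its own host matcher and handle block inside the same wildcard site.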

This recipe works on Linux. The internet is full of guides for it. I had every reason to expect it to work on macOS too.

What went wrong

Each individual problem is small. Together they were a wall.

Binding port 443 needed root in my setup. Caddy as a regular user couldn’t bind 443. The standard fix is to run it as root via a launchd service.

LaunchAgent vs LaunchDaemon, and SIP. macOS 15 has tightened launchd. My unsigned LaunchDaemon wouldn’t bootstrap into the system domain, which I put down to System Integrity Protection rejecting it. The suggested workaround is a LaunchAgent in the user domain. Except a LaunchAgent runs as the user, which brings us back to the port 443 problem.

Cert storage permissions. I worked around the privilege issue by running Caddy as root in a LaunchDaemon-equivalent setup. Now Caddy, running as root, failed to load certs from ~/Library/Application Support/Caddy/, which my user owns with drwx------. I assumed the permissions were the problem and ran chmod 755 on the cert directory.

The cert never loaded into the runtime cache. Even after fixing permissions, the TLS handshake silently failed with SSL_ERROR_SYSCALL. Caddy’s logs showed “certificate already exists in storage”, but the handshake never presented a cert. I tried wiping all Caddy state, moving storage paths, and restarting from scratch. The symptom never went away. Best guess: Caddy’s autosaved configuration kept a sticky reference to the old storage path, and the runtime cache was reading from somewhere other than where the issued cert actually lived. I never confirmed this theory because by then I had been at it for hours.
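
Two checks would help narrow down a symptom like this: what certificate, if any, the server presents for a given name, and what configuration Caddy is actually running. The helpers below are a sketch. The function names are hypothetical, and the admin address assumes Caddy’s default of localhost:2019.

```shell
# Hypothetical helper: print subject, issuer, and validity dates of the
# certificate a server presents for a name. If the handshake returns no
# certificate, the second openssl call fails with an error instead.
show_cert() {
  host="$1"
  port="${2:-443}"
  openssl s_client -connect "$host:$port" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -issuer -dates
}

# Hypothetical helper: dump the configuration Caddy is running right now,
# via its admin API. Assumes the default admin address unless CADDY_ADMIN
# is set.
show_caddy_config() {
  curl -s "http://${CADDY_ADMIN:-localhost:2019}/config/"
}
```

Running show_cert coolify.home.example.com against the proxy and comparing the storage path in the show_caddy_config output with the directory where the cert was issued would be one way to test the stale-path theory.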

I wrote up the debugging in my decision records, gave up, and went looking for a different approach.

The two-minute fix

Here’s what I should have tried first:

tailscale serve --bg --https=443 --set-path=/ http://localhost:8000

That’s the whole thing. The service is now reachable at https://machine.tailnet-name.ts.net with a valid TLS cert that Tailscale manages on my behalf. No Caddy. No wildcard cert. No Cloudflare API token. No launchd anything.

Tailscale’s serve command runs an HTTPS reverse proxy on port 443, terminates TLS using Let’s Encrypt certs that Tailscale provisions for the *.ts.net namespace, and forwards requests to whatever local port I tell it to. It only listens on the tailnet interface, so the world can’t reach it — only my Tailscale-connected devices can. The cert lifecycle is handled entirely by Tailscale; I never see a .pem file.

For multiple services, I add path-based routes:

tailscale serve --bg --https=443 --set-path=/coolify http://localhost:8000
tailscale serve --bg --https=443 --set-path=/agent   http://localhost:18789

Or I use Tailscale’s per-port routing, where each service gets its own HTTPS port on the same hostname instead of its own path. Either works.
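
A per-port layout might look like the sketch below. The tailnet-side ports 443 and 8443 are illustrative choices, and the function names are hypothetical wrappers around the tailscale serve subcommands.

```shell
# Sketch: one tailnet HTTPS port per service instead of one path per service.
# 443 and 8443 on the tailnet side are illustrative choices.
serve_per_port() {
  tailscale serve --bg --https=443  http://localhost:8000
  tailscale serve --bg --https=8443 http://localhost:18789
}

# Show what this machine is currently serving.
serve_status() {
  tailscale serve status
}

# Remove every serve route on this machine.
serve_reset() {
  tailscale serve reset
}
```

With this layout the first service answers at https://machine.tailnet-name.ts.net/ and the second at https://machine.tailnet-name.ts.net:8443/, so neither backend has to cope with a path prefix.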

What I gave up

The honest tradeoff: I no longer have URLs like coolify.home.example.com. I have URLs like machine.tailnet-name.ts.net/coolify. The .ts.net URL is uglier. It also bakes in my tailnet name, and because the certs are publicly issued, the machine and tailnet names end up in Certificate Transparency logs. That is a leak of names, not of access: only my devices can reach the service anyway.

For tailnet-only services, this aesthetic loss is the entire cost. There’s no functional difference. I bookmark the URLs once. I rarely type them.

For genuinely public services, I do still want clean custom domains. Those go through Cloudflare Tunnel, which has none of the macOS port-binding pain because the tunnel is outbound — cloudflared connects to Cloudflare’s edge and Cloudflare handles inbound TLS at their edge. Cloudflare also offers free SSO via Access for putting auth in front of admin dashboards. That part of my setup works flawlessly.
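
For the public side, a tunnel config can be as small as the sketch below. Everything specific in it is a placeholder: the tunnel name home, the hostname app.example.com, local port 3000, and the credentials path are assumptions for illustration.

```shell
# Sketch only: writes a hypothetical cloudflared config for one public service.
# Assumption: a tunnel was already created with
#   cloudflared tunnel login
#   cloudflared tunnel create home
# app.example.com and port 3000 stand in for a real hostname and service.
CONFIG="${CONFIG:-./cloudflared-config.yml}"

cat > "$CONFIG" <<'EOF'
tunnel: <TUNNEL-UUID>
credentials-file: /path/to/<TUNNEL-UUID>.json
ingress:
  - hostname: app.example.com
    service: http://localhost:3000
  # Catch-all rule; cloudflared requires one as the last ingress entry.
  - service: http_status:404
EOF

echo "wrote $CONFIG"

# Then, not run here:
#   cloudflared tunnel route dns home app.example.com
#   cloudflared tunnel --config "$CONFIG" run home
```

Nothing here binds a local port: cloudflared dials out, which is why the port 443 problem never comes up on this path.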

The lesson

When a tool already handles TLS in its native domain, it is not worth fighting a different tool just to get a custom domain.

Tailscale gives me TLS for free on the tailnet. It does this well, with auto-renewal, with a one-line config. The reason I tried to use Caddy instead was that I wanted a custom domain on the tailnet. That want had no functional basis. It was aesthetic.

The general rule, which I’ll keep applying to other parts of Karkhana: identify what’s already solved by the tool in front of me, and don’t reinvent that part. Build on top, don’t compete underneath.

The custom domain wasn’t worthless. It just belongs to a different layer — the public layer, served by Cloudflare Tunnel. By trying to use one wildcard cert for both private and public services, I was conflating two networks with two different threat models and two different ergonomic needs. Once I separated them — .ts.net for private, custom domain for public — both layers became simple.

When this advice doesn’t apply

A few caveats, because nothing is universal:

If you need a custom domain on the tailnet for some genuine reason (single-sign-on policies that require it, integrations that don’t accept .ts.net names, an organizational mandate), then you do need the Caddy-style approach. Tailscale Funnel doesn’t help here either: it is for public exposure, and it still uses a .ts.net URL.

If you’re not on macOS, half my problems disappear. Caddy on Linux is genuinely easy. If you’re on a Linux home server, the original recipe probably works. The macOS-specific friction was the multiplier on the cost side.

If you don’t already use Tailscale, this isn’t a reason to switch. It’s just a reason to use Tailscale’s built-in serve if you do.

Where to go from here

If you’re setting up a similar headless machine, the order I now recommend:

  1. Get Tailscale on the box.
  2. Use tailscale serve for everything tailnet-only.
  3. Use Cloudflare Tunnel for anything genuinely public, with Cloudflare Access in front of admin dashboards.
  4. Don’t run a reverse proxy at all unless you have a specific reason the above two don’t cover.

This setup costs me nothing per month beyond the domain registration and is operated mostly by services I don’t have to maintain.

This post is part of the Karkhana series, about running solo software work sustainably with high-leverage AI agents.