Docker Compose Healthchecks in the Homelab: Why Your Stack “Looks Up” but Isn’t Ready

Tobias Mensah

May 9, 2026

Green squares in Portainer feel like success. `docker ps` shows `Up (healthy)` and your Uptime Kuma probe is green because something answered HTTP 200 on port 80. Then you reboot the host, watch services come back in random order, and discover your reverse proxy marked upstreams ready before databases finished recovery. The stack looked healthy; it was merely polite enough to crash later.

Healthchecks in Docker Compose exist to separate “process started” from “workload ready.” Homelabbers often copy snippets without aligning checks with real dependencies. This article walks through why that gap bites, how to write checks that match your apps, and where orchestration limits still leave holes on a single-node box.

If you only remember one sentence: a healthcheck should encode the same assumptions your dependent services make when they try to do real work, expressed as a small automated experiment that finishes in bounded time.

What a healthcheck actually gates

Docker runs your check command inside the container on an interval. A container with a healthcheck begins in the `starting` state; the first passing check marks it healthy, and `retries` consecutive failures mark it unhealthy (failures during `start_period` do not count toward that limit). Compose v2 can use `depends_on` with `condition: service_healthy` so that startup waits on readiness, not just container birth. Without that wiring, Compose starts containers in dependency order but does not wait for them to pass their checks.
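
To make that difference concrete, here is a minimal sketch with a stock Postgres container and a hypothetical `example/app` image; the service names, tag, and password are placeholders rather than anything from a real stack. The short `depends_on` list form only orders startup, while the long form with `condition: service_healthy` holds the dependent back until the check passes.

```yaml
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: change-me        # placeholder; use a secret in real life
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

  app:
    image: example/app:1.0                # hypothetical image
    # The short form `depends_on: [db]` would only order startup, with no waiting.
    depends_on:
      db:
        condition: service_healthy        # start app only after db reports healthy
```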

Reverse proxies are the classic casualty. A label-driven proxy such as Traefik, or Caddy with a Docker label plugin, can begin routing to a container as soon as it is running when no healthcheck tells it otherwise. If your app binds its port immediately but migrations take sixty seconds, users see 500s while Prometheus still insists the universe is fine because the exporter sidecar answered `/metrics`.

Bad healthchecks everyone copies

`CMD curl -f http://localhost/` on minimal images that do not ship curl. `wget` variants that silently follow redirects into login pages. Database checks that only run `pg_isready` without verifying the application database exists. These patterns look fine in tutorials; they rot when Alpine slimmed another megabyte out of the base image or when your init script creates the DB asynchronously.

Another anti-pattern is an ultra-aggressive interval that hammers your own app during startup storms. Start with generous intervals, then tighten them after you have observed real boot times under load, not on an idle NVMe after a fresh pull.

Writing checks that match reality

For HTTP services, hit a lightweight readiness path that exercises auth middleware if that is what breaks. For databases, pair `pg_isready` with a tiny query against a known table or extension version. For queues, confirm the broker accepts a publish and consume cycle in a scratch queue. The theme is the same: prove the subsystem your dependents need, not the cheapest green light.
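
For the database case, a sketch along these lines pairs `pg_isready` with a real query. The user `app`, database `appdb`, and table `schema_migrations` are hypothetical names; the check also assumes the image permits passwordless connections over the local socket, which is worth verifying for your own image and `pg_hba.conf`.

```yaml
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: app                  # hypothetical role
      POSTGRES_PASSWORD: change-me        # placeholder
      POSTGRES_DB: appdb                  # hypothetical database
    healthcheck:
      # pg_isready only proves the server accepts connections.
      # The psql query additionally fails if appdb or the table is missing.
      test:
        - CMD-SHELL
        - "pg_isready -U app -d appdb && psql -U app -d appdb -tAc 'SELECT 1 FROM schema_migrations LIMIT 1'"
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
```

The query is deliberately tiny: it proves the schema your app needs is present without adding measurable load on every interval.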

Compose wiring you should memorize

Use `healthcheck` blocks with explicit `interval`, `timeout`, `retries`, and `start_period`. The start period is your grace window while JVMs warm, SQLite WAL replays, or NFS mounts thaw. Omit it and a slow-but-healthy container flaps to unhealthy before first success, causing restart loops if something external reacts to the state.
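
As an annotated sketch of those four knobs, assume a hypothetical app image that ships `wget` and serves a readiness path at `/healthz` on port 8080. The numbers are illustrative starting points, not recommendations.

```yaml
services:
  app:
    image: example/app:1.0    # hypothetical image that includes wget
    healthcheck:
      test: ["CMD", "wget", "-q", "-O", "/dev/null", "http://127.0.0.1:8080/healthz"]
      interval: 15s           # time between checks
      timeout: 5s             # a check that runs longer than this counts as a failure
      retries: 4              # consecutive failures before the container is marked unhealthy
      start_period: 90s       # failures inside this window do not count toward retries
```

A check that passes during `start_period` still marks the container healthy right away, so a generous grace window does not slow down a fast boot.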

`depends_on` with health conditions is Compose-local magic; it does not exist in plain `docker run`. Swarm and Kubernetes have different primitives—do not assume files port cleanly when you graduate off the homelab.

Single-node reality checks

On one machine, disk IO spikes during scrubs or backups can push latency past your healthcheck timeout even though the app is fine. Either widen timeouts during maintenance windows or pause checks via documented runbooks. Also remember healthchecks run inside the container network namespace—localhost is correct for binding checks; hitting the public hostname might accidentally traverse hairpin NAT and lie about external reachability.

Exec form versus shell form (and why it matters)

Docker accepts a health command in exec form (a JSON array starting with `CMD`) or in shell form (a plain string, or an array starting with `CMD-SHELL`). Shell form runs through the image's shell, which brings variable expansion, quoting rules, and subtle signal-handling differences. When your check is `curl || wget`, you are already in brittle territory: pick one binary, install it in the image, and exec it directly. Fewer moving parts mean fewer “works on my laptop” moments when BusyBox ash disagrees with bash.
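
Side by side, the two forms look roughly like this in a Compose file. The image, port, and path are hypothetical, and both services assume `wget` exists in the image.

```yaml
services:
  exec-form:
    image: example/app:1.0    # hypothetical image
    healthcheck:
      # Exec form: the array is passed as argv, no shell involved.
      test: ["CMD", "wget", "-q", "-O", "/dev/null", "http://127.0.0.1:8080/healthz"]

  shell-form:
    image: example/app:1.0    # hypothetical image
    healthcheck:
      # Shell form: the string is run by the container's shell, so operators
      # like || work. Compose interpolates $VAR itself; write $$VAR if the
      # container's shell should expand it instead.
      test: ["CMD-SHELL", "wget -q -O /dev/null http://127.0.0.1:8080/healthz || exit 1"]
```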

Rootless Podman users crossing streams

If you migrated some stacks to rootless Podman with compose compatibility, healthchecks behave similarly but networking and user namespaces introduce new edge cases—binding privileged ports, reaching systemd user session dbus, or sharing unix sockets for databases. Verify paths and permissions; a check that worked rootful may fail rootless with permission denied that looks like a dead service.

SELinux and AppArmor: silent health killers

Security modules can block curl from reading certs or reaching unix sockets even when the app works because it uses a different label. Audit logs are your friend; do not disable SELinux wholesale—adjust contexts or choose checks that match the confined profile. Homelab does not mean “security off,” it means “security surprises during movie night.”

Windows and Docker Desktop caveats

WSL2-backed Docker introduces another VM boundary. Localhost inside the container is not the same as localhost on the Windows host; file share latency can stretch startup. If you develop on Windows and deploy to Linux, revalidate timings; copy-pasted start_period values may be fiction.

How Kubernetes differs (without starting a migration flamewar)

Kubernetes readiness probes map closely to Docker healthchecks but run at the orchestration layer with different defaults and backoff strategies. If you are learning K8s after Compose, carry the mental model—prove readiness, separate liveness from readiness—but do not assume identical YAML keys. Homelab k3s clusters still benefit from the same discipline: a Deployment marked Available is not proof your migrations finished.
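
For comparison, a minimal Pod sketch shows how the same ideas are spelled in Kubernetes. The image and the `/healthz` and `/livez` paths are hypothetical, and the timings are illustrative only.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: example/app:1.0          # hypothetical image
      ports:
        - containerPort: 8080
      startupProbe:                   # rough analogue of start_period
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 5
        failureThreshold: 18          # allows about 90s for startup
      readinessProbe:                 # gates traffic; does not restart the container
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 15
        timeoutSeconds: 5
        failureThreshold: 4
      livenessProbe:                  # restarts the container after repeated failures
        httpGet:
          path: /livez                # hypothetical path
          port: 8080
        periodSeconds: 30
```

Note that the kubelet performs the HTTP request itself, so the image does not need `curl` or `wget` for these probes.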

Testing healthchecks in CI (yes, even solo)

A short compose smoke test in GitHub Actions or Gitea CI can bring up the stack, wait for healthy states, then curl the public route. It will not catch every race, but it catches “we deleted curl from the slim image last month.” Cheap insurance against regressions when you tweak base images during security bumps.
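
One possible shape for such a smoke test, as a GitHub Actions workflow: it assumes the compose file sits at the repository root, that the stack publishes its public route on port 80, and that the runner's Compose version supports `--wait` and `--wait-timeout`. The workflow and job names are made up.

```yaml
name: compose-smoke
on: [push]

jobs:
  smoke:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Start the stack and wait for healthy states
        # --wait blocks until services are running and healthy, or fails.
        run: docker compose up -d --wait --wait-timeout 180

      - name: Probe the public route
        run: curl --fail --silent --show-error http://localhost:80/

      - name: Show state and logs on failure
        if: failure()
        run: |
          docker compose ps
          docker compose logs --no-color --tail 100

      - name: Tear down
        if: always()
        run: docker compose down -v
```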

Observability: health is necessary, not sufficient

Combine Docker health with application metrics and synthetic transactions. A container can be healthy while returning wrong data because the migration never ran. Black-box probes from Uptime Kuma hitting the public URL catch that class of failure; internal health endpoints catch others. Budget for redundancy in signals, not just redundancy in containers.

Logs should tell a story when health flips. Ship structured JSON if you can; grep-friendly lines if you cannot. When a probe fails, print why—timeout versus connection refused versus HTTP 418 jokes from a misconfigured router. Future you is debugging at 1 a.m.; kindness in log text compounds.

Restart policies interact with health in messy ways

`restart: unless-stopped` plus a flaky healthcheck can equal thrash. On its own, Docker does not restart a container for being unhealthy; restart policies react to the process exiting. The loop appears when something acts on health status (an autoheal-style watcher, Swarm, or a script of your own), and a wrong check command will keep that loop spinning while your SSD writes SMART anxiety into the counters. During development, set `disable: true` under `healthcheck` in an override file rather than commenting out entire services. A local `docker-compose.override.yml`, kept out of Git via `.gitignore`, is a homelab staple for this reason.
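
A development override for this can be as small as the sketch below; `app` is a hypothetical service name that would need to match a service in your main compose file.

```yaml
# docker-compose.override.yml (local only)
services:
  app:
    healthcheck:
      disable: true   # turns off the healthcheck defined in the base file or image
```

Compose merges this file over `docker-compose.yml` automatically when both sit in the same directory, so the tested healthcheck stays intact in the main file.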

Networking traps on consumer hardware

IPv6 dual-stack weirdness can make curl to localhost succeed from inside the container while external IPv4 clients hit a black hole, or vice versa. If your healthcheck only tests one stack, label that assumption in comments. CGNAT and hairpinning on home routers add another layer: internal checks pass while friends cannot reach your Plex—different failure class, same lesson about what “healthy” asserts.

Resource limits change timing

CPU throttling from mis-set `cpus:` or memory pressure from absent `mem_limit` can stretch startup past your start_period. After tuning cgroups, revisit health timings. Autoscaling is rare on a homelab, but “my backup container stole all the IOPS” is not.
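
A sketch of how limits and health timings travel together, using a hypothetical image and made-up numbers: if you halve the CPU budget, assume the boot takes longer and widen `start_period` until measurements say otherwise.

```yaml
services:
  app:
    image: example/app:1.0    # hypothetical image that includes wget
    cpus: 0.5                 # at most half a core; startup will be slower
    mem_limit: 512m           # hard memory cap for the container
    healthcheck:
      test: ["CMD", "wget", "-q", "-O", "/dev/null", "http://127.0.0.1:8080/healthz"]
      interval: 15s
      timeout: 5s
      retries: 4
      start_period: 120s      # widened to cover the throttled boot; measure and adjust
```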

When to skip fancy checks

Static sites and stateless sidecars may only need a trivial TCP check. Over-engineering health for a read-only nginx adds latency and log noise. Save rigor for stateful tiers and anything that participates in auth or billing.

Failure drills worth running twice a year

Reboot the host intentionally. Pull the plug during a UPS test if you dare. Watch Compose bring stacks up and note which service flaps. Fix ordering and checks while you remember which log lines mattered. Disaster rehearsals are cheaper when your boss is a cat, not a CFO.

Closing habit

Treat every healthcheck like a contract with your future tired self: specify what “ready” means, give startups breathing room, and wire dependents to wait on the contract. Your dashboard will stay green for the right reasons—and when it goes red, you will trust it enough to wake up.

Template you can steal and adapt

For a typical web plus database stack, database first: `pg_isready` with adequate start_period, then app service with an HTTP GET to `/healthz` that returns JSON including build SHA and migration version. Proxy depends on app with `condition: service_healthy`. Exporters depend on their targets being reachable, not merely alive. Adjust names, paths, and drivers; keep the structure.
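
Written out, that structure might look like the sketch below. The app image, credentials, port, and the choice of Caddy as the proxy are assumptions for illustration; proxy configuration and exporters are left out, and the app check only looks at the HTTP status of `/healthz`, not the JSON body.

```yaml
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: app                  # hypothetical role
      POSTGRES_PASSWORD: change-me        # placeholder
      POSTGRES_DB: appdb                  # hypothetical database
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "app", "-d", "appdb"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s

  app:
    image: example/app:1.0                # hypothetical image that includes wget
    depends_on:
      db:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "wget", "-q", "-O", "/dev/null", "http://127.0.0.1:8080/healthz"]
      interval: 15s
      timeout: 5s
      retries: 4
      start_period: 60s

  proxy:
    image: caddy:2                        # proxy config omitted
    ports:
      - "80:80"
    depends_on:
      app:
        condition: service_healthy
```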

Document the expected boot timeline beside the compose file. Humans forget by winter, after kernels have updated and JVM flags have changed. A sticky note in Git beats heroic memory.

Ship the compose file you tested, not the aspirational one you meant to test next weekend.