Migrating from Kubernetes to Docker Compose
A year of Kubernetes⌗
About a year ago I thought it would be a good idea to learn a bit more about Kubernetes. We use Kubernetes as part of our server orchestration at work, and while most of it is abstracted away, it rarely hurts to know what the various foundational layers are actually like.
At the time, I tried to set up a three-node cluster (two computers at home, one in the cloud), connected by Tailscale (i.e. via WireGuard). This… kind of worked, but it was super chatty, and when we moved to Seattle we no longer had an unmetered internet connection.
The original setup used microk8s because it was the first option that came up, so to resolve the issue I just ran a separate microk8s node for the cloud machine (battery) and for the home machine (potato). I’d gotten cert-manager to automatically provision LetsEncrypt certificates, and it was easy to deploy new containers; I thought things were great.
Over the course of the year, I learned a bunch of things that I kind of wish I didn’t need to learn:
- Random bits about calico and Kubernetes' internal networking abstractions
- The fact that Kubernetes runs as a bunch of eventually consistent control loops, so if you Really Need something to just start, it’s actually annoyingly hard.
- There aren’t clean logs, anywhere.
- Sometimes the cluster just reschedules a pod, even though there’s only one node to schedule the pod on. So your maximum availability isn’t necessarily high.
- The storage provisioning system (PersistentVolumes and PersistentVolumeClaims) is really hard to reason about. I assume the idea here is to let your cloud vendor deal with this for you using their network-backed storage, but it was really common for this to be the thing that kept a pod from starting.
- Kubernetes is full of certificates, and sometimes they expire. So there’s a whole song and dance to get them refreshed so things work again.
- …
Some bits of it were pretty cool, though. I liked that I could just define a Dockerfile and it would get deployed onto the internet without me needing to handwrite configuration files. The Honeycomb agent system is pretty cool. Automatic SSL configuration (and in general syncing the configuration between the service and the frontend proxy) was very convenient.
What can I use instead⌗
I had a few requirements for the next service orchestration thing:
- It needs to use containers, because encapsulation makes things easier
- It needs to integrate cleanly with a proxy that can route requests to the right place (e.g. nginx or traefik)
- It needs to automatically manage SSL for me
- It needs to support putting a given path/virtualhost behind an OAuth barrier
- It needs to keep working if I don’t look at it for a few months…
- It should consume as little CPU and RAM as possible
Notably, I’m not running any critical infrastructure on these boxes, so there’s no real need for high availability. Requirement (4) suggested that I avoid looking at any distributed orchestration systems.
docker compose seemed to fit the bill: it’s just a fancy script for Docker configuration, and traefik supports service discovery via Docker labels, which meets the first few requirements.
Doing the migration⌗
I don’t run any particularly stateful applications, so the actual migration was a process of figuring out how to write the appropriate compose file for the applications I cared about.
The overall architecture is pretty simple: there’s a bridge network traefik_proxy, which most of the apps run on, and then traefik itself is on that network and additionally has exposed ports 80 and 443 for HTTP and HTTPS.
traefik_proxy:
  name: traefik_proxy
  driver: bridge
  ipam:
    config:
      - subnet: 192.168.90.0/24
traefik itself is configured to use my Cloudflare Zone key for LetsEncrypt DNS verification by setting the two environment keys

CF_API_EMAIL=$CLOUDFLARE_EMAIL
CF_DNS_API_TOKEN=$CLOUDFLARE_API_KEY

and passing in the appropriate command-line arguments. We also tell traefik to use Docker to find the services, though we need to specify the ports manually.
--certificatesResolvers.default.acme.email=$CLOUDFLARE_EMAIL \
--certificatesResolvers.default.acme.storage=/acme.json \
--certificatesResolvers.default.acme.dnsChallenge.provider=cloudflare \
--certificatesResolvers.default.acme.dnsChallenge.resolvers=1.1.1.1:53,1.0.0.1:53 \
--providers.docker=true \
--providers.docker.endpoint=tcp://socket-proxy:2375 \
--providers.docker.exposedByDefault=false \
--providers.docker.network=traefik_proxy \
--providers.docker.swarmMode=false
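The docker endpoint above points at socket-proxy, a small proxy sitting in front of the Docker socket so traefik doesn’t need the socket mounted directly. A minimal sketch of such a service — assuming the commonly used tecnativa/docker-socket-proxy image and the service name socket-proxy, which may not match my exact setup — looks roughly like:

socket-proxy:
  container_name: socket-proxy
  image: tecnativa/docker-socket-proxy:latest   # assumed image; any read-only socket proxy works
  restart: unless-stopped
  networks:
    - traefik_proxy          # must be reachable from traefik; a dedicated network is even more locked down
  environment:
    - CONTAINERS=1           # expose only container info, which is all traefik needs for discovery
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock:ro

The proxy listens on 2375 inside the network, so nothing needs to be published on the host.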
We disable automatic exposure of new Docker services for safety. I configured traefik-forward-auth as the OAuth middleware:
traefik-forward-auth:
  <<: *common-keys-core
  container_name: traefik-forward-auth
  image: thomseddon/traefik-forward-auth:latest
  command: --whitelist=/* redacted */
  environment:
    - CONFIG=/config
    - COOKIE_DOMAIN=$FQDN
    - INSECURE_COOKIE=false
    - AUTH_HOST=oauth.$FQDN
    - URL_PATH=/_oauth
    - LOG_LEVEL=warn
    - LOG_FORMAT=text
    - LIFETIME=86400
    - SECRET=$OAUTH_SECRET
    - CLIENT_ID=$GOOGLE_CLIENT_ID
    - CLIENT_SECRET=$GOOGLE_CLIENT_SECRET
  labels:
    - "traefik.enable=true"
    ## HTTP Routers
    - "traefik.http.routers.oauth-rtr.tls=true"
    - "traefik.http.routers.oauth-rtr.entrypoints=https"
    - "traefik.http.routers.oauth-rtr.rule=Host(`oauth.$FQDN`)"
    ## Middlewares
    - "traefik.http.routers.oauth-rtr.middlewares=traefik-forward-auth"
    - "traefik.http.middlewares.traefik-forward-auth.forwardauth.address=http://traefik-forward-auth:4181"
    - "traefik.http.middlewares.traefik-forward-auth.forwardauth.authResponseHeaders=X-Forwarded-User"
    - "traefik.http.middlewares.traefik-forward-auth.forwardauth.trustForwardHeader=true"
    ## HTTP Services
    - "traefik.http.routers.oauth-rtr.service=oauth-svc"
    - "traefik.http.services.oauth-svc.loadbalancer.server.port=4181"
As long as the traefik-forward-auth middleware is included, all requests will need a valid cookie, which you can get by using Google’s OAuth support.
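Putting a given router behind the OAuth barrier is then just a matter of attaching that middleware via a label, e.g. (myapp-rtr is a placeholder router name for illustration):

    - "traefik.http.routers.myapp-rtr.middlewares=traefik-forward-auth"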
Deploying normal, no-oauth-required apps is easy: just specify the container image, and include some traefik configuration to expose the route externally and connect it to the port internally.
healthcheck:
  <<: *common-keys-apps
  image: ghcr.io/rbtying/minimal-http-responder:v0.1.2
  container_name: healthcheck
  environment:
    TEXT: potato
  labels:
    - "traefik.enable=true"
    - "traefik.http.routers.healthcheck-rtr.tls.certResolver=default"
    - "traefik.http.routers.healthcheck-rtr.entrypoints=https"
    - "traefik.http.routers.healthcheck-rtr.rule=Host(`healthcheck.$FQDN`)"
    - "traefik.http.routers.healthcheck-rtr.service=healthcheck-svc"
    - "traefik.http.services.healthcheck-svc.loadbalancer.server.port=2020"
I did run into an issue when deploying vaultwarden: the Docker container for vaultwarden specifies a healthcheck, and traefik doesn’t instantiate the route for containers which haven’t passed the healthcheck yet. This is pretty reasonable, but the healthcheck interval for vaultwarden is set to once per minute – which means that it doesn’t show up for a minute. Changing this to 10s makes things come up near-immediately.
bitwarden:
  <<: *common-keys-apps
  image: vaultwarden/server:latest
  container_name: bitwarden
  volumes:
    - $DOCKERDIR/appdata/bitwarden/:/data
    - $DOCKERDIR/logs/bitwarden:/logs
  environment:
    - WEBSOCKET_ENABLED=true
    - SIGNUPS_ALLOWED=false
    - LOG_FILE=/logs/vaultwarden.log
  healthcheck:
    interval: 10s
  labels:
    - "traefik.enable=true"
    ## HTTP Routers
    - "traefik.http.routers.bitwarden-rtr.entrypoints=https"
    - "traefik.http.routers.bitwarden-rtr.tls.certResolver=default"
    - "traefik.http.routers.bitwarden-rtr.rule=Host(`bitwarden.$FQDN`) || Host(`bitwarden.aeturnalus.com`)"
    - "traefik.http.routers.bitwarden-ws-rtr.entrypoints=https"
    - "traefik.http.routers.bitwarden-ws-rtr.tls.certResolver=default"
    - "traefik.http.routers.bitwarden-ws-rtr.rule=(Host(`bitwarden.$FQDN`) || Host(`bitwarden.aeturnalus.com`)) && Path(`/notifications/hub`)"
    ## HTTP Services
    - "traefik.http.routers.bitwarden-rtr.service=bitwarden-svc"
    - "traefik.http.services.bitwarden-svc.loadbalancer.server.port=80"
    - "traefik.http.routers.bitwarden-ws-rtr.service=bitwarden-ws-svc"
    - "traefik.http.services.bitwarden-ws-svc.loadbalancer.server.port=3012"
Home Assistant has a slightly different flavor of issue. In order for local connected device discovery to work, the Home Assistant container needs to be on the host network. But, if it’s on the host network, it’s not on the Docker bridge networks, so the default Docker service discovery doesn’t quite work.
What we can do instead is expose the Home Assistant port on the host, and then configure traefik to use the appropriate port. traefik internally connects to host network services at host.docker.internal, so I also had to add that as an extra_host in the traefik container (mapped to host-gateway on the traefik_proxy network).
homeassistant:
  container_name: homeassistant
  image: "ghcr.io/home-assistant/home-assistant:stable"
  volumes:
    - $DOCKERDIR/appdata/homeassistant/:/config
    - $DOCKERDIR/appdata/homeassistant/docker/run:/etc/services.d/home-assistant/run
    - /etc/localtime:/etc/localtime:ro
  restart: unless-stopped
  network_mode: host
  environment:
    <<: *default-tz-puid-pgid
    PACKAGES: iputils
  labels:
    - "traefik.enable=true"
    ## HTTP Routers
    - "traefik.http.routers.home-assistant-rtr.tls.certResolver=default"
    - "traefik.http.routers.home-assistant-rtr.entrypoints=https"
    - "traefik.http.routers.home-assistant-rtr.rule=Host(`home-assistant.$FQDN`)"
    ## HTTP Services
    - "traefik.http.routers.home-assistant-rtr.service=home-assistant-svc"
    - "traefik.http.services.home-assistant-svc.loadbalancer.server.port=8124"
In order to test all of this, I first configured traefik to run on different ports (i.e. not 80 and 443) so it wouldn’t conflict with the running Kubernetes ingress. Then, I shut down Kubernetes and re-deployed the docker compose with traefik running on the actual HTTP/HTTPS ports, and things Just Worked.
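Concretely, that just meant changing the entrypoint address flags (and the published ports in the compose file) between the test run and the final deploy; something like the following, where 8080/8443 are arbitrary test ports picked for illustration:

# test run on alternate ports so the Kubernetes ingress keeps 80/443 (illustrative numbers)
--entryPoints.http.address=:8080 \
--entryPoints.https.address=:8443 \

# final configuration once Kubernetes is shut down
--entryPoints.http.address=:80 \
--entryPoints.https.address=:443 \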
Pretty cool how you can set up a bunch of services in a couple of hours – containers really do drastically simplify running things in a home lab.