
From Docker Swarm to Proxmox HA: A Homelab Migration Journey

By Victor Da Luz
docker proxmox homelab infrastructure migration high-availability containers

After running a Docker Swarm cluster for a long time, I decided it was time to try something different. My setup had a Raspberry Pi 4 as the manager and two M1 Mac Minis as workers. The Docker Swarm setup was fine, but because the entire cluster was ARM-based, I was limited to images with native ARM64 support. Many commonly used Docker images still lack ARM64 builds, so those services either failed outright or ran slowly under emulation. Most of the time everything worked well, but the constraint was always there.

I’ve wanted to try Proxmox for a long time. The idea of running VMs and containers on dedicated hardware appealed to me, and I kept thinking about it but never had the right hardware. Then I found two Lenovo M710q mini PCs for cheap, and that was enough to push me to make the change. It wasn’t that Docker Swarm was broken or that I was unhappy with it. I just wanted to try Proxmox, and having the hardware made it the right time.

The migration to a Proxmox cluster represents more than just a platform change. It’s an opportunity to learn a different approach to infrastructure, address some limitations, and build something with better high availability. This is the story of how that migration happened and what I learned about the differences between Docker Swarm and Proxmox.

The biggest issue: the NAS as a single point of failure

The main problem with my Docker Swarm setup wasn’t Docker Swarm itself. It was that all Docker volumes were stored on the NAS via NFS mounts. Everything lived at /docker/volumes/, which meant the NAS became a single point of failure. If the NAS had problems, everything had problems. If the NAS went down, services couldn’t access their data.

This was probably the biggest issue I was hoping to solve with Proxmox. With Proxmox, I can store VMs and containers on local storage, with the option to replicate or back up to the NAS. The NAS becomes a backup target rather than a critical dependency. I’ll write more about this in another post, but addressing the single point of failure was a key motivation.

The ARM compatibility issues were also limiting. Since my entire cluster was ARM-based, I was constrained by which Docker images supported ARM64. When images lacked native ARM64 support, services either failed or ran slowly through emulation. Having x86 machines would eliminate that constraint.

Beyond those issues, Docker Swarm worked well. The services ran reliably, the setup was straightforward, and managing containers was easy. I wasn’t migrating because I was unhappy with Docker Swarm. I was migrating because I wanted to try Proxmox and address those limitations.

What I was running on Docker Swarm

The Docker Swarm cluster had several services running. Traefik handled reverse proxy duties with SSL termination. Uptime Kuma provided monitoring and alerting. Apprise handled notifications. I had a full Prometheus stack with Alertmanager and Grafana for monitoring. The UniFi Controller managed my access points. PostgreSQL and Adminer provided database services. Nextcloud was installed, though I never really used it. And Homepage gave me a service dashboard.

The truth is, most of these services weren’t critical. My truly critical services are Pi-hole for DNS and Home Assistant, and those run separately. I can live without most of what was in the Docker Swarm cluster with no more than mild inconvenience. This made the migration less risky. If something didn’t work immediately, it wasn’t a disaster.

The new Proxmox HA cluster would be a complete change. Two Lenovo M710q machines, proxmox1 and proxmox2, would serve as primary and secondary nodes. The Pi would be repurposed as a qdevice, which acts as a tie-breaker witness for cluster quorum. The qdevice is a lightweight service that runs on the Pi and adds a vote to the cluster’s tally, providing an odd number of votes for quorum decisions. This ensures the cluster can maintain quorum even if one of the main Proxmox nodes is down, which is essential for a two-node HA setup. The qdevice doesn’t have the operational capacity of a full Proxmox node; it just participates in voting.
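For reference, the qdevice setup itself is only a few commands. This is a sketch of the standard procedure from the Proxmox documentation; the Pi’s address below is a placeholder, and it assumes the node can reach the Pi as root over SSH:

```bash
# On the Raspberry Pi: install the external vote daemon
sudo apt install corosync-qnetd

# On each Proxmox node: install the qdevice client
apt install corosync-qdevice

# On one Proxmox node: register the Pi as the tie-breaker
# (192.168.1.50 stands in for the Pi's actual address)
pvecm qdevice setup 192.168.1.50

# Confirm the cluster now counts three votes
pvecm status
```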

This setup provides true high availability with automatic failover. If one Proxmox node fails, services can automatically move to the other node. The qdevice ensures the cluster stays operational even during node failures.

The hardware setup and a broken cable

Setting up the Lenovo M710q machines was mostly straightforward. Installing Proxmox VE on them was simple, and they booted up without issues. Both machines have an NVMe drive and a SATA connection. My plan was to use the NVMe drives as system drives for Proxmox and set up the SATA drives as ZFS pools for containers and VMs.

Then I noticed that one of the machines had problems accessing its SATA connection. It turned out the SATA ribbon cable was broken, which meant I couldn’t use the SATA drive for the ZFS pool on that machine.

I could still set up the Proxmox cluster. Since both machines had working NVMe drives, I could install Proxmox on them and get the cluster running. The broken cable just meant I couldn’t set up the ZFS pool on that node yet, but the cluster itself would work fine.

Once the SATA ribbon cable was replaced, adding the ZFS pool to that node was easy. I configured the SATA drive as a ZFS pool and added it to the cluster storage. The cluster didn’t need both nodes to have ZFS pools to function, which made the repair process straightforward.
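For the curious, the repaired node’s storage came together with a couple of commands. A minimal sketch, where the device path, pool name, and storage ID are placeholders I’ve made up for illustration:

```bash
# Create a single-disk ZFS pool on the repaired node
# (/dev/sda and "tank" are placeholder names)
zpool create -f tank /dev/sda

# Register it with Proxmox as storage for container root
# filesystems and VM disks, limited to the node that has the disk
pvesm add zfspool tank-storage --pool tank \
  --content rootdir,images --nodes proxmox2
```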

This flexibility is one of the advantages of Proxmox. The cluster can work with different storage configurations, and storage can be added to nodes as needed without breaking the cluster.

Shutting down the Docker Swarm cluster

I didn’t do a careful shutdown procedure. Since most services weren’t critical, I just stopped the services and cleaned up the Docker Swarm setup. I made backups of the volumes I cared about, but I didn’t spend a lot of time on graceful shutdowns or dependency analysis.

The shutdown was straightforward. I stopped services, drained the worker nodes, and removed them from the swarm. Then I backed up the volumes that mattered and cleaned up the storage. Since I could live without most of these services, there wasn’t much risk.
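The whole teardown amounted to a handful of commands. A rough sketch; the stack, node, and volume names here are placeholders rather than my exact setup:

```bash
# On the manager: remove the deployed stacks
docker stack rm monitoring traefik

# Drain each worker so nothing is scheduled on it
docker node update --availability drain mac-mini-1

# On each worker: leave the swarm
docker swarm leave

# Back on the manager: remove the now-down workers,
# then dissolve the swarm itself
docker node rm mac-mini-1
docker swarm leave --force

# Volumes were plain directories on the NFS mount,
# so backing one up was just an archive
tar czf uptime-kuma-backup.tar.gz -C /docker/volumes uptime-kuma
```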

The Mac Minis were retired, and the Pi was freed up for other uses. The Docker Swarm cluster was no longer needed, so shutting it down was simple. My critical services were already running elsewhere, so this was just cleanup. I kept one of the Minis for future use and gave the other to my sister so she can learn macOS and get away from Windows 11 as it falls apart. They’re both still perfectly good M1 Mac Minis with years of use ahead of them.

This approach worked because I wasn’t dependent on the Docker Swarm services. If something had gone wrong during shutdown, it wouldn’t have been a disaster. Having critical services running separately gave me the freedom to experiment with the migration.

The differences between Docker Swarm and Proxmox

Docker Swarm and Proxmox approach infrastructure very differently, and understanding these differences helps explain why the migration makes sense for different use cases.

Docker Swarm is container-focused. Everything runs as containers, and Docker handles the orchestration. You define services, and Docker Swarm figures out where to run them and how to manage them. It’s designed for running containerized applications with minimal overhead.

Proxmox is VM and container-focused. You can run full VMs with complete operating systems, or lightweight LXC containers. Proxmox manages the hypervisor layer, giving you more control over the underlying infrastructure. It’s designed for virtualization and resource management.

The difference between LXC containers and VMs matters. VMs are full virtual machines with their own kernel and complete operating system. They’re isolated by the hypervisor layer, providing significantly stronger security and resource isolation than containers but using more resources. An LXC container shares the host’s kernel but runs its own filesystem and processes. It’s more lightweight than a VM but less isolated. Think of VMs as completely separate computers, while LXC containers are like isolated processes with their own filesystem. For most homelab services, LXC containers provide good isolation with less overhead. You might choose a VM when you need a different operating system or stronger isolation.

Docker Swarm is easier to get started with. You install Docker, create a swarm, and deploy services. The abstraction layer handles a lot of the complexity. If you just want to run containers and don’t care about the underlying infrastructure, Docker Swarm is simpler.

Proxmox requires more setup but provides more control. You need to understand VMs, storage, networking, and cluster configuration. The learning curve is steeper, but you get more visibility and control over what’s happening.

Resource management works differently. Docker Swarm manages resources at the container level, sharing the host’s resources among containers. Proxmox manages resources at the VM or LXC level, giving you guaranteed resource allocation and better isolation.

High availability means different things. Docker Swarm can restart containers on different nodes if a node fails, but it doesn’t provide true high availability at the infrastructure level. Proxmox HA can move entire VMs between nodes automatically, providing infrastructure-level redundancy.
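To make the Proxmox side concrete: HA is opt-in per guest. A minimal sketch, assuming a VM with ID 100 (the ID is a placeholder):

```bash
# Put VM 100 under HA management; if its node fails,
# the cluster restarts it on the surviving node
ha-manager add vm:100 --state started

# See what the HA stack is currently tracking
ha-manager status
```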

The biggest difference for me is the learning opportunity. Docker makes it easy to deploy services without understanding much about how they work. With LXC containers in Proxmox, setting up each service requires more understanding of the service itself. You need to configure the container, install the software, set up configuration files, and automate the setup with scripts.

This learning opportunity is valuable. Setting up services in LXC containers means I learn more about each service. I understand how it’s configured, what it depends on, and how to automate its setup. With Docker, you can deploy something without understanding it deeply. With LXC, you need to understand it to set it up.

The automation scripting becomes more important. With Docker Swarm, you define a service and deploy it. With Proxmox LXC containers, you need scripts to automate installation and configuration. This forces you to think about how services are set up and how to make that process repeatable.
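For example, a service that used to be a few lines in a compose file becomes a small provisioning script. A rough sketch of the pattern, where the VMID, template version, storage ID, and install steps are all illustrative placeholders:

```bash
#!/usr/bin/env bash
set -euo pipefail

VMID=110   # placeholder container ID
TEMPLATE=local:vztmpl/debian-12-standard_12.7-1_amd64.tar.zst  # placeholder

# Create an unprivileged container on the ZFS-backed storage
pct create "$VMID" "$TEMPLATE" \
  --hostname uptime-kuma \
  --memory 1024 --cores 1 \
  --rootfs tank-storage:8 \
  --net0 name=eth0,bridge=vmbr0,ip=dhcp \
  --unprivileged 1

pct start "$VMID"

# Run the service's install steps inside the container
pct exec "$VMID" -- bash -c "apt-get update && apt-get install -y curl"
# ...followed by the actual service installation and configuration.
```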

For homelab learning, Proxmox provides more value. You learn more about each service, you understand the infrastructure better, and you build automation that makes you better at managing systems. Docker Swarm is easier, but Proxmox teaches you more.

What the migration provides

The move from Docker Swarm to Proxmox offers several benefits that make sense for my homelab.

The NAS is no longer a single point of failure. VMs and containers can run on local storage, with backups to the NAS. If the NAS has problems, services can continue running. This addresses the biggest issue I had with the Docker Swarm setup. Admittedly, that issue was very likely down to how I set up the Swarm rather than a flaw in Swarm itself.
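In practice that inversion is simple to express: the NAS mounts as a backup-only storage, and vzdump writes to it. A sketch with placeholder addresses and paths:

```bash
# Add the NAS as an NFS storage that only holds backups
# (server address and export path are placeholders)
pvesm add nfs nas-backup --server 192.168.1.10 \
  --export /volume1/proxmox-backups --content backup

# Back up a guest to it (the VMID is a placeholder)
vzdump 110 --storage nas-backup --mode snapshot --compress zstd
```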

Better resource isolation. VMs and LXC containers have guaranteed resource allocation, so services don’t compete unexpectedly. Each service gets what it needs, and resource contention is predictable.

True high availability. The Proxmox HA cluster can automatically move VMs between nodes if a node fails. Combined with the Pi qdevice for quorum, the cluster stays operational even during failures.

More learning opportunities. Setting up services in LXC containers requires understanding how they work. You need to configure them, install software, and automate setup. This teaches you more about each service than Docker does.

Better infrastructure visibility. Proxmox provides detailed monitoring and management at the VM and container level. You can see exactly what resources each service is using and manage them individually.

ARM compatibility issues are gone. The x86 Lenovo machines can run anything without worrying about ARM image support or emulation performance.

The migration continues

The Proxmox HA cluster is set up and running. The two Lenovo nodes are operational, the Pi is serving as the qdevice, and the cluster has quorum. Even with one node’s SATA cable initially broken, the cluster worked.

Service migration will happen gradually as needed. I’m not in a rush to move everything over. When I need a service, I’ll set it up in a Proxmox LXC container or VM. This gradual approach lets me learn each service as I set it up.

The biggest win is addressing the NAS single point of failure. I’ll write more about the storage strategy in another post, but having services run on local storage with NAS as backup changes the architecture fundamentally. If the NAS has problems, services keep running.

Docker Swarm was fine, and Proxmox is different. In my view, neither is objectively better. Docker Swarm is easier for running containers quickly. Proxmox provides more control, better resource management, and more learning opportunities. For a homelab where learning is part of the goal, Proxmox makes sense.

The migration has been worth it so far. The cluster is running, I’m learning more about Proxmox and each service I set up, and I’ve addressed the single point of failure that was the biggest limitation of the Docker Swarm setup. The gradual migration approach means I can experiment and learn without pressure.
