Configuring a lightweight NAS for Proxmox failover

I will answer the practical question up front. A two-node Proxmox cluster can provide useful failover for modest VM workloads if you add a qdevice and sort out storage and backups sensibly. It is not the same as three full nodes: it buys quorum and reduces split-brain risk, but it does not remove other single points of failure. I will show what to run, what to host on the NAS, and how to test failover.

Initial Considerations for Proxmox Cluster Setup

Assessing Your Hardware Needs

Start by matching failure modes to cost. Two Proxmox nodes give CPU and memory redundancy for VMs, but not quorum. Add a qdevice to supply a third vote. For storage, pick a model that fits the workload:

  • If VMs must migrate live between nodes, they need shared storage: a shared filesystem or network storage that both nodes can access. NFS or iSCSI on a NAS will work for small setups.
  • If you prefer local disk performance and want resilience, use ZFS on each node with replication. That gives fast local I/O and asynchronous protection.
  • Avoid trying Ceph for two nodes. Ceph needs three or more full nodes to behave predictably in production.

Give each VM an expected failure-recovery time. If a VM can tolerate minutes of downtime while replication completes, replication is fine. If it needs seconds and live migration, use shared storage and pay for faster networking.
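With shared storage in place, live migration itself is a single command. A minimal sketch, assuming a VM with ID 100 and a second node named pve2 (both placeholders for your own IDs and node names):

  # Live-migrate VM 100 to node pve2 while it keeps running.
  # The VM's disks must be on storage that both nodes can reach.
  qm migrate 100 pve2 --online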

Understanding Node Requirements

Quorum matters. Corosync needs a majority of votes to operate cluster services like HA. A two-node cluster has an even number of votes, so a single node outage causes loss of majority unless a third vote exists. A qdevice offers that third vote without being a full Proxmox node.

Do not expect the qdevice to run VMs or host primary data. It only answers quorum queries. Keep it isolated from heavy load. The qdevice should be able to reach both Proxmox nodes on the Corosync network, with stable latency and reliable routing.
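For orientation, once the qdevice is registered, the quorum section of /etc/pve/corosync.conf ends up looking roughly like the sketch below. pvecm generates this for you, so treat it as an illustration rather than something to hand-edit; the host address is a placeholder.

  quorum {
    provider: corosync_votequorum
    device {
      model: net
      votes: 1
      net {
        host: 192.0.2.50      # qdevice host on the Corosync network (placeholder)
        algorithm: ffsplit    # typical for clusters with an even node count
        tls: on
      }
    }
  }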

Network layout you should use (a config sketch follows the list):

  • Separate management network for Proxmox GUI and API.
  • Dedicated Corosync network for cluster traffic. Corosync is latency-sensitive rather than bandwidth-hungry, so keep it off congested links; a second ring on another interface adds redundancy.
  • Separate storage network for NFS/iSCSI or replication traffic; this is where bonding and jumbo frames pay off.
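As a concrete sketch of that separation, the network config on a Proxmox node (/etc/network/interfaces) might look like the following. Interface names, VLAN layout and addresses are placeholders; adapt them to your own switch setup.

  # Management bridge for the Proxmox GUI/API and VM traffic
  auto vmbr0
  iface vmbr0 inet static
      address 192.168.1.11/24
      gateway 192.168.1.1
      bridge-ports eno1
      bridge-stp off
      bridge-fd 0

  # Dedicated Corosync link: keep it quiet and low latency
  auto eno2
  iface eno2 inet static
      address 10.10.10.11/24

  # Storage network for NFS/iSCSI or replication traffic
  auto eno3
  iface eno3 inet static
      address 10.20.20.11/24
      mtu 9000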

Evaluating Lightweight Solutions

Options for the qdevice host:

  • A small NAS that can run a VM or container. Use a simple Debian VM to run corosync-qnetd.
  • A single-board computer like a Raspberry Pi running a minimal Linux image with corosync-qnetd.
  • A lightweight VM on an existing NAS appliance, if the NAS is reliable and on separate power.

The qdevice host runs the corosync-qnetd daemon and the Proxmox nodes run the corosync-qdevice client; both sides must be able to reach each other over the network. The official Proxmox cluster manager docs describe the qdevice mechanism and how it integrates with pvecm; follow those steps when adding the device (Proxmox Cluster Manager). Community writeups show practical setups for two-node clusters that use an external qdevice for quorum, which are helpful if you want a tested walkthrough (Small Proxmox Cluster Tips and Tricks, and QDevices).

Implementing High Availability with qdevice

Setting Up qdevice for Quorum

I prefer this sequence when adding a qdevice:

  1. Build the qdevice host. Install a minimal Debian or Ubuntu server. Keep it on the Corosync network.
  2. Install the qdevice software. On the external Debian-based host that is corosync-qnetd; on both Proxmox nodes install corosync-qdevice.
  3. On one Proxmox node run pvecm to register the qdevice. That exchanges certificates with the external host (see the command sketch after this list).
  4. Confirm the qdevice shows as online with pvecm status and check corosync logs.
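A minimal command sketch of those four steps, assuming the qdevice host is a Debian machine reachable at 10.10.10.50 on the Corosync network (the address is a placeholder):

  # 1-2. On the qdevice host: install the qnetd daemon.
  apt install corosync-qnetd

  # On BOTH Proxmox nodes: install the qdevice client.
  apt install corosync-qdevice

  # 3. On one Proxmox node: register the qdevice. This exchanges certificates
  #    and needs root SSH access to the qdevice host during setup.
  pvecm qdevice setup 10.10.10.50

  # 4. Confirm the qdevice shows up with one vote.
  pvecm status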

Practical tips:

  • Put the qdevice on different power and physical network paths to each Proxmox node when possible.
  • If the qdevice is a VM on a NAS, make sure that VM can boot independently of the Proxmox nodes. If the qdevice VM ends up on the same hardware as a Proxmox node, it loses its value when that hardware fails.
  • Keep the qdevice’s system clock correct; Corosync is sensitive to clock drift (a quick check follows this list).
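For the clock point, a quick check on the qdevice host (chrony shown here; systemd-timesyncd works too):

  # Install an NTP client and confirm the clock is synchronised.
  apt install chrony
  systemctl enable --now chrony
  timedatectl         # look for "System clock synchronized: yes"
  chronyc tracking    # offset and drift details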

When you add the qdevice it gives a vote. That restores majority when one Proxmox node fails. It does not make your storage redundant by itself.

Configuring the NAS for Proxmox

If the NAS will host storage for VM disks and the qdevice VM, split roles carefully.

  • Storage role: Use NFS or iSCSI exported from the NAS for shared VM disks. Use ZFS on the NAS if you need snapshots and replication features (see the example after this list).
  • qdevice role: Run the qnetd daemon on a small VM or container on the NAS. Keep the qdevice VM minimal. No user workloads.
  • Network: Put storage traffic on a dedicated VLAN or physical interface. Keep the qdevice on the Corosync VLAN or interface.
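To attach the NAS export to both nodes as shared VM storage, a hedged pvesm example; the storage ID, server address and export path are placeholders:

  # Add an NFS export from the NAS as shared storage for VM disks.
  # Run once on either node; /etc/pve/storage.cfg is cluster-wide.
  pvesm add nfs nas-vmstore \
      --server 10.20.20.5 \
      --export /mnt/tank/proxmox \
      --content images,rootdir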

If you choose replication instead of shared storage:

  • Configure ZFS send/receive or use Proxmox replication jobs (sketched after this list).
  • Accept that active failover may require starting the replicated VM on the surviving node and promoting storage copies. That takes time.
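Proxmox replication jobs wrap ZFS send/receive for you. A sketch, assuming VM 100 on local ZFS should replicate to a node named pve2 every 15 minutes (IDs and names are placeholders):

  # Replicate VM 100's ZFS volumes to pve2 every 15 minutes.
  pvesr create-local-job 100-0 pve2 --schedule "*/15"

  # Check last sync time and any failures.
  pvesr status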

Testing Failover Scenarios

Test deliberately, in a lab window. I follow these checks:

  1. Baseline: run pvecm status and confirm quorum with both nodes up and the qdevice online.
  2. Node failure: power off one Proxmox node. Run pvecm status on the remaining node. The cluster should retain quorum if the qdevice is online. Check VM HA behaviour by creating a simple HA group and forcing failover (commands sketched after this list).
  3. qdevice failure: stop the qnetd service on the qdevice host. With both Proxmox nodes up, confirm quorum still holds (two votes). Power off one Proxmox node and confirm the cluster loses quorum when the qdevice is down.
  4. Network partition: simulate a split by isolating the Corosync network from one Proxmox node. Observe whether the cluster fences or stops HA services as expected.
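For steps 2 and 4, a hedged sketch of the commands involved. Group name, VM ID and node names are placeholders, and UDP 5405 is the default corosync/knet port; run the iptables lines only in a test window and remove them afterwards:

  # Step 2: create an HA group spanning both nodes and put VM 100 under HA.
  ha-manager groupadd twonode --nodes "pve1,pve2"
  ha-manager add vm:100 --group twonode --state started
  ha-manager status    # watch HA state while you power off one node

  # Step 4: simulate a Corosync partition on one node by dropping cluster
  # traffic. Undo with -D when finished.
  iptables -A INPUT  -p udp --dport 5405 -j DROP
  iptables -A OUTPUT -p udp --dport 5405 -j DROP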

Verification commands to use (a qdevice-side check follows the list):

  • pvecm status
  • journalctl -u corosync -f
  • systemctl status corosync
  • Check VM HA status in the Proxmox GUI or via the pvesh API
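On the qdevice host itself, corosync-qnetd ships a small status tool that is worth adding to that list:

  # On the qdevice host: qnetd status and the clusters/nodes it is serving.
  corosync-qnetd-tool -s
  corosync-qnetd-tool -l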

Record timings for failover and replication convergence. If failover time is longer than acceptable, rework storage or replication frequency.

Concrete trade-offs and final takeaways

  • Two nodes plus a qdevice gives a quorum vote and reduces split-brain risk. It is a practical, lower-cost option for small VM workloads. It is not a full substitute for a third full node.
  • Do not rely on the qdevice for storage redundancy. Treat it as a quorum witness only.
  • Use ZFS replication or a reliable NAS for VM data, and keep backups with a 3-2-1 approach.
  • Test outages and measure failover times before moving into production.

I have used this setup for production workloads with modest availability requirements that can accept short failover windows. If you need sub-second failover or multi-site resilience, plan for three full nodes or a different distributed storage solution.
