Fencing in a Linux VM cluster


Set up fencing between Linux VM cluster nodes running on KVM hypervisors.

By Kostas Koutsogiannopoulos

Introduction

In clustered computing environments there is always the possibility of a failing or malfunctioning node. When such a node has control over shared resources, the cluster needs a process for isolating it from those resources. For example, in a basic active/passive topology with a single disk attached to both nodes and a non-cluster-aware filesystem (like xfs, ext4 etc.) mounted on the active node, a failure on the active node can cause data corruption. So, after detecting the failure and before starting resources on the passive node, we need to be sure that the failing node is isolated.

One approach to fencing is "stonith", which stands for "Shoot The Other Node In The Head". Practically, the failing node is completely disabled before the cluster tries to restore the service on another available node. There are multiple ways to implement stonith: on physical machines, for example, we can use a power controller (like a power switch) to power off a node. In a virtual environment we can order the hypervisor to reset a virtual machine, as we do in the following example.

Environment

Our setup is the active/passive topology described here:

https://www.epilis.gr/en/blog/2018/06/04/highly-available-nfs-server/

The two virtual machines live on two different CentOS hypervisors, KVM-server1 and KVM-server2:

[root@KVM-server1 ~]# cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)

[root@KVM-server2 ~]# cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)

Install packages

The following packages need to be installed on both of our hypervisors and on both of our virtual machines, since the latter need to be able to trigger fencing:

$ sudo yum update

$ sudo yum install fence-virt fence-virtd fence-virtd-libvirt fence-virtd-multicast fence-virtd-serial

KVM authentication key

The following key is used by the virtual machines to authenticate against the hypervisors. So we need to create a random key on one system (under /etc/cluster) and then copy the same key to the other three systems:

# mkdir /etc/cluster

# dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=4k count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000140577 s, 29.1 MB/s
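
The key then has to be copied to the other systems, for example with scp (assuming root SSH access between the four machines, which is not part of the original setup):

# ssh root@KVM-server2 mkdir -p /etc/cluster
# scp /etc/cluster/fence_xvm.key root@KVM-server2:/etc/cluster/

...and likewise for nfs-server1 and nfs-server2.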

fence_virtd configuration

Using the command below we can configure the fence_virtd daemon interactively. We have to follow the same procedure on the virtual machines and on the hypervisors:

# fence_virtd -c
Module search path [/usr/lib64/fence-virt]:

Available backends:
    libvirt 0.3
Available listeners:
    serial 0.4
    multicast 1.2

Listener modules are responsible for accepting requests
from fencing clients.

Listener module [multicast]:

The multicast listener module is designed for use environments
where the guests and hosts may communicate over a network using
multicast.

The multicast address is the address that a client will use to
send fencing requests to fence_virtd.

Multicast IP Address [225.0.0.12]:

Using ipv4 as family.

Multicast IP Port [1229]:

Setting a preferred interface causes fence_virtd to listen only
on that interface.  Normally, it listens on all interfaces.
In environments where the virtual machines are using the host
machine as a gateway, this *must* be set (typically to virbr0).
Set to 'none' for no interface.

Interface [virbr0]:

The key file is the shared key information which is used to
authenticate fencing requests.  The contents of this file must
be distributed to each physical host and virtual machine within
a cluster.

Key File [/etc/cluster/fence_xvm.key]:

Backend modules are responsible for routing requests to
the appropriate hypervisor or management layer.

Backend module [libvirt]:

Configuration complete.

=== Begin Configuration ===
backends {
    libvirt {
        uri = "qemu:///system";
    }

}

listeners {
    multicast {
        port = "1229";
        family = "ipv4";
        interface = "virbr0";
        address = "225.0.0.12";
        key_file = "/etc/cluster/fence_xvm.key";
    }

}

fence_virtd {
    module_path = "/usr/lib64/fence-virt";
    backend = "libvirt";
    listener = "multicast";
}

=== End Configuration ===
Replace /etc/fence_virt.conf with the above [y/N]? y

After configuration we can start and enable the fence_virtd daemon on the hypervisors:

# systemctl enable fence_virtd
# systemctl start fence_virtd

...and run the command below. The output is a list of virtual machine names, virtual machine IDs and statuses (on/off).

For example:

# fence_xvm -o list
afmdb                b814a854-5ae3-4032-a539-61368ace91a3 off
nfs-server1          41a4d124-6ab2-416c-af66-435e0e1419f4 on
nfs-server2          b33f94f9-8a30-4e83-9165-fd8681ee13cf on
python-dev           8f6eeabf-a531-40d3-99cd-e7a341dcb489 off
RHEL74               e9416c75-0ab0-4c9d-91de-2385df789d02 off
ucdbp                601e9ebb-e77b-4f1c-8b15-98451f376ea4 off
Windoze10            62edfed4-ef3f-40ab-91a8-4054d079bae7 on

We need the exact same command to work on our virtual machines as well. To do that we need some firewall management on the hypervisors:

[root@KVM-server1 ~]# firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 0 -m pkttype --pkt-type multicast -j ACCEPT
success
[root@KVM-server1 ~]# firewall-cmd --reload
success

[root@KVM-server2 ~]# firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 0 -m pkttype --pkt-type multicast -j ACCEPT
success
[root@KVM-server2 ~]# firewall-cmd --reload
success
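
We can check that the rule has been persisted on each hypervisor with:

# firewall-cmd --permanent --direct --get-all-rules
ipv4 filter INPUT 0 -m pkttype --pkt-type multicast -j ACCEPT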

... and on virtual machines:

[root@nfs-server1 ~]# firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.16.101" port port="1229" protocol="tcp" accept'
success
[root@nfs-server1 ~]# firewall-cmd --reload
success

[root@nfs-server2 ~]# firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.16.102" port port="1229" protocol="tcp" accept'
success
[root@nfs-server2 ~]# firewall-cmd --reload
success
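
On each virtual machine the rich rule can be checked with (assuming it was added to the default zone, as above):

# firewall-cmd --list-rich-rules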

Then we make sure that:

# fence_xvm -o list

...is functional on both of our cluster members.
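
For a more targeted check, fence_xvm can also query a single domain by name, for example:

# fence_xvm -o status -H nfs-server1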

Cluster - adding stonith resources

If the KVM domain names of the virtual machines and the hostnames of our nodes are the same, we just execute:

# pcs stonith create FenceVM fence_xvm key_file=/etc/cluster/fence_xvm.key

In other cases we need to configure the pcmk_host_map parameter, which maps each node's hostname to the corresponding KVM domain name:

# pcs stonith create FenceVM fence_xvm pcmk_host_map="nfs-server1:<port/vm/list name> nfs-server2:<port/vm/list name>" key_file=/etc/cluster/fence_xvm.key
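
For instance, if the KVM domains were named nfs1-vm and nfs2-vm (hypothetical names, for illustration only), the command would look like this:

# pcs stonith create FenceVM fence_xvm pcmk_host_map="nfs-server1:nfs1-vm nfs-server2:nfs2-vm" key_file=/etc/cluster/fence_xvm.key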

Activating stonith for our cluster:

# pcs property set stonith-enabled=true
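
The property can then be verified (pcs syntax as shipped with CentOS 7):

# pcs property show stonith-enabled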

The last thing we need to do is update every resource to trigger fencing on failure.

For example, with resource "SharedFS" we need to run:

# pcs resource update SharedFS op monitor on-fail=fence

... and we need to do that for every other resource (NFSService, NFSExport, VIP, NFSNotify).
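
Since the command is identical for each resource, a small shell loop saves some typing:

# for r in NFSService NFSExport VIP NFSNotify; do pcs resource update $r op monitor on-fail=fence; done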

After that, the cluster status looks like this:

# pcs status
Cluster name: nfs-server-cluster
Stack: corosync
Current DC: nfs-server2 (version 1.1.18-11.el7_5.2-2b07d5c5a9) - partition with quorum
Last updated: Tue Jun  5 15:24:48 2018
Last change: Tue Jun  5 15:24:36 2018 by root via cibadmin on nfs-server1

2 nodes configured
6 resources configured

Online: [ nfs-server1 nfs-server2 ]

Full list of resources:

 Resource Group: nfsresourcegroup
     SharedFS    (ocf::heartbeat:Filesystem):    Started nfs-server1
     NFSService    (ocf::heartbeat:nfsserver):    Started nfs-server1
     NFSExport    (ocf::heartbeat:exportfs):    Started nfs-server1
     VIP    (ocf::heartbeat:IPaddr2):    Started nfs-server1
     NFSNotify    (ocf::heartbeat:nfsnotify):    Started nfs-server1
 FenceVM    (stonith:fence_xvm):    Started nfs-server2

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Note that the "FenceVM" resource is running on the inactive node (if available).

Some testing

With both of our nodes up, we log in to the active one and intentionally cause an error, for example by:

  • Stopping a network interface
  • Stopping nfs-server.service
  • Unmounting /nfsdata

In every case the hypervisor kills the active node (a reset followed by a boot) and the resources move to the other one. After a few seconds both nodes appear "Online" again.
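
Fencing can also be triggered manually from the surviving node, which makes a convenient smoke test (standard pcs usage, not one of the failures above):

# pcs stonith fence nfs-server1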

 

