Fencing in a Linux VM cluster
Set up fencing between Linux VM cluster nodes running on KVM hypervisors.
In clustered environments there is always the possibility of a failing or malfunctioning node. When that node controls shared resources, the cluster needs a way to isolate it from those resources. For example, in a basic active/passive topology with a single disk attached to both nodes and a non-cluster-aware filesystem (such as xfs or ext4) mounted on the active node, a failure of the active node can cause data corruption if the passive node mounts the disk while the failed node is still writing to it. So after detecting the failure and before starting resources on the passive node, we need to be sure that the failing node is isolated.
One approach to fencing is STONITH, which stands for "Shoot The Other Node In The Head". In practice, the failing node is completely disabled after the failure, before the cluster tries to restore the service on another available node. There are multiple ways to implement STONITH. On physical machines, for example, we can use a power controller (such as a power switch) to power off a node. In a virtual environment we can instruct the hypervisor to reset a virtual machine, as we do in the following example.
Our setup is the active/passive topology described here:
The two virtual machines run on two different CentOS hypervisors, KVM-server1 and KVM-server2:
[root@KVM-server1 ~]# cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)
[root@KVM-server2 ~]# cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)
The following packages need to be installed on both hypervisors and on both virtual machines, since the virtual machines must be able to trigger fencing:
$ sudo yum update
$ sudo yum install fence-virt fence-virtd fence-virtd-libvirt fence-virtd-multicast fence-virtd-serial
KVM authentication key
The following key is used by the virtual machines to authenticate to the hypervisors. We create a random key on one server (under /etc/cluster) and then copy the same key to the other three systems:
# mkdir /etc/cluster
# dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=4k count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000140577 s, 29.1 MB/s
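The copy to the other three systems is not shown above. A minimal sketch follows, assuming root SSH access between the systems and that the names KVM-server2, nfs-server1 and nfs-server2 resolve from the first hypervisor (both are assumptions). The helper names make_fence_key and key_digest are hypothetical and exist only for this illustration:

```shell
# Hypothetical helper: create a 4 KiB random key readable only by its owner.
# $1 is the target directory (the article uses /etc/cluster).
make_fence_key() {
    dir=$1
    mkdir -p "$dir" || return 1
    # Subshell so the restrictive umask does not leak into the caller.
    ( umask 077 && dd if=/dev/urandom of="$dir/fence_xvm.key" bs=4k count=1 2>/dev/null )
}

# Hypothetical helper: print the SHA-256 digest of a key file.
key_digest() {
    sha256sum "$1" | cut -d' ' -f1
}

# Usage on the first hypervisor (as root), not executed here:
#   make_fence_key /etc/cluster
#   for host in KVM-server2 nfs-server1 nfs-server2; do
#       ssh "root@$host" mkdir -p /etc/cluster
#       scp -p /etc/cluster/fence_xvm.key "root@$host:/etc/cluster/"
#   done
#   key_digest /etc/cluster/fence_xvm.key   # compare with the digest on each host
```

Comparing the digest on every system is a cheap way to confirm that all four hold the same key, since a mismatched key makes fencing requests fail authentication.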
Using the command below we can configure the fence_virtd daemon interactively. We have to follow the same procedure on the virtual machines and on the hypervisors:
# fence_virtd -c
Module search path [/usr/lib64/fence-virt]:
Listener modules are responsible for accepting requests
from fencing clients.
Listener module [multicast]:
The multicast listener module is designed for use environments
where the guests and hosts may communicate over a network using
The multicast address is the address that a client will use to
send fencing requests to fence_virtd.
Multicast IP Address [225.0.0.12]:
Using ipv4 as family.
Multicast IP Port [1229]:
Setting a preferred interface causes fence_virtd to listen only
on that interface. Normally, it listens on all interfaces.
In environments where the virtual machines are using the host
machine as a gateway, this *must* be set (typically to virbr0).
Set to 'none' for no interface.
The key file is the shared key information which is used to
authenticate fencing requests. The contents of this file must
be distributed to each physical host and virtual machine within
Key File [/etc/cluster/fence_xvm.key]:
Backend modules are responsible for routing requests to
the appropriate hypervisor or management layer.
Backend module [libvirt]:
=== Begin Configuration ===
uri = "qemu:///system";
port = "1229";
family = "ipv4";
interface = "virbr0";
address = "225.0.0.12";
key_file = "/etc/cluster/fence_xvm.key";
module_path = "/usr/lib64/fence-virt";
backend = "libvirt";
listener = "multicast";
=== End Configuration ===
Replace /etc/fence_virt.conf with the above [y/N]? y
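Since the same dialogue has to be repeated on four systems, a quick check of the written file can catch a mistyped answer. The sketch below assumes the file stores each setting with the same spacing as the summary printed by the dialogue. The helper name check_fence_conf is hypothetical:

```shell
# Hypothetical helper: confirm that a fence_virt.conf file contains the
# settings chosen in the dialogue. $1 is the path to the file.
check_fence_conf() {
    conf=$1
    for setting in \
        'listener = "multicast"' \
        'backend = "libvirt"' \
        'key_file = "/etc/cluster/fence_xvm.key"'
    do
        # -F treats the pattern as a fixed string, -q suppresses output.
        if ! grep -qF -- "$setting" "$conf"; then
            echo "missing: $setting" >&2
            return 1
        fi
    done
    echo "ok"
}

# Usage, not executed here:
#   check_fence_conf /etc/fence_virt.conf
```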
After configuration we can start and enable the fence_virtd daemon on the hypervisors:
# systemctl start fence_virtd
# systemctl enable fence_virtd
... and run the command below. The output lists each virtual machine's name, UUID and status (on/off):
# fence_xvm -o list
afmdb b814a854-5ae3-4032-a539-61368ace91a3 off
nfs-server1 41a4d124-6ab2-416c-af66-435e0e1419f4 on
nfs-server2 b33f94f9-8a30-4e83-9165-fd8681ee13cf on
python-dev 8f6eeabf-a531-40d3-99cd-e7a341dcb489 off
RHEL74 e9416c75-0ab0-4c9d-91de-2385df789d02 off
ucdbp 601e9ebb-e77b-4f1c-8b15-98451f376ea4 off
Windoze10 62edfed4-ef3f-40ab-91a8-4054d079bae7 on
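When only the cluster nodes matter, the listing can be filtered by domain name. A minimal sketch, assuming the three-column layout shown above (name, UUID, state). The helper name domain_state is hypothetical:

```shell
# Hypothetical helper: read "fence_xvm -o list" output on stdin and print
# the state (on/off) of the domain named in $1. Fails if it is not listed.
domain_state() {
    awk -v name="$1" '
        $1 == name { print $NF; found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Usage, not executed here:
#   fence_xvm -o list | domain_state nfs-server1
```

A non-zero exit status means the domain is not visible to the listener that answered, which usually points at the key, the firewall or the multicast interface.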
We need "fence_xvm -o list" to work on our virtual machines as well. To do that we need some firewall configuration on the hypervisors:
[root@KVM-server1 ~]# firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 0 -m pkttype --pkt-type multicast -j ACCEPT
[root@KVM-server1 ~]# firewall-cmd --reload
[root@KVM-server2 ~]# firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 0 -m pkttype --pkt-type multicast -j ACCEPT
[root@KVM-server2 ~]# firewall-cmd --reload
... and on the virtual machines:
[root@nfs-server1 ~]# firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.16.101" port port="1229" protocol="tcp" accept'
[root@nfs-server1 ~]# firewall-cmd --reload
[root@nfs-server2 ~]# firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.16.102" port port="1229" protocol="tcp" accept'
[root@nfs-server2 ~]# firewall-cmd --reload
Then we make sure that:
# fence_xvm -o list
...is functional on both of our cluster members.
Cluster - adding stonith resources
If the KVM domain names of the virtual machines and the hostnames of our nodes are the same, we just execute:
# pcs stonith create FenceVM fence_xvm key_file=/etc/cluster/fence_xvm.key
In other cases we need to configure the pcmk_host_map parameter:
# pcs stonith create FenceVM fence_xvm pcmk_host_map="nfs-server1:<port/vm/list name> nfs-server2:<port/vm/list name>" key_file=/etc/cluster/fence_xvm.key
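The map value is a list of node:domain pairs separated by spaces, as in the command above. A small sketch of building it from pairs follows. The helper name host_map and the domain names vm-nfs1 and vm-nfs2 are hypothetical placeholders for whatever "fence_xvm -o list" reports on your hypervisors:

```shell
# Hypothetical helper: build a pcmk_host_map value from node=domain pairs.
host_map() {
    map=""
    for pair in "$@"; do
        node=${pair%%=*}      # text before the first "="
        domain=${pair#*=}     # text after the first "="
        map="${map:+$map }$node:$domain"
    done
    printf '%s\n' "$map"
}

# Usage, not executed here (vm-nfs1 and vm-nfs2 are placeholder domain names):
#   pcs stonith create FenceVM fence_xvm \
#       pcmk_host_map="$(host_map nfs-server1=vm-nfs1 nfs-server2=vm-nfs2)" \
#       key_file=/etc/cluster/fence_xvm.key
```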
Activating stonith for our cluster:
# pcs property set stonith-enabled=true
The last thing we need to do is update every resource so that a failure triggers fencing.
For example, for the resource "SharedFS" we need to run:
# pcs resource update SharedFS op monitor on-fail=fence
... and we need to do that for every other resource (NFSService, NFSExport, VIP, NFSNotify).
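Repeating the update for five resources can be sketched as a loop. The helper below only prints the commands so they can be reviewed first. Its name, fence_on_fail_cmds, is hypothetical, and the resource names are the ones used in this article:

```shell
# Hypothetical helper: print one "pcs resource update" command per
# resource name given as an argument.
fence_on_fail_cmds() {
    for res in "$@"; do
        printf 'pcs resource update %s op monitor on-fail=fence\n' "$res"
    done
}

# Usage on one cluster node, not executed here:
#   fence_on_fail_cmds SharedFS NFSService NFSExport VIP NFSNotify        # review
#   fence_on_fail_cmds SharedFS NFSService NFSExport VIP NFSNotify | sh   # apply
```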
After that, the cluster status shows:
# pcs status
Cluster name: nfs-server-cluster
Current DC: nfs-server2 (version 1.1.18-11.el7_5.2-2b07d5c5a9) - partition with quorum
Last updated: Tue Jun 5 15:24:48 2018
Last change: Tue Jun 5 15:24:36 2018 by root via cibadmin on nfs-server1
2 nodes configured
6 resources configured
Online: [ nfs-server1 nfs-server2 ]
Full list of resources:
Resource Group: nfsresourcegroup
SharedFS (ocf::heartbeat:Filesystem): Started nfs-server1
NFSService (ocf::heartbeat:nfsserver): Started nfs-server1
NFSExport (ocf::heartbeat:exportfs): Started nfs-server1
VIP (ocf::heartbeat:IPaddr2): Started nfs-server1
NFSNotify (ocf::heartbeat:nfsnotify): Started nfs-server1
FenceVM (stonith:fence_xvm): Started nfs-server2
Note that the "FenceVM" resource runs on the inactive node (if available).
With both nodes up, we log in to the active one and intentionally cause an error, for example by:
- Stopping the network interface
- Stopping nfs-server.service
- Unmounting /nfsdata
In each case the hypervisor kills the active node (a reset followed by a boot) and the resources move to the other node. After a few seconds both nodes appear "Online" again.
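To watch the nodes come back, the "Online" line of the status output can be extracted. A minimal sketch, assuming the "Online: [ ... ]" layout shown in the status output earlier. The helper name online_nodes is hypothetical:

```shell
# Hypothetical helper: read "pcs status" output on stdin and print the
# node names from the "Online: [ ... ]" line, one per line.
online_nodes() {
    sed -n 's/^[[:space:]]*Online: \[ \(.*\) ][[:space:]]*$/\1/p' | tr ' ' '\n'
}

# Usage on a surviving node, not executed here:
#   pcs stonith fence nfs-server1   # optional: ask the cluster to fence a node on purpose
#   pcs status | online_nodes       # repeat until both node names are printed
```

Fencing a node on purpose with "pcs stonith fence" is a less disruptive first test than breaking a resource, because it exercises only the path from the cluster to the hypervisor.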
- Posted by Kostas Koutsogiannopoulos · July 2, 2018