Fencing in a Linux VM cluster


Set up fencing between Linux VM cluster nodes running on KVM hypervisors.

By Kostas Koutsogiannopoulos

Introduction

In clustered computing environments there is always the possibility of a failing or malfunctioning node. When such a node has control over shared resources, the cluster needs a process for isolating it from those resources. For example, in a basic active/passive topology with a single disk attached to both nodes and a non-cluster-aware filesystem (like xfs, ext4 etc.) mounted on the active node, a failure on the active node can cause data corruption. So, after detecting the failure and before starting resources on the passive node, we need to be sure that the failing node is isolated.

One approach to fencing is "stonith", which stands for "Shoot The Other Node In The Head". Practically, the failing node is completely disabled before the cluster tries to restore the service on another available node. There are multiple ways to implement stonith: on physical machines, for example, we can use a power controller (like a power switch) to power off a node. In a virtual environment we can order the hypervisor to reset a virtual machine, as we do in the following example.

Environment

Our setup is the active/passive topology described here:

https://www.epilis.gr/en/blog/2018/06/04/highly-available-nfs-server/

The two virtual machines live on two different CentOS hypervisors, KVM-server1 and KVM-server2:

[root@KVM-server1 ~]# cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)

[root@KVM-server2 ~]# cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)

Install packages

The following packages need to be installed on both of our hypervisors and on both of our virtual machines, since the latter need to be able to trigger fencing:

$ sudo yum update

$ sudo yum install fence-virt fence-virtd fence-virtd-libvirt fence-virtd-multicast fence-virtd-serial

KVM authentication key

The following key is used by the virtual machines to authenticate against the hypervisors. So we need to create a random key on one system (under /etc/cluster) and then copy the same key to the other three systems:

# mkdir /etc/cluster

# dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=4k count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000140577 s, 29.1 MB/s
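
The key then has to be copied to the other systems, for example with scp (assuming root SSH access between the four machines, which is not part of the original setup):

# ssh root@KVM-server2 mkdir -p /etc/cluster
# scp /etc/cluster/fence_xvm.key root@KVM-server2:/etc/cluster/

...and likewise for nfs-server1 and nfs-server2.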

fence_virtd configuration

Using the command below we can configure the fence_virtd daemon interactively. We have to follow the same procedure on the virtual machines and on the hypervisors:

# fence_virtd -c
Module search path [/usr/lib64/fence-virt]:

Available backends:
    libvirt 0.3
Available listeners:
    serial 0.4
    multicast 1.2

Listener modules are responsible for accepting requests
from fencing clients.

Listener module [multicast]:

The multicast listener module is designed for use environments
where the guests and hosts may communicate over a network using
multicast.

The multicast address is the address that a client will use to
send fencing requests to fence_virtd.

Multicast IP Address [225.0.0.12]:

Using ipv4 as family.

Multicast IP Port [1229]:

Setting a preferred interface causes fence_virtd to listen only
on that interface.  Normally, it listens on all interfaces.
In environments where the virtual machines are using the host
machine as a gateway, this *must* be set (typically to virbr0).
Set to 'none' for no interface.

Interface [virbr0]:

The key file is the shared key information which is used to
authenticate fencing requests.  The contents of this file must
be distributed to each physical host and virtual machine within
a cluster.

Key File [/etc/cluster/fence_xvm.key]:

Backend modules are responsible for routing requests to
the appropriate hypervisor or management layer.

Backend module [libvirt]:

Configuration complete.

=== Begin Configuration ===
backends {
    libvirt {
        uri = "qemu:///system";
    }

}

listeners {
    multicast {
        port = "1229";
        family = "ipv4";
        interface = "virbr0";
        address = "225.0.0.12";
        key_file = "/etc/cluster/fence_xvm.key";
    }

}

fence_virtd {
    module_path = "/usr/lib64/fence-virt";
    backend = "libvirt";
    listener = "multicast";
}

=== End Configuration ===
Replace /etc/fence_virt.conf with the above [y/N]? y

After configuration we can start and enable the fence_virtd daemon on the hypervisors:

# systemctl enable fence_virtd
# systemctl start fence_virtd

...and run the command below. The output is a list of virtual machine names, virtual machine IDs and statuses (on/off).

For example:

# fence_xvm -o list
afmdb                b814a854-5ae3-4032-a539-61368ace91a3 off
nfs-server1          41a4d124-6ab2-416c-af66-435e0e1419f4 on
nfs-server2          b33f94f9-8a30-4e83-9165-fd8681ee13cf on
python-dev           8f6eeabf-a531-40d3-99cd-e7a341dcb489 off
RHEL74               e9416c75-0ab0-4c9d-91de-2385df789d02 off
ucdbp                601e9ebb-e77b-4f1c-8b15-98451f376ea4 off
Windoze10            62edfed4-ef3f-40ab-91a8-4054d079bae7 on

We need the exact same command to work on our virtual machines as well. To do that we need some firewall management on the hypervisors:

[root@KVM-server1 ~]# firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 0 -m pkttype --pkt-type multicast -j ACCEPT
success
[root@KVM-server1 ~]# firewall-cmd --reload
success

[root@KVM-server2 ~]# firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 0 -m pkttype --pkt-type multicast -j ACCEPT
success
[root@KVM-server2 ~]# firewall-cmd --reload
success
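
We can check that the rule has been persisted on each hypervisor with:

# firewall-cmd --permanent --direct --get-all-rules
ipv4 filter INPUT 0 -m pkttype --pkt-type multicast -j ACCEPT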

... and on virtual machines:

[root@nfs-server1 ~]# firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.16.101" port port="1229" protocol="tcp" accept'
success
[root@nfs-server1 ~]# firewall-cmd --reload
success

[root@nfs-server2 ~]# firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.16.102" port port="1229" protocol="tcp" accept'
success
[root@nfs-server2 ~]# firewall-cmd --reload
success
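
On each virtual machine the rich rule can be checked with (assuming it was added to the default zone, as above):

# firewall-cmd --list-rich-rules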

Then we make sure that:

# fence_xvm -o list

...is functional on both of our cluster members.
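
For a more targeted check, fence_xvm can also query a single domain by name, for example:

# fence_xvm -o status -H nfs-server1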

Cluster - adding stonith resources

If the KVM domain names of the virtual machines and the hostnames of our nodes are the same, we just execute:

# pcs stonith create FenceVM fence_xvm key_file=/etc/cluster/fence_xvm.key

In other cases we need to configure the pcmk_host_map parameter, which maps each node's hostname to the corresponding KVM domain name:

# pcs stonith create FenceVM fence_xvm pcmk_host_map="nfs-server1:<port/vm/list name> nfs-server2:<port/vm/list name>" key_file=/etc/cluster/fence_xvm.key
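
For instance, if the KVM domains were named nfs1-vm and nfs2-vm (hypothetical names, for illustration only), the command would look like this:

# pcs stonith create FenceVM fence_xvm pcmk_host_map="nfs-server1:nfs1-vm nfs-server2:nfs2-vm" key_file=/etc/cluster/fence_xvm.key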

Activating stonith for our cluster:

# pcs property set stonith-enabled=true
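
The property can then be verified (pcs syntax as shipped with CentOS 7):

# pcs property show stonith-enabled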

The last thing we need to do is update every resource to trigger fencing on failure.

For example, with resource "SharedFS" we need to run:

# pcs resource update SharedFS op monitor on-fail=fence

... and we need to do that for every other resource (NFSService, NFSExport, VIP, NFSNotify).
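
Since the command is identical for each resource, a small shell loop saves some typing:

# for r in NFSService NFSExport VIP NFSNotify; do pcs resource update $r op monitor on-fail=fence; done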

After that, the cluster status looks like this:

# pcs status
Cluster name: nfs-server-cluster
Stack: corosync
Current DC: nfs-server2 (version 1.1.18-11.el7_5.2-2b07d5c5a9) - partition with quorum
Last updated: Tue Jun  5 15:24:48 2018
Last change: Tue Jun  5 15:24:36 2018 by root via cibadmin on nfs-server1

2 nodes configured
6 resources configured

Online: [ nfs-server1 nfs-server2 ]

Full list of resources:

 Resource Group: nfsresourcegroup
     SharedFS    (ocf::heartbeat:Filesystem):    Started nfs-server1
     NFSService    (ocf::heartbeat:nfsserver):    Started nfs-server1
     NFSExport    (ocf::heartbeat:exportfs):    Started nfs-server1
     VIP    (ocf::heartbeat:IPaddr2):    Started nfs-server1
     NFSNotify    (ocf::heartbeat:nfsnotify):    Started nfs-server1
 FenceVM    (stonith:fence_xvm):    Started nfs-server2

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Note that the "FenceVM" resource is running on the inactive node (if available).

Some testing

With both of our nodes up, we log in to the active one and intentionally cause an error, for example by:

  • Stopping a network interface
  • Stopping nfs-server.service
  • Unmounting /nfsdata

In every case the hypervisor kills the active node (a reset followed by a boot) and the resources move to the other one. After a few seconds both nodes appear "Online" again.
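
Fencing can also be triggered manually from the surviving node, which makes a convenient smoke test (standard pcs usage, not one of the failures above):

# pcs stonith fence nfs-server1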

 

