title: DRBD - Check Status

category: Linux

Run sudo cat /proc/drbd to retrieve the DRBD service status

What an unhealthy state looks like:

version: 8.4.5 (api:1/proto:86-101)
srcversion: 611D9EEFB9C11D2BC709D07
 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:0 nr:1408 dw:1816 dr:57681 al:4 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:404

What a syncing state looks like:

sudo cat /proc/drbd
version: 8.4.5 (api:1/proto:86-101)
srcversion: 611D9EEFB9C11D2BC709D07
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:28 nr:1408 dw:2040 dr:57717 al:6 bm:0 lo:0 pe:1 ua:0 ap:0 ep:1 wo:f oos:572
        [===================>] sync'ed:100.0% (572/592)K
        finish: 0:00:01 speed: 20 (20) K/sec

What a healthy state looks like:

sudo cat /proc/drbd
version: 8.4.5 (api:1/proto:86-101)
srcversion: 611D9EEFB9C11D2BC709D07
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:592 nr:1408 dw:2040 dr:58281 al:6 bm:0 lo:3 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
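Based on the fields above, a quick health check can grep the connection and disk states. A minimal sketch (a hypothetical helper script, assuming the single resource 0 shown here):

#!/bin/sh
# minimal DRBD health-check sketch: succeeds only when the resource
# is Connected and both sides report UpToDate
status=$(cat /proc/drbd)
echo "$status" | grep -q 'cs:Connected' || { echo "DRBD not connected"; exit 1; }
echo "$status" | grep -q 'ds:UpToDate/UpToDate' || { echo "DRBD not in sync"; exit 1; }
echo "DRBD healthy"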

title: DRBD - Pacemaker HA Cluster Install

category: Linux

This assumes two nodes, ServerA and ServerB, running Ubuntu 16

UPDATE: if running on Ubuntu 18 there is apparently an issue involving kernels 5.0 to 5.3, so this workaround can be used in the meantime:

https://launchpad.net/~rafaeldtinoco/+archive/ubuntu/lp1866458

sudo add-apt-repository ppa:rafaeldtinoco/lp1866458
sudo apt-get update
sudo apt-get dist-upgrade

Update the /etc/hosts file accordingly:

10.253.253.2 ServerA
10.253.253.1 ServerB
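To confirm both names resolve to the addresses above, run on both:

getent hosts ServerA ServerB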

Run on both:

apt-get update && apt-get install -y drbd8-utils ntp crmsh

Run on ServerB:

dd if=/dev/zero of=/dev/mapper/root-cluster # zero the backing device to clear old data; this runs until the device is full and then errors out, which is expected
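If the intent is only to wipe old filesystem or DRBD metadata signatures (an assumption; the command above zeroes the entire device), clearing the first part of the device is usually enough and much faster:

dd if=/dev/zero of=/dev/mapper/root-cluster bs=1M count=128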

Edit on both:

/etc/drbd.conf
resource r0 {
    on ServerB {
        device /dev/drbd0;
        disk /dev/mapper/root-cluster;
        address 10.253.253.1:7788;
        meta-disk internal;
    }
    on ServerA {
        device /dev/drbd0;
        disk /dev/mapper/root-cluster;
        address 10.253.253.2:7788;
        meta-disk internal;
    }
}
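Before bringing the resource up, drbdadm can parse the configuration back, which catches syntax errors early. Run on both:

drbdadm dump r0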

Run on both:

modprobe drbd

Run on both:

drbdadm create-md r0

Run on both:

drbdadm up r0

Run on both:

drbd-overview

Run on both:

cat /proc/drbd

Run on ServerB:

drbdadm -- --overwrite-data-of-peer primary r0/0

this will set ServerB as primary and replicate its data to ServerA

Run on both:

watch cat /proc/drbd

to check the progress of the replication

Run on ServerB:

mkfs.ext4 /dev/drbd0

Run on ServerB:

mkdir -p /mnt/drbd0/

Run on ServerB:

mount /dev/drbd0 /mnt/drbd0/

Up to this point the filesystem has to be mounted manually on the primary node, and DRBD replicates the underlying device to the other node. In case of disaster it would be necessary to manually promote and mount /dev/drbd0 on the other node to bring the data back online.
From now on we will use Pacemaker and Corosync to do this automatically.
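For reference, the manual failover that Pacemaker will automate from here on looks roughly like this (a sketch, to be run on the surviving node after the primary fails):

drbdadm primary r0
mkdir -p /mnt/drbd0/
mount /dev/drbd0 /mnt/drbd0/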

Run on both:

systemctl disable drbd

disable it on boot, as Pacemaker will control that service

Run on both:

umount /mnt/drbd0/ && drbdadm down r0

Run on both:

apt-get install -y pacemaker

this includes Corosync, which provides the cluster membership and messaging layer between the nodes

Edit on both:

/etc/corosync/corosync.conf
totem {
    version: 2
    cluster_name: debian
    secauth: off
    transport: udpu
    interface {
        ringnumber: 0
        bindnetaddr: 10.253.253.0
        broadcast: yes
        mcastport: 5405
    }
}
nodelist {
    node {
        ring0_addr: 10.253.253.1
        name: ServerB
        nodeid: 1
    }
    node {
        ring0_addr: 10.253.253.2
        name: ServerA
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
    wait_for_all: 1
    last_man_standing: 1
    auto_tie_breaker: 0
}

Run on both:

systemctl restart corosync

to apply the config above

Run on both:

systemctl start pacemaker

already enabled by default

Run on both:

crm status
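To double-check that Corosync itself sees both members of the ring configured above:

corosync-quorumtool -s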

Run on ServerB:

crm configure edit
node 1: ServerB
node 2: ServerA
primitive drbd_res ocf:linbit:drbd \
        params drbd_resource=r0 \
        op monitor interval=29s role=Master \
        op monitor interval=31s role=Slave
primitive fortivm VirtualDomain \
        params hypervisor="qemu:///system" config="/mnt/drbd0/fortivm.xml" \
        op stop timeout=120s interval=0 \
        op start timeout=120s interval=0 \
        op monitor interval=30s timeout=30s \
        utilization cpu=1 hv_memory=2048
primitive fs_res Filesystem \
        params device="/dev/drbd0" directory="/mnt/drbd0/" fstype=ext4
ms drbd_master_slave drbd_res \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
order fs_after_drbd Mandatory: drbd_master_slave:promote fs_res:start fortivm:start
colocation fs_drbd_colo inf: fs_res fortivm drbd_master_slave:Master
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.14-70404b0 \
        cluster-infrastructure=corosync \
        cluster-name=debian \
        stonith-enabled=false \
        no-quorum-policy=ignore

Brief description of the above:
drbd_res: monitors the r0 resource in both Master and Slave roles
fortivm: manages and monitors the KVM VM
fs_res: mounts the drbd0 device
drbd_master_slave: master/slave set that allows only one master
fs_drbd_colo: runs the resources only on the DRBD master
fs_after_drbd: promotes the DRBD master first, then mounts the device and starts the VM
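Once the configuration is saved, Pacemaker should start the resources; a one-shot view of their placement can be taken with:

crm_mon -1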

Run on both:

crm status

Add a GUI for cluster management (https://github.com/ClusterLabs/pcs):

apt-get install pcs
/etc/init.d/pcsd start
passwd hacluster
pcs cluster auth ServerA ServerB -u hacluster
pcs cluster auth
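The pcsd service also serves a web UI; by default it listens on HTTPS port 2224, so it should be reachable at, e.g.:

https://ServerA:2224/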


title: DRBD - Pacemaker KVM Setup

category: Linux

Set up the /dev/drbd0 device mounted on /mnt/drbd0, define master and slave, and start the KVM VM 'fortivm':

crm configure edit

node 1: azkaban
node 2: peppapig
primitive drbd_res ocf:linbit:drbd \
        params drbd_resource=r0 \
        op monitor interval=29s role=Master \
        op monitor interval=31s role=Slave
primitive fortivm VirtualDomain \
        params hypervisor="qemu:///system" config="/mnt/drbd0/fortivm.xml" \
        op stop timeout=120s interval=0 \
        op start timeout=120s interval=0 \
        op monitor interval=30s timeout=30s \
        utilization cpu=1 hv_memory=1024
primitive fs_res Filesystem \
        params device="/dev/drbd0" directory="/mnt/drbd0/" fstype=ext4
ms drbd_master_slave drbd_res \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
order fs_after_drbd Mandatory: drbd_master_slave:promote fs_res:start fortivm:start
colocation fs_drbd_colo inf: fs_res fortivm drbd_master_slave:Master
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.14-70404b0 \
        cluster-infrastructure=corosync \
        cluster-name=debian \
        stonith-enabled=false \
        no-quorum-policy=ignore

title: DRBD - Pacemaker Test Failover

category: Linux

The steps described below put a node in standby to force a switchover of its resources to the other one:

pcs status
pcs cluster standby peppapig
pcs status
pcs cluster unstandby peppapig
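While the node is in standby, the resources should migrate to the remaining node; this can be followed with:

watch -n 2 pcs status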


title: DRBD - Split Brain Standalone

category: Linux

In case of a split brain (connection state cs:StandAlone), which can happen at any time:

Run this on the secondary node (the one whose changes will be discarded):

drbdadm disconnect r0
drbdadm secondary r0 # this sets the current node as secondary for r0
drbdadm connect --discard-my-data r0

Run this on the primary node:

drbdadm disconnect r0
drbdadm primary r0 # this sets the current node as primary for r0
drbdadm connect r0
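After both sides reconnect, the node that discarded its data should resync from the survivor; this can be confirmed with:

watch cat /proc/drbd # expect cs:SyncSource/SyncTarget, then cs:Connected ds:UpToDate/UpToDate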


title: DRBD - Stop DRBD Sync for backups

category: Linux

Steps to run on a secondary node for a cold backup. They stop the cluster services, bring DRBD up manually, mount the partition, perform the cold backup, then reverse the process to bring the cluster services back up:

/etc/init.d/pacemaker stop
/etc/init.d/drbd start
drbdadm disconnect r0
drbdadm primary r0
mount /dev/drbd0 /mnt/drbd0
cp /mnt/drbd0/* ...
umount /mnt/drbd0/
drbdadm secondary r0
drbdadm connect --discard-my-data r0
drbd-overview
/etc/init.d/drbd stop
/etc/init.d/pacemaker start
sudo crm status
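As a concrete example of the copy step above (assuming a hypothetical /backup destination directory):

tar -czf /backup/drbd0-$(date +%F).tar.gz -C /mnt/drbd0 .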