High Availability in Db2 LUW with Pacemaker

Datavail · 30 slides · Oct 31, 2025

About This Presentation

In today’s data-driven world, high availability (HA) is mission-critical. Enterprises cannot afford downtime, yet many organizations seek to avoid the complexity and cost of proprietary HA solutions. Enter Pacemaker — a robust, open-source clustering and high availability framework that brings e...


Slide Content

Boston, MA
High Availability in Db2
LUW with Pacemaker
Scott Konash & Shashi Ranjan, Datavail
Session Code: G01

Agenda
•What is Pacemaker?
•What is Corosync?
•Evolution/Comparison to other Failover/Disaster Recovery solutions.
•Pacemaker deployment options for Db2.
•Supported cloud platforms and business drivers.
•Configuration and implementation in a cloud environment.
•Conclusions and Pacemaker recommendations for administrators.
•Q&A

Datavail at a Glance
Delivering a superior approach to leveraging data through a tech-enabled global delivery model and deep specialization in databases, data management, and application services.
As of 2022:
•$25M invested in IP that improves the service experience and drives efficiency
•1,000+ employees staffed 24x7, resolving over 2,000,000 incidents per year
•13+ years building and operating mission-critical data and application systems

What is Pacemaker?
•High-availability cluster manager software that runs on a set of nodes.
•Combined with Corosync (more on this later), it detects component failures.
•Orchestrates necessary failover procedures for seamless availability and
business continuity.
•Eliminates the need for Tivoli System Automation (TSA) + RSCT services.
•Cloud-ready for both AWS and Azure.
•Packaged with Db2, beginning with Db2 LUW v11.5 Mod 6.

Pacemaker Components (1|4)
•Resources defined in Db2
•Db2 member process
•HADR capable databases
•Ethernet network adapters
•Virtual IP addresses
•Constraints
•Location constraint – where resources can run
•Ordering constraint – the order in which resource actions occur
•Co-location constraint – dependencies of one resource on another

Pacemaker Components (2|4)
•Resource set –A group of resources defined over a constraint
•Resource model
•Pre-defined relationship and constraints of all resources.
•Created using the db2cm utility (Pacemaker doesn’t use db2haicu).
•Must be managed using db2cm, as any alteration to this model will render it
unsupported by Db2.

Pacemaker Components (3|4)
•Resource agents
•The Db2 user exits (shell scripts) developed and supported by Db2 to manage the
various resources that are part of the model.
•db2ethmon
•The resource agent that monitors the defined Ethernet network adapter.
•This is at the host level.
•db2inst
•The resource agent to monitor, start and stop the Db2 member process.
•This is at the instance level.
•db2hadr
•The resource agent to monitor, start, and stop individual databases that have been enabled for
HADR.
•This is at the database level.

Pacemaker Components (4|4)
•Cluster domain leader
•One server in the cluster must be designated as the domain leader.
•This server is referred to as the Domain Controller (DC) in Pacemaker.
•The Pacemaker controller daemon residing on the DC will make all cluster decisions.
•If the current domain leader’s host fails, a new domain leader will be elected.

What is Corosync? (1|2)
•Corosync Cluster Engine is an open-source group communication system
software that is utilized by Pacemaker for node management.
•Corosync was founded in January 2008 as a reduction of the OpenAIS
project.
•The OpenAIS project was founded in 2002 to implement the Service Availability
Forum Application Interface Specification APIs.
•These APIs were to provide an application framework for high availability
using clustering techniques to reduce MTTR.

What is Corosync? (2|2)
•Corosync offers the following:
•A consistent view of the cluster topology.
•A reliable messaging infrastructure for event ordering on each node.
•Enforcement of quorum constraints.
•Corosync enables the servers to communicate as a cluster, while Pacemaker
gives Db2 the ability to control how the cluster behaves.
•Corosync must be installed and configured prior to installing and
configuring Pacemaker for Db2 failover/HA.
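To make the division of labor concrete, the following is a minimal sketch of what a Corosync configuration for a two-node Db2 cluster with a Qdevice tiebreaker can look like. The host names and cluster name are illustrative; in a Db2 deployment, db2cm generates and manages the real /etc/corosync/corosync.conf, so this is for orientation only:

```
totem {
    version: 2
    cluster_name: db2domain
}

nodelist {
    node {
        ring0_addr: host1
        nodeid: 1
    }
    node {
        ring0_addr: host2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    device {
        model: net
        net {
            host: host3      # the Qdevice/arbitrator host
            algorithm: ffsplit
        }
    }
}
```

The quorum section is what distinguishes this from a bare two-node setup: the external Qdevice supplies the tiebreaking vote discussed later in this deck.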

Failover/DR Solutions for Db2 (1|2)
Evolution of Failover/High Availability
•Historically, Db2’s only means of active/passive HA was platform-dependent
clustering (HACMP on AIX, VCS on Solaris, Red Hat clusters).
•This implementation was problematic, most notably due to the
administrative overhead and division of responsibilities.
•The combination of HADR and TSA (Tivoli System Automation) then
allowed DBAs to set up and manage their own clusters.
•TSA was again problematic, due to its platform dependence and general
lack of public cloud support and automation.
•Using open-source APIs, Pacemaker solves many of these problems.

Failover/DR Solutions for Db2 (2|2)
Evolution of Disaster Recovery
•Prior to HADR, DBAs needed a home-grown log-shipping solution to maintain a
Db2 DR database in Rollforward Pending status.
•HADR solved this problem with replication at the log buffer layer, removing
most of the administrative overhead with this setup.
•Multi-target HADR allowed DBAs to use HADR to solve both problems (HA
and DR) without sacrificing one for the other.
•Pacemaker can be implemented to solve both problems in the cloud, with
HA being fully automated and DR being fully managed in one configuration.

Deployment of Pacemaker (1|4)
Supported cloud platforms
•TSA support is available for both AIX and Linux.
•Pacemaker supports Linux (Intel/POWER) as well as z/Linux for on-premises
or locally-virtualized solutions.
•The major advantage of Pacemaker over TSA, and its primary selling point,
is the support for the public cloud.
•Pacemaker supports cluster management and automation in both Amazon
AWS and Microsoft Azure.
•TSAMP, on the other hand, is not supported in the cloud.

Deployment of Pacemaker (2|4)
Business Drivers
•Many, if not all, companies either have pursued, or will be pursuing, a cloud
transformation for all existing legacy systems.
•Most companies run a mixed bag of disparate DBMSs acquired through
acquisitions or mergers over long periods of time.
•They almost always settle on either AWS or Azure as a cloud direction.
•We’ve seen a trend of customers moving away from Db2 when moving to
public clouds, due to (among other things) a lack of native HA support for
Db2 LUW, often requiring applications to be re-architected.

Deployment of Pacemaker (3|4)
Business Drivers
•Pacemaker makes Db2 in the cloud much more attractive, as it supports
cluster management and automation in both AWS and Azure.
•Our clients often ask why HA is needed in the cloud, as public clouds have
virtualization and fault-tolerance that is inherent in the cloud’s design.
•While cloud technologies are very resilient, they are not without their
problems, including outages and security issues.

Deployment of Pacemaker (4|4)
Business Drivers
•Without redundancy across availability zones and regions in the cloud, a
hardware or software failure can result in hours of downtime.
•From a business continuity standpoint, with critical production databases,
our clients need to ensure that their applications will stay up and running.
•HA automation is all about seamless business continuity.

Configuration and Implementation in the Cloud (1|11)
Prerequisites
•At a high level, the following is required for Pacemaker to operate:
•Db2 LUW v11.5.6 or above for the databases.
•RHEL 8.1+ or SLES 15 SP1+ for the Linux distributions, with SELinux disabled.
•A VIP (load balancer) in the same subnet as both the primary and standby servers.
•A separate quorum host (Qdevice) on which to install the Corosync software (it does
not need Db2 installed). One quorum device can manage multiple clusters.
•At least 1 GB in /var to store cluster log files.
•At least 150 MB in /usr for RHEL, or 300 MB in /usr for SLES.
•The KornShell (ksh) and python3-dnf-plugin-versionlock packages are required.
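The space and package prerequisites above can be sanity-checked before installation. A minimal sketch, assuming an RPM-based host; the 1 GB /var threshold and the package names come from these slides:

```shell
#!/bin/sh
# Preflight sketch: check /var free space and required packages
# (thresholds from the prerequisites above; adjust for your distribution).

avail_kb=$(df -Pk /var | awk 'NR==2 {print $4}')
if [ "$avail_kb" -ge 1048576 ]; then
    echo "/var: OK (${avail_kb} KB free)"
else
    echo "/var: WARNING - less than 1 GB free (${avail_kb} KB)"
fi

# rpm -q exits nonzero if a package is absent (or rpm is unavailable).
for pkg in ksh python3-dnf-plugin-versionlock corosync pacemaker crmsh; do
    if rpm -q "$pkg" >/dev/null 2>&1; then
        echo "$pkg: installed"
    else
        echo "$pkg: MISSING"
    fi
done
```

Running this on every node before invoking db2cm avoids the most common install-time failures.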

Configuration and Implementation in the Cloud (2|11)
Prerequisites (cont.)
•Ensure that the following packages are installed on all hosts:
•rpm -q corosync
•rpm -q pacemaker
•rpm -q crmsh
•Make sure firewall ports are open for the HADR/SVCENAME ports, as well as ports
3121 (crmd), 5403 (corosync-qnetd), and 5404-5405 (corosync).
•All nodes in the cluster, as well as the load balancer VIP, must be defined in
each server’s /etc/hosts file.
•Passwordless ssh for the root and instance owner IDs across all servers.
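The fixed port list above can be kept in a small helper and checked against the local firewall configuration. A sketch (the function name is ours; the port numbers are from the slide, and HADR/SVCENAME ports vary per instance, so add your own):

```shell
# Hypothetical helper: the cluster-infrastructure ports that must be open.
# Instance-specific HADR and SVCENAME ports should be appended to this list.
required_ports() {
    printf '%s\n' \
        "3121/tcp  crmd" \
        "5403/tcp  corosync-qnetd" \
        "5404/udp  corosync" \
        "5405/udp  corosync"
}

required_ports
```

On RHEL with firewalld, each entry could then be opened with, for example, `firewall-cmd --permanent --add-port=5405/udp` followed by `firewall-cmd --reload`.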

Configuration and Implementation in the Cloud (3|11)
High-level Implementation Steps
•Meet all prerequisites from prior slides.
•Create and set up two EC2 instances or two Azure VMs for Db2.
•Create a third EC2 instance/VM of minimal specs for Qdevice.
•Create a load balancer/VIP (steps vary between AWS and Azure).
•Set up cluster and resource model using db2cm.
•Set up the quorum server to act as an arbitrator for the nodes.

Configuration and Implementation in the Cloud (4|11)
Cluster Configuration for Pacemaker
•Create cluster using db2cm:
•/home/db2inst1/sqllib/bin/db2cm -create -cluster -domain <domain_name> -host
<host1> -publicEthernet eth0 -host <host2> -publicEthernet eth0
•Create the instance resource and its resource model:
•/home/db2inst1/sqllib/bin/db2cm -create -instance db2inst1 -host <host1>
•/home/db2inst1/sqllib/bin/db2cm -create -instance db2inst1 -host <host2>
•Add HADR database resources to the resource model:
•/home/db2inst1/sqllib/bin/db2cm -create -db <database> -instance db2inst1
•Check status of cluster:
•/home/db2inst1/sqllib/bin/db2cm -list

Configuration and Implementation in the Cloud (5|11)
Virtual IP and Load Balancer Setup (Azure)
•Load balancer and VIP need to be created in Azure from the console (in this
example, we will use port 62500).
•Create Virtual IP primitive on one node of the cluster:
•/home/db2inst1/sqllib/bin/db2cm -create -primaryVIP <virtual_ip> -db <database> -
instance db2inst1
•Create the Load Balancer primitive on one node of the cluster:
•crm configure primitive db2_db2inst1_db2inst1_<db_name>-primary-lbl azure-lb
port=62500 op start interval=0 op_params trace_ra=1 op stop interval=0
op_params trace_ra=1

Configuration and Implementation in the Cloud (6|11)
Virtual IP and Load Balancer Setup Continued (Azure)
•Create the collocation and order constraints for the Load Balancer:
•crm configure colocation db2_db2inst1_db2inst1_<db_name>-primary-lbl-colocation
inf: db2_db2inst1_db2inst1_<db_name>-primary-lbl:Started
db2_db2inst1_db2inst1_<db_name>-clone:Master
•crm configure order order-rule-db2_db2inst1_db2inst1_<db_name>-then-primary-lbl
Mandatory: db2_db2inst1_db2inst1_<db_name>-clone:promote
db2_db2inst1_db2inst1_<db_name>-primary-lbl:start
•Start the Load Balancer resource:
•crm resource manage db2_db2inst1_db2inst1_<db_name>-primary-lbl
•crm resource start db2_db2inst1_db2inst1_<db_name>-primary-lbl

Configuration and Implementation in the Cloud (7|11)
Quorum and Qdevice Setup (Azure)
•A Qdevice server is essentially the tiebreaker for the cluster; it requires an
external resource accessible to all nodes of the cluster.
•It is more reliable than a simple quorum IP, as the decision logic is more
robust than a simple TCP/IP ping to the external address.
•It is considerably simpler to set up, as it can be a lightweight server with
no database installation, and it does not need to be part of the Pacemaker
cluster.
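The arithmetic behind the tiebreaker is simple majority voting: each database node contributes one vote and the Qdevice contributes one more, so a two-node cluster plus Qdevice has three expected votes and needs two to retain quorum. A sketch of that calculation (one vote per node plus one Qdevice vote is the standard corosync-votequorum behavior):

```shell
# Majority-vote arithmetic for a 2-node cluster plus a Qdevice.
nodes=2
qdevice_votes=1
total=$((nodes + qdevice_votes))
quorum=$((total / 2 + 1))
echo "expected votes: $total, quorum: $quorum"
# With quorum at 2, a surviving node plus the Qdevice can keep the
# cluster running after the other node fails, avoiding split-brain.
```

Without the Qdevice, a two-node cluster that loses a node also loses quorum, which is exactly the scenario the arbitrator host is there to resolve.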

Configuration and Implementation in the Cloud (8|11)
Quorum and Qdevice Setup Continued (Azure)
•The quorum host can be a minimalistic server, with the only requirement
being the corosync-qnetd daemon.
•The quorum host must be pingable from both Host1 and Host2.
•Host3 (quorum) acts as the third-party arbitrator and decision maker.

Configuration and Implementation in the Cloud (9|11)
Quorum and Qdevice Setup Continued (Azure)
•Ensure the corosync-qdevice package is installed on the primary and standby.
•Ensure the corosync-qnetd package is installed on the arbitrator node.
•Configure quorum on one of the database nodes as root:
•/opt/ibm/db2/V11.5/bin/db2cm -create -qdevice <quorum_node>
•Verify the quorum setup on the primary/standby:
•[root@<host1> ~]# corosync-qdevice-tool -s
•[root@<host2> ~]# corosync-qdevice-tool -s
•Verify the quorum setup on the arbitrator:
•[root@<quorum_host> ~]# corosync-qnetd-tool -l

Configuration and Implementation in the Cloud (10|11)
Further Reading
•Command Reference:
•https://www.ibm.com/docs/en/db2/11.5?topic=pacemaker-db2cm-db2-cluster-manager-utility
•Quorum Information:
•https://www.ibm.com/docs/hr/db2/11.5?topic=utility-install-configure-qdevice-quorum
•https://www.ibm.com/docs/en/db2/11.5?topic=component-quorum-devices-support-pacemaker

Configuration and Implementation in the Cloud (11|11)
Video Demonstration (10 minutes)
•We will now show a hands-on demonstration.
•We will use Pacemaker for both Failover and Disaster Recovery.
•All 4 nodes will be managed by Pacemaker using this architecture.

Conclusion
Summary
•Pacemaker solves a problem that many companies face in their cloud
transformation endeavors: how to address high availability and business
continuity with Db2 when lifting and shifting into the cloud.
•From our experience, Pacemaker tends to be a lot more automated and robust,
and a lot less prone to problems than TSA clusters implemented with RSCT.
•Failover performance is greatly improved with Pacemaker over TSA, which can
reduce mean time to recovery (MTTR) in many applications.
•When combined with disaster recovery, Pacemaker can manage both primary and
DR site clusters of servers in one resource model.
•When a takeover to a DR site is done with this setup, Pacemaker automatically
and seamlessly updates the status of the various resources in the model to reflect
their current roles, eliminating the need for any manual reconfiguration.

Questions

Thank You
Speakers: Scott Konash & Shashi Ranjan
Company: Datavail
Email Address: [email protected]
[email protected]
Session Code: G01
Please fill out your session evaluation before leaving!