Using IBM PowerHA SystemMirror with Caché

Caché can be configured as a resource controlled by IBM PowerHA SystemMirror. This appendix highlights the key portions of the PowerHA configuration, including how to incorporate the custom Caché application controller and monitor script. Refer to your IBM documentation and consult with IBM on all cluster configurations.

When using Caché in a high availability environment controlled by IBM PowerHA SystemMirror:

  1. Install the hardware and operating system according to your vendor recommendations for high availability, scalability and performance; for more information, see Hardware Configuration.

  2. Configure IBM PowerHA SystemMirror with shared disk and virtual IP. Verify that common failures are detected and the cluster continues operating; for more information, see IBM PowerHA SystemMirror Configuration.

  3. Install Caché according to the guidelines in this appendix and verify connectivity to your application via the virtual IP; for more information, see Install Caché in the Cluster.

  4. Test disk failures, network failures, and system crashes. Test and understand your application’s response to such failures; for more information, see Test and Maintenance.

Hardware Configuration

Configure the hardware according to best practices for the application. In addition to adhering to the recommendations of IBM and your hardware vendor, consider the following:

  • Disk and Storage — Create LUNs/partitions, as required, for performance, scalability, availability and reliability. This includes using appropriate RAID levels, battery-backed and mirrored disk controller cache, multiple paths to the disk from each node of the cluster, and a partition on fast shared storage for the PowerHA cluster repository disk.

  • Networks/IP Addresses — Use bonded multi-NIC connections through redundant switches/routers where possible to reduce single points of failure.
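
On AIX®, you can spot-check the disk paths and network adapters from the command line before building the cluster; in the sketch below, ent4 is a placeholder for a bonded (EtherChannel) adapter in your environment.

    # Each shared disk should show multiple Enabled paths (MPIO):
    lspath

    # List adapters, then confirm which physical NICs back the bonded adapter:
    lsdev -Cc adapter
    lsattr -El ent4 -a adapter_names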

IBM PowerHA SystemMirror Configuration

Prior to installing Caché and your Caché-based application, follow the recommendations described in this section when configuring IBM AIX® and PowerHA SystemMirror.

Note:

These recommendations assume a two-node cluster in which both nodes are identical. If your configuration is different, consult with IBM and the InterSystems Worldwide Response Center (WRC) for guidance.

IBM AIX®

When configuring AIX® on the nodes in the cluster, ensure that:

  • All user IDs and group IDs required for Caché are identical on all nodes (a quick check is sketched after this list).

  • All volume groups required for Caché and the application are available to all nodes.

  • To allow for “parallel” recovery after failures, do not use hierarchical file system mount points (for example, /prod, /prod/db1, and /prod/db2); instead, use independent mount points such as /proddb1, /proddb2, and /proddb3.

  • To allow for the faster “logredo” method of file system consistency checking during startup, be sure to use JFS2 or other journaled file systems.

  • Include all fully qualified public and private domain names in the hosts file on each node.

    Note:

    PowerHA 7 automatically uses all network paths (public and private, as well as shared disk paths) and the cluster repository disk for the cluster heartbeat.
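
The following sketch spot-checks these prerequisites from one node; the node names (node1, node2), the Caché owner account (cacheusr), and the mount point (/proddb1) are placeholders for your environment.

    for n in node1 node2; do
        echo "== $n =="
        ssh $n "id cacheusr"        # UID and GID must be identical on every node
        ssh $n "lsvg"               # volume groups known to this node
        ssh $n "lsfs /proddb1"      # VFS column should show a journaled file system (jfs2)
    done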

IBM PowerHA SystemMirror

This appendix assumes you are using IBM PowerHA SystemMirror version 7.1.0 or later.

Note:

Older versions also work, but they may have slightly different configuration options. If your version of IBM PowerHA SystemMirror is earlier than 7.1.0, consult with IBM and the InterSystems Worldwide Response Center (WRC) for guidance.

InterSystems provides a sample Caché application controller and monitor script as part of a development install of Caché. After a development install completes, the sample script is located in the dev/cache/HAcluster/IBM/ subdirectory under the Caché install directory. A development install is not required on the cluster nodes; copy the dev/cache/HAcluster/IBM/cache.sh script from any development install to all cluster members, and make sure the ownership and permissions on the copied cache.sh file are as required by PowerHA SystemMirror (a copy-and-permissions sketch follows the list of configuration choices below). This script should be sufficient for most cluster configurations, but it should be tested and may require modification to suit unique cluster topologies.

The procedures in this appendix assume the following configuration choices:

  • Resource Group Name: cacheprod_rg

  • Application Controller: cacheprod

  • Caché instance controlled by the above Application Controller: cacheprod

  • Caché install directory: /cacheprod/cachesys/

  • Location and name of Caché application and monitor script: /etc/cluster/cacheprod/cache.sh
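
With the names above, copying the sample script into place might look like the following sketch. The development-install path, the node name node2, and the root:system ownership are placeholders and assumptions; use whatever ownership and permissions your PowerHA configuration requires.

    # /devinstall/cachesys is a placeholder for any Caché development install:
    mkdir -p /etc/cluster/cacheprod
    cp /devinstall/cachesys/dev/cache/HAcluster/IBM/cache.sh /etc/cluster/cacheprod/
    chown root:system /etc/cluster/cacheprod/cache.sh   # assumed ownership
    chmod 755 /etc/cluster/cacheprod/cache.sh           # must be executable

    # Copy to the other cluster node (node2 is a placeholder), preserving modes:
    ssh node2 "mkdir -p /etc/cluster/cacheprod"
    scp -p /etc/cluster/cacheprod/cache.sh node2:/etc/cluster/cacheprod/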

In general, do the following:

  1. Install and cable all hardware, including disks and network.

  2. Create a resource group (cacheprod_rg) that includes the network paths and volume groups of the shared disk. To configure the resource group properly:

    • Include the entire set of volume groups, logical volumes, and mount points required for Caché and the application to run. This includes the mount points for the main Caché installation location, your data files, your journal files, and any other disk required by the application.

    • Configure the following policies as specified:

      • Startup policy: Online On First Available Node

      • Failover policy: Failover To Next Priority Node In The List

      • Fallback policy: Never Fallback

    • When adding file system resources to the resource group, configure the following file system settings as specified:

      • File systems Consistency Check: logredo

      • File systems Recovery Method: parallel
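
If you prefer to script the configuration with the clmgr command line rather than SMIT, a resource group with these policies might be created roughly as follows. This is a sketch only: the node names, volume group, and service IP label are placeholders, and the clmgr attribute names and value abbreviations shown here are assumptions to verify against the PowerHA documentation for your release.

    # Placeholders: nodes node1/node2, volume group cacheprodvg, service IP label cacheprod_svc.
    # Assumed value abbreviations: OFAN = Online On First Available Node,
    # FNPN = Failover To Next Priority Node In The List, NFB = Never Fallback.
    clmgr add resource_group cacheprod_rg \
        NODES="node1 node2" STARTUP=OFAN FALLOVER=FNPN FALLBACK=NFB \
        VOLUME_GROUP="cacheprodvg" SERVICE_LABEL="cacheprod_svc" \
        FSCHECK_TOOL=logredo RECOVERY_METHOD=parallel

    # Verify the resource group definition and its current state:
    /usr/es/sbin/cluster/utilities/clRGinfo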

Install Caché in the Cluster

After the resource group has been created and configured, install Caché in the cluster. The procedure is outlined below.

There are different procedures depending on whether you are installing a single instance of Caché or multiple instances, as described in the following subsections.

Installing a single instance of Caché in the cluster is common in production clusters. In development and test clusters it is common to have multiple instances of Caché controlled by the cluster software. If it is possible that you will install multiple instances of Caché in the future, follow the procedure for multiple instances.

Note:

For information about upgrading Caché in an existing failover cluster, see Upgrading a Cluster in the “Upgrading Caché” chapter of the Caché Installation Guide.

Installing a Single Instance of Caché in the Cluster

Use the following procedure to install and configure a single instance of Caché and your application.

Note:

If any Caché instance that is part of a failover cluster is to be added to a Caché mirror, you must use the procedure described in Installing Multiple Instances of Caché in the Cluster, rather than the procedure in this section.

  1. Bring the Caché resource group (cacheprod_rg) online on one node. This mounts all required disks, allowing for the proper installation of Caché:

    1. Check the file and directory ownerships and permissions on all mount points and subdirectories.

    2. Prepare to install Caché by reviewing the “Installing Caché on UNIX® and Linux” chapter of the Caché Installation Guide.

  2. Create a link from /usr/local/etc/cachesys to the shared disk. This forces the Caché registry and all supporting files to be stored on the shared disk resource configured as part of cacheprod_rg.

    A good choice is to use a usr/local/etc/cachesys/ subdirectory under your install directory. For example, assuming Caché is installed in the /cacheprod/cachesys/ directory, run the following on all nodes in the cluster:

    mkdir -p /cacheprod/cachesys/usr/local/etc/cachesys
    mkdir -p /usr/local/etc/
    ln -s /cacheprod/cachesys/usr/local/etc/cachesys/ /usr/local/etc/cachesys
    
  3. Run Caché cinstall on the node with the mounted disks.

    Important:

    Be sure the users and groups (either default or custom) have already been created on all nodes in the cluster, and that they have the same UIDs and GIDs on every node.

  4. Stop Caché and move the resource group to the other node.

  5. On the other node in the cluster, create the link in /usr/local/etc and the links in /usr/bin for ccontrol and csession:

    mkdir -p /usr/local/etc/
    ln -s /cacheprod/cachesys/usr/local/etc/cachesys/ /usr/local/etc/cachesys

    ln -s /usr/local/etc/cachesys/ccontrol /usr/bin/ccontrol
    ln -s /usr/local/etc/cachesys/csession /usr/bin/csession
    
  6. Test connectivity via the virtual IP address.

    Important:

    Be sure the application, all interfaces, any ECP clients, etc. use the virtual IP address as configured here to connect to the Caché resource.

  7. Place the cache.sh file in /etc/cluster/<app>/, where <app> is the application controller name (for more information, see Application Controllers and Monitors in this appendix). Ensure the permissions, owner and group allow this script to be executable.

  8. Test that the cache.sh script stops and starts the newly installed Caché instance (a combined test sketch follows this procedure). Assuming the application controller is named cacheprod:

    • To test starting the cacheprod instance of Caché, run: /etc/cluster/cacheprod/cache.sh start cacheprod

    • To test stopping the cacheprod instance of Caché, run: /etc/cluster/cacheprod/cache.sh stop cacheprod

  9. Offline the resource group to prepare to add the control scripts and monitors to your resource group; for more information, see Application Controllers and Monitors in this appendix.
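
Putting steps 6 through 8 together (before taking the resource group offline in step 9), a quick manual test from the node that currently holds the resource group might look like the following sketch. The virtual IP name (cacheprod-vip) and the superserver port (1972, the Caché default) are assumptions for your environment.

    # Start the instance through the cluster script and confirm its status:
    /etc/cluster/cacheprod/cache.sh start cacheprod
    ccontrol list                      # cacheprod should report a running status

    # Confirm client connectivity through the virtual IP (assumed name and port):
    ping -c 3 cacheprod-vip
    telnet cacheprod-vip 1972          # or use your application's own connection test

    # Stop the instance through the cluster script and confirm it is down:
    /etc/cluster/cacheprod/cache.sh stop cacheprod
    ccontrol list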

Installing Multiple Instances of Caché in the Cluster

To install multiple instances of Caché and your application, use the following procedure.

Note:

If any Caché instance that is part of a failover cluster is to be added to a Caché mirror, the ISCAgent (which is installed with Caché) must be properly configured; see Configuring the ISCAgent in the “Mirroring” chapter of this guide for more information.

  1. Bring the Caché resource group online on one node. This mounts all required disks and allows for the proper installation of Caché:

    1. Check the file and directory ownerships and permissions on all mount points and subdirectories.

    2. Prepare to install Caché by reviewing the “Installing Caché on UNIX® and Linux” chapter of the Caché Installation Guide.

  2. Run Caché cinstall on the node with the mounted disks.

    Important:

    Be sure the users and groups (either default or custom) have already been created on all nodes in the cluster, and that they have the same UIDs and GIDs on every node.

  3. The /usr/local/etc/cachesys directory and all its files must be available to all nodes at all times. Therefore, after Caché is installed on the first node, copy /usr/local/etc/cachesys to each node in the cluster, as follows:

    cd /usr/local/etc/
    scp -p cachesys node2:/usr/local/etc/
    
  4. Verify that ownerships and permissions on this cachesys directory and its files are identical on all nodes.

    Note:

    Keep the Caché registries on all nodes in sync using ccontrol create and/or ccontrol update after Caché upgrades (a comparison sketch follows this procedure); for example:

    ccontrol create CSHAD directory=/myshadow/ versionid=2011.1.475
    
  5. Stop Caché and move the resource group to the other nodes.

  6. On the other nodes in the cluster, create links in /usr/bin for ccontrol and csession:

    ln -s /usr/local/etc/cachesys/ccontrol /usr/bin/ccontrol
    ln -s /usr/local/etc/cachesys/csession /usr/bin/csession
    
  7. Test connectivity via the virtual IP address. Be sure the application, all interfaces, any ECP clients, etc. use the virtual IP address as configured here to connect to the Caché resource.

  8. Place the cache.sh file in /etc/cluster/<app>/, where <app> is the application controller name (for more information, see Application Controllers and Monitors in this appendix). Ensure the permissions, owner and group allow this script to be executable.

  9. Test that the cache.sh script stops and starts the newly installed Caché instance. Assuming the application controller is named cacheprod:

    • To test starting the cacheprod instance of Caché, run: /etc/cluster/cacheprod/cache.sh start cacheprod

    • To test stopping the cacheprod instance of Caché, run: /etc/cluster/cacheprod/cache.sh stop cacheprod

  10. Offline the resource group to prepare to add the control scripts and monitors to your resource group; for more information, see Application Controllers and Monitors in this appendix.
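
As noted in step 4, the Caché registry should stay identical across the nodes. The following is a minimal comparison sketch; the node names node1 and node2 are placeholders.

    # Collect the instance registry from each node and compare:
    ssh node1 "ccontrol qlist" > /tmp/registry.node1
    ssh node2 "ccontrol qlist" > /tmp/registry.node2
    diff /tmp/registry.node1 /tmp/registry.node2 && echo "registries match"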

Application Controllers and Monitors

Once Caché is installed, configure the application controller resource. For example, assuming the resource group that contains all the disk and IP resources is called cacheprod_rg and the chosen Application Controller Name is cacheprod, you would configure the following in the application controller:

  • Application Controller Name: cacheprod

  • Start script: /etc/cluster/cacheprod/cache.sh start cacheprod

  • Stop script: /etc/cluster/cacheprod/cache.sh stop cacheprod

  • Resource Group Name: cacheprod_rg
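
If you script this step with clmgr instead of SMIT, the definition might look roughly like the following sketch; the attribute names shown are assumptions to verify against the PowerHA documentation for your release.

    # Define the application controller with its start and stop commands:
    clmgr add application_controller cacheprod \
        STARTSCRIPT="/etc/cluster/cacheprod/cache.sh start cacheprod" \
        STOPSCRIPT="/etc/cluster/cacheprod/cache.sh stop cacheprod"

    # Associate the controller with the resource group that owns the disks and virtual IP:
    clmgr modify resource_group cacheprod_rg APPLICATIONS="cacheprod"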

You can also configure the optional custom application monitors:

Note:

If you configure a long-running monitor, you must also configure a startup monitor.

Long-running Monitor

Configure the long-running monitor with the following values:

  • Application Controller Name: cacheprod

  • Monitor Mode: Long-running monitoring

  • Monitor Method: /etc/cluster/cacheprod/cache.sh monitor cacheprod

  • Monitor Interval: 21

  • Stabilization Interval: 60

  • Restart Count: 1

  • Restart Interval: default

  • Action on Application Failure: failover

  • Notify Method: (site specific)

  • Cleanup Method: default

  • Restart Method: default

A long-running monitor uses the monitor function of the cache.sh script. This function, in turn, performs a ccontrol list to determine the status of the application:

  • If the status is “running” or “down”, the monitor returns success.

  • If the status is an error, PowerHA attempts to stop and then start the Application Controller.

  • For any other status, the monitor waits 10 seconds and checks again. If the application is still not running or is down, the monitor returns an error, forcing PowerHA to go through an application restart cycle on this node. If that fails to restart Caché, PowerHA fails the resource group over to the other node.
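
The following is an illustrative sketch of that logic, not the shipped cache.sh script; it assumes that ccontrol qlist prints one caret-delimited line per instance, beginning with the instance name and containing a status keyword.

    # Illustrative only -- the shipped cache.sh implements this logic itself.
    monitor() {
        inst=$1
        state=$(ccontrol qlist | grep -i "^${inst}^")
        case "$state" in
            *running*|*down*) return 0 ;;   # healthy, or cleanly stopped
            *error*)          return 1 ;;   # PowerHA stops and restarts the Application Controller
            *)  sleep 10                    # unclear state: wait and check again
                state=$(ccontrol qlist | grep -i "^${inst}^")
                case "$state" in
                    *running*|*down*) return 0 ;;
                    *) echo "CACHE ERROR: ${inst} state is: ${state}"
                       return 1 ;;          # triggers the restart cycle, then failover
                esac
                ;;
        esac
    }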

Startup Monitor

Configure the startup monitor with the following values:

  • Application Controller Name: cacheprod

  • Monitor Mode: Startup monitoring

  • Monitor Method: /etc/cluster/cacheprod/cache.sh startmonitor cacheprod

  • Monitor Interval: 11

  • Stabilization Interval: 21

  • Restart Count: 1

  • Restart Interval: default

  • Action on Application Failure: notify

  • Notify Method: (site specific)

  • Cleanup Method: default

  • Restart Method: default

The startup monitor uses the startmonitor function of the cache.sh script. This function, in turn, performs a ccontrol list to determine the status of the application:

  • If the status is “running,” the monitor exits successfully, since the application is already running.

  • For any other status, the monitor exits with a failure, which causes the Application Controller to run the defined Start script to initiate the start of Caché.
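
A correspondingly small sketch of that check (again illustrative, not the shipped script, under the same ccontrol qlist output assumption):

    # Returns 0 (success) if the instance is already running; otherwise returns
    # non-zero so that the Application Controller runs the Start script.
    startmonitor() {
        inst=$1
        ccontrol qlist | grep -i "^${inst}^" | grep -q "running"
    }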

Application Considerations

Consider the following for your applications:

  • Ensure that all network ports required for interfaces, user connectivity, and monitoring are open on all nodes in the cluster.

  • Connect all interfaces, web servers, ECP clients and users to the database using the virtual IP address over the public network.

  • Ensure that the application daemons, Ensemble productions, etc. are set to auto-start so the application is fully available to users after unscheduled failovers.

  • Consider carefully any code that is part of %ZSTART or otherwise runs as part of Caché startup. To minimize time to recovery, do not place heavy cleanup or query code in the startup routine; otherwise, PowerHA may time out before the custom code completes.

Test and Maintenance

Upon first setting up the cluster, test that failover works as planned. Repeat this testing any time changes are made to the operating system, its installed packages, the disk, the network, Caché, or your application.

In addition to the topics described in this section, you should contact the InterSystems Worldwide Response Center (WRC) for assistance when planning and configuring your IBM PowerHA SystemMirror resource to control Caché. The WRC can check for any updates to the cache.sh script, as well as discuss failover and HA strategies.

Typical Full Scale Testing Must Go Beyond a Controlled Service Relocation

While service relocation testing is necessary to validate that the package configuration and the service scripts are all functioning properly, be sure to also test response to simulated failures.

Be sure to test failures such as:

  • Loss of public and private network connectivity from the active node.

  • Loss of disk connectivity.

  • Hard crash of active node.

Testing should include a simulated or real application load, as follows:

  • Testing with a load builds confidence that the application will recover.

  • Try to test with a heavy disk write load. During heavy disk writes the database is at its most vulnerable. Caché handles all recovery automatically using its CACHE.WIJ and journal files, but testing a crash during an active disk write ensures that all file system and disk devices fail over properly.

Keep Patches and Firmware Up to Date

Avoid known problems by adhering to a patch and update schedule.

Use Caché Monitoring Tools

Keep an eye on the IBM PowerHA hacmp.out file. The Caché cache.sh script logs time-stamped information to this file during cluster events. To troubleshoot any problems, search for the phrases “CACHE ERROR” and/or “CACHE INFO” in the hacmp.out log to quickly find the period during which the cache.sh script was executing.
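
For example, assuming the default PowerHA 7 log location (verify the path on your cluster):

    # Show all Caché-related entries logged by cache.sh during cluster events:
    grep -nE "CACHE (ERROR|INFO)" /var/hacmp/log/hacmp.out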

Use the Caché console log, the Caché Monitor and the Caché System Monitor to be alerted to problems with the database that may not be caught by the cluster software. (See the chapters “Monitoring Caché Using the Management Portal”, “Using the Caché Monitor” and “Using the Caché System Monitor” in the Caché Monitoring Guide for information about these tools.)
