Using HP Serviceguard with Caché

Caché can be configured as a resource controlled by HP Serviceguard. This appendix highlights the key portions of the configuration of Serviceguard including how to incorporate the custom Caché resource. Refer to your HP documentation and consult with HP on all cluster configurations.

When using Caché in a high availability environment controlled by HP Serviceguard:

  1. Install the hardware and operating system according to your vendor recommendations for high availability, scalability and performance; for more information, see Hardware Configuration.

  2. Configure HP Serviceguard to control a simple shared disk resource with a virtual IP address and test that common failures are detected and the cluster continues operating; for more information, see HP-UX and HP Serviceguard Configuration.

  3. Install Caché according to the guidelines in this appendix and test connectivity to the database from your application; for more information, see Install Caché in the Cluster.

  4. Test disk failures, network failures, and system crashes. Test and understand your application’s response to such failures; for more information, see Test and Maintenance.

Hardware Configuration

Configure the hardware according to best practices for the application. In addition to adhering to the recommendations of HP and your hardware vendor, consider the following:

  • Disk and Storage — Create LUNs/partitions, as required, for performance, scalability, availability and reliability. This includes using appropriate RAID levels, battery-backed and mirrored disk controller cache, multiple paths to the disk from each node of the cluster, and a partition on fast shared storage for your cluster quorum disk.

    Note:

    A quorum disk is recommended to prevent a split cluster.

  • Networks/IP Addresses — Use bonded multi-NIC connections through redundant switches/routers where possible to reduce single points of failure. Include all fully qualified public and private domain names in the hosts file on each node, as illustrated below.
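
    For example, a minimal /etc/hosts on each node might list both nodes' public and private (heartbeat) addresses along with the package's virtual IP. All host names and addresses below are placeholders for your environment:

    10.0.1.11     node1.example.com     node1      # public LAN
    10.0.1.12     node2.example.com     node2      # public LAN
    192.168.9.11  node1-hb.example.com  node1-hb   # private heartbeat LAN
    192.168.9.12  node2-hb.example.com  node2-hb   # private heartbeat LAN
    10.0.1.20     cachevip.example.com  cachevip   # package virtual IP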

HP-UX and HP Serviceguard Configuration

Prior to installing the Caché-based application, follow the recommendations described below when configuring HP-UX and Serviceguard. These recommendations assume a two-node cluster with quorum disk.

HP-UX

When you configure HP-UX, ensure that:

  • All user IDs and group IDs required for Caché are identical on all nodes.

  • Sufficient kernel resources (shmmax, etc.) are configured on the nodes; see the example following this list.

  • All volume groups required are imported on all nodes.
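
As a hedged illustration of the kernel-resource item above, HP-UX kernel parameters can be inspected and set with the kctune utility. The value shown is a placeholder only and must be sized to your Caché shared memory configuration:

kctune shmmax                # display the current setting
kctune shmmax=17179869184    # example value only; size to your configuration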

HP Serviceguard

This appendix assumes you are using HP Serviceguard version 11.18 or later.

Note:

The disk monitor service is available with HP Serviceguard version 11.20 and the cmvolmond daemon.

Caché is configured as an external module in a failover package whose configuration typically has the following attributes:

package_type                  failover
auto_run                      yes
node_fail_fast_enabled        yes
failover_policy               configured_node
failback_policy               manual
local_lan_failover_allowed    yes

The procedures in this appendix assume the following configuration choices:

  • Package Configuration Name: cachepkg.conf

  • Package Name: cachepkg

  • Service Name: cachesvc

  • Caché instance controlled by this package and monitored by this service: cacheprod

  • Caché install directory: /cacheprod/cachesys/

In general, do the following:

  1. Edit the cache.sh script so that its inst variable is set to the name of the Caché instance being controlled.

  2. Copy the cache.sh file to the /etc/cmcluster/<pkg>/ directory, where <pkg> is the package name (for example: /etc/cmcluster/cachepkg/).

  3. Ensure the permissions, owner, and group allow this script to be executable by the Serviceguard daemons.
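
For example, assuming the instance and package names used in this appendix, the three steps above might look like the following. The root:sys ownership shown is an assumption; use whatever settings allow the Serviceguard daemons (which run as root) to execute the script:

# in cache.sh, set: inst="cacheprod"
mkdir -p /etc/cmcluster/cachepkg/
cp cache.sh /etc/cmcluster/cachepkg/cache.sh
chown root:sys /etc/cmcluster/cachepkg/cache.sh
chmod 755 /etc/cmcluster/cachepkg/cache.sh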

Note:

InterSystems provides a sample HP Serviceguard service script (cache.sh) as part of a development installation of Caché. The sample script is located in the dev/cache/HAcluster/HP/ subdirectory under the Caché installation directory. It is not necessary to do a development installation in the cluster; copy the dev/cache/HAcluster/HP/cache.sh file to all cluster members from any developer’s installation, then modify as described in the previous procedure.

Install Caché in the Cluster

This section describes the additional steps required when installing Caché in a shared-disk cluster environment. The procedures assume a two-node cluster on which Caché has never been installed.

Installing a single instance of Caché is typical of production clusters, while development and test clusters commonly run multiple instances of Caché controlled by the cluster software. If you may install multiple instances of Caché in the future, follow the procedure for multiple instances.

Note:

For information about upgrading Caché in an existing failover cluster, see Upgrading a Cluster in the “Upgrading Caché” chapter of the Caché Installation Guide.

Installing a Single Instance of Caché in the Cluster

To install a single instance of Caché and your application, use the following procedure.

Note:

If any Caché instance that is part of a failover cluster is to be added to a Caché mirror, you must use the procedure described in Installing Multiple Instances of Caché in the Cluster, rather than the procedure in this section.

  1. Mount the entire set of disks required for Caché and the application to run (for installation, data files, journal files, and any other disk required for the application in use) on one node.

  2. Create a link from /usr/local/etc/cachesys to the shared disk. This forces the Caché registry and all supporting files to be stored on the shared disk resource. A good choice is a usr/local/etc/cachesys/ subdirectory under your intended install directory.

    Assuming Caché will be installed in the /cacheprod/cachesys/ subdirectory, create the following directories and links on the node that has mounted the shared disk:

    mkdir -p /cacheprod/cachesys/usr/local/etc/cachesys
    mkdir -p /usr/local/etc/
    ln -s /cacheprod/cachesys/usr/local/etc/cachesys/ /usr/local/etc/cachesys
    
  3. Run Caché cinstall on the node with the mounted disks, ensuring the required users and groups (either default or custom) are available on all nodes in the cluster, and that they all have the same UIDs and GIDs.

  4. Stop Caché and dismount the shared disks.

  5. Configure the volume group and file system sections of your cluster configuration. Ensure the file system mount options (fs_mount_opt) for your journal file systems match current file system mount recommendations; a sample package-file excerpt follows this procedure.

  6. Configure the virtual IP address, ensuring that the application, all interfaces, any ECP clients, and so on use the virtual IP address configured here to connect to the Caché resource.

  7. Start the package and relocate the package to the second node.

    Note:

    At this point there is no control of Caché in the cluster configuration.

  8. On the second node in the cluster, create the link in /usr/local/etc/ and the links in /usr/bin for ccontrol and csession:

    mkdir -p /usr/local/etc/
    ln -s /cacheprod/cachesys/usr/local/etc/cachesys/ /usr/local/etc/cachesys

    ln -s /usr/local/etc/cachesys/ccontrol /usr/bin/ccontrol
    ln -s /usr/local/etc/cachesys/csession /usr/bin/csession
    
  9. Test starting and stopping Caché via the script:

    cd /etc/cmcluster/cachepkg/
    ./cache.sh start
    ./cache.sh stop
    
  10. Add the Caché service to be monitored. The service script acts as the monitor of the Caché instance:

    service_name          cachesvc
    service_cmd           "/etc/cmcluster/cachepkg/cache.sh monitor"
    
  11. Add the Caché module by adding the path of the cache.sh script as an external_script:

    external_script        /etc/cmcluster/cachepkg/cache.sh
    
    Note:

    Serviceguard invokes the cache.sh script with a start or stop parameter as appropriate.

  12. Test manual package start, stop, and relocation.
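
As referenced in steps 5 and 6, the following is a minimal sketch of the volume group, file system, and IP address sections of cachepkg.conf. The volume group, logical volume, subnet, and address values are placeholders for your environment:

vg                     vg_cache
fs_name                /dev/vg_cache/lv_cacheprod
fs_directory           /cacheprod
fs_type                vxfs
fs_mount_opt           "-o rw"
fs_umount_opt          ""
fs_fsck_opt            ""
ip_subnet              10.0.1.0
ip_address             10.0.1.20

After any package file change, validate and apply the configuration before testing start, stop, and relocation:

cmcheckconf -P /etc/cmcluster/cachepkg/cachepkg.conf
cmapplyconf -P /etc/cmcluster/cachepkg/cachepkg.conf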

Installing Multiple Instances of Caché in the Cluster

To install multiple instances of Caché, use the following procedure.

Note:

For information about strategies for deploying multiple instances of Caché in HP Serviceguard clusters, please contact the InterSystems Worldwide Response Center (WRC).

If any Caché instance that is part of a failover cluster is to be added to a Caché mirror, the ISCAgent (which is installed with Caché) must be properly configured; see Configuring the ISCAgent in the “Mirroring” chapter of this guide for more information.

To allow for installing multiple instances of Caché and your application, follow these steps during initial cluster configuration:

  1. Mount the entire set of disks required for Caché and the application to run (for installation, data files, journal files, and any other disk required for the application in use) on one node.

  2. Run Caché cinstall on the node with the mounted disks, ensuring the required users and groups (either default or custom) are available on all nodes in the cluster, and that they all have the same UIDs and GIDs.

  3. The /usr/local/etc/cachesys directory and all its files must be available to all nodes at all times. Therefore, after Caché is installed on the first node, copy /usr/local/etc/cachesys to each node in the cluster, as follows:

    cd /usr/local/etc/
    scp -rp cachesys node2:/usr/local/etc/
    
    Note:

    Keep the Caché registries on all nodes in sync using ccontrol create and/or ccontrol update; for example:

    ccontrol update CACHEPROD directory=/cacheprod/cachesys/ versionid=2012.2.4.500.0
    
  4. Verify that ownerships and permissions on this cachesys directory and its files are identical on all nodes.

  5. Stop Caché and dismount the shared disks.

  6. Configure the volume group and file system sections of your cluster configuration. Ensure the file system mount options (fs_mount_opt) for your journal file systems match current file system mount recommendations; see the sample package-file excerpt following the single-instance procedure.

  7. Configure the virtual IP address, ensuring that the application, all interfaces, any ECP clients, and so on use the virtual IP address configured here to connect to the Caché resource.

  8. Start the package and relocate the package to the second node.

  9. On the second node in the cluster, create links in /usr/bin for ccontrol and csession:

    ln -s /usr/local/etc/cachesys/ccontrol /usr/bin/ccontrol
    ln -s /usr/local/etc/cachesys/csession /usr/bin/csession
    
  10. Test starting and stopping Caché via the script:

    cd /etc/cmcluster/cachepkg/
    ./cache.sh start
    ./cache.sh stop
    
  11. Add the Caché service to be monitored. The service script acts as the monitor of the Caché instance:

    service_name          cachesvc
    service_cmd           "/etc/cmcluster/cachepkg/cache.sh monitor"
    
  12. Add the Caché module by adding the path of the cache.sh script as an external_script:

    external_script        /etc/cmcluster/cachepkg/cache.sh
    
    Note:

    Serviceguard invokes the cache.sh script with a start or stop parameter as appropriate.

  13. Test manual package start, stop, and relocation.

Special Considerations

Consider the following for your applications:

  • Ensure that ALL disks required for the proper operation of your application are part of the cachepkg.conf file.

  • Ensure that all network ports required for interfaces, user connectivity, and monitoring are open on all nodes in the cluster.

  • Connect all interfaces, web servers, ECP clients and users to the database using the virtual IP address over the public network as configured in the cachepkg.conf file.

  • Ensure that the application daemons, Ensemble productions, etc. are set to auto-start so the application is fully available to users after unscheduled failovers.

  • Consider carefully any code that is part of %ZSTART or otherwise occurs as part of the Caché startup. To minimize time to recovery, do not place heavy cleanup or query code in the startup.

  • By default, the cache.sh script is configured to use ccontrol force whenever the script is called with the stop parameter; this results in a very fast stop, with no hangs or waits for processes to quit. Caché rolls back transactions during the next Caché start, which is equivalent to a hard crash and recovery on a single node. If the application running in Caché is prone to long transactions, the default behavior can be changed.

    To configure cache.sh to attempt a ccontrol stop and wait for it before forcing, edit cache.sh and set cleanstop=1.

    Note:

    cleanstop=1 can result in a longer time to recover after unplanned failovers.

    In any case, administrators performing planned service relocations should begin with a controlled halt of Caché using ccontrol stop; then, after a clean and successful stop, they can continue with the service relocation, as sketched below.
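
    A minimal sketch of such a planned relocation, assuming the package, instance, and node names used in this appendix; verify the exact sequence against the behavior of your cache.sh monitor service before relying on it:

    ccontrol stop cacheprod       # controlled, clean halt of the Caché instance
    cmhaltpkg cachepkg            # halt the package on the current node
    cmrunpkg -n node2 cachepkg    # start the package on the other node
    cmmodpkg -e cachepkg          # re-enable automatic package switching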

Test and Maintenance

Upon first setting up the cluster, be sure to test that failover works as planned. Any time changes are made to the operating system, its installed packages, the disk, the network, Caché, or your application, be sure to test that failover continues to work as expected.

In addition to the topics described in this section, you should contact the InterSystems Worldwide Response Center (WRC) for assistance when planning and configuring your HP Serviceguard service to control Caché. The WRC can check for any updates to the cache.sh script, as well as discuss failover and HA strategies.

Typical Full-Scale Testing Must Go Beyond a Controlled Service Relocation

While service relocation testing is necessary to validate that the package configuration and service scripts are all functioning properly, be sure to also test response to simulated failures.

Be sure to test failures such as:

  • Loss of public and private network connectivity from the active node.

  • Loss of disk connectivity.

  • Hard crash of active node.

Testing should include a simulated or real application load, as follows:

  • Testing with a load builds confidence that the application will recover.

  • Try to test with a heavy disk write load. During heavy disk writes the database is at its most vulnerable. Caché handles all recovery automatically using its CACHE.WIJ and journal files, but testing a crash during an active disk write ensures that all file system and disk devices are properly failing over.

Keep Patches and Firmware Up to Date

Avoid known problems by adhering to a patch and update schedule.

Use Caché Monitoring Tools

Use the Caché console log, the Caché Monitor and the Caché System Monitor to be alerted to problems with the database that may not be caught by the cluster software. (See the chapters “Monitoring Caché Using the Management Portal”, “Using the Caché Monitor” and “Using the Caché System Monitor” in the Caché Monitoring Guide for information about these tools.)
