Using Veritas Cluster Server for Linux with Caché

Caché can be configured as an application controlled by Veritas Cluster Server (VCS) on Linux. This appendix highlights the key portions of the configuration of VCS including how to incorporate the Caché high availability agent into the controlled service. Refer to your Veritas documentation and consult with your hardware and operating system vendor(s) on all cluster configurations.

When using Caché in a high availability environment controlled by Veritas Cluster Server:

  1. Install the hardware and operating system according to your vendor recommendations for high availability, scalability and performance; see Hardware Configuration.

  2. Configure VCS with shared disks and a virtual IP (VIP). Verify that common failures are detected and the cluster continues operating; see Linux and Veritas Cluster Server.

  3. Install the VCS control scripts (online, offline, clean, monitor) and the Caché agent type definition, see Installing the VCS Caché Agent.

  4. Install Caché and your application according to the guidelines in this appendix and verify connectivity to your application through the VIP; see Installing Caché in the Cluster.

  5. Test disk failures, network failures, and system crashes, and test and understand your application’s response to such failures; see Application Considerations and Testing and Maintenance.

Hardware Configuration

Configure the hardware according to best practices for your application. In addition to adhering to the recommendations of your hardware vendor, consider the following:

Disk and Storage

Create LUNs/partitions, as required, for performance, scalability, availability and reliability. This includes using appropriate RAID levels, battery-backed and mirrored disk controller cache, multiple paths to the disk from each node of the cluster, and a partition on fast shared storage for the cluster quorum disk.

Networks/IP Addresses

Where possible, use bonded multi-NIC connections through redundant switches/routers to reduce single-points-of-failure.
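
As a hedged illustration only, a bonded interface on a Red Hat-style system might be defined as sketched below; the device names, bonding mode, and addresses are placeholders, and your distribution or vendor may use a different mechanism entirely.

    # Contents of /etc/sysconfig/network-scripts/ifcfg-bond0 (hypothetical active-backup bond)
    DEVICE=bond0
    TYPE=Bond
    BONDING_OPTS="mode=active-backup miimon=100"
    ONBOOT=yes
    BOOTPROTO=none
    IPADDR=10.0.0.11
    NETMASK=255.255.255.0

    # Contents of /etc/sysconfig/network-scripts/ifcfg-eth0 (hypothetical slave interface)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none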

Linux and Veritas Cluster Server

Prior to installing Caché and your Caché-based application, follow the recommendations described below when configuring Linux and VCS. These recommendations assume a two-node cluster where both nodes are identical. Other configurations are possible; consult with your hardware vendor and the InterSystems Worldwide Response Center (WRC) for guidance.

Linux

When configuring Linux on the nodes in the cluster, use the following guidelines:

  1. All nodes in the cluster must have identical userids/groupids (that is, the name and ID number must be identical on all nodes); this is required for Caché.

    The following two users and two groups must be created and kept synchronized on all cluster members (a hypothetical example follows this list):

    1. Users

      1. Owner(s) of the instance(s) of Caché

      2. Effective user(s) assigned to each instance’s Caché jobs

    2. Groups

      1. Effective group(s) to which each instance’s Caché processes belong.

      2. Group(s) allowed to start and stop the instance(s).

  2. All volume groups required for Caché and the application must be available to all nodes.

  3. Include all fully qualified public and private domain names in the hosts file on each node.
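
For example (the names and ID numbers below are purely hypothetical), the users and groups from guideline 1 could be created with fixed IDs by running identical commands on every node:

    # Run the same commands, with the same -g/-u values, on every cluster node.
    groupadd -g 501 cacheusr                 # effective group for Caché processes
    groupadd -g 502 cachemgr                 # group allowed to start/stop the instance
    useradd -u 501 -g cacheusr cachejob      # effective user for the instance's Caché jobs
    useradd -u 502 -g cachemgr cacheowner    # owner of the Caché instance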

Veritas Cluster Server

This document assumes Veritas Cluster Server (VCS) version 5.1 or newer. Other versions may work as well, but likely have different configuration options. Consult with Symantec/Veritas and the InterSystems Worldwide Response Center (WRC) for guidance.

In general you will follow these steps:

  1. Install and cable all hardware, disks, and networks.

  2. Create a cluster service group that includes the network paths and volume groups of the shared disk.

Be sure to include the entire set of volume groups, logical volumes and mount points required for Caché and the application to run. These include those mount points required for the main Caché installation location, your data files, journal files, and any other disk required for the application in use.
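
As a rough, hypothetical sketch only (the resource names, devices, mount points, and VIP address are placeholders, and a production configuration typically includes additional NIC and Mount resources), such a service group in main.cf might look like this:

    group cacheprod_sg (
          SystemList = { node1 = 0, node2 = 1 }
          AutoStartList = { node1 }
          )

          DiskGroup cacheprod_dg (
                DiskGroup = cacheprod_dg
                )

          Mount cacheprod_mnt (
                MountPoint = "/cacheprod"
                BlockDevice = "/dev/vx/dsk/cacheprod_dg/cacheprod_vol"
                FSType = vxfs
                FsckOpt = "-y"
                )

          IP cacheprod_vip (
                Device = eth0
                Address = "10.0.0.100"
                NetMask = "255.255.255.0"
                )

          // The mount cannot come online until its disk group is imported.
          cacheprod_mnt requires cacheprod_dg

The Caché resource itself is added later, after the Caché agent is installed and Caché is installed in the cluster.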

Installing the VCS Caché Agent

The Caché VCS agent consists of five files and one soft link that must be installed on all servers in the cluster.

Sample Caché VCS agent scripts and type definition are included in a development install. These samples will be sufficient for most two-node cluster installations. Follow the instructions provided for copying the files to their proper locations in the cluster.

A development install is not required in the cluster. The files listed below can be copied from a development install outside the cluster to the cluster nodes.

Assuming a development install has been completed to the /cachesys directory, the following files are located in /cachesys/dev/cache/HAcluster/VCS/Linux/:

CacheTypes.cf

Definition of the Caché agent

clean

Script that is run if VCS cannot complete an offline or online event

monitor

Script run by VCS to check whether the Caché instance is up, down, or in another state

offline

Script to take Caché down

online

Script to bring Caché up

  1. On all cluster nodes, create the directory to hold the files associated with the Caché agent:

    cd /opt/VRTSvcs/bin/
    mkdir Cache
    
    
  2. Create the link from the Caché agent to the VCS Script51Agent binary:

    cd /opt/VRTSvcs/bin/Cache/
    ln -s /opt/VRTSvcs/bin/Script51Agent CacheAgent
    
    
  3. Copy the Caché agent script files to the /opt/VRTSvcs/bin/Cache directory:

    cp <installdir>/dev/cache/HAcluster/VCS/Linux/monitor /opt/VRTSvcs/bin/Cache/
    cp <installdir>/dev/cache/HAcluster/VCS/Linux/clean /opt/VRTSvcs/bin/Cache/
    cp <installdir>/dev/cache/HAcluster/VCS/Linux/online /opt/VRTSvcs/bin/Cache/
    cp <installdir>/dev/cache/HAcluster/VCS/Linux/offline /opt/VRTSvcs/bin/Cache/
    
    
  4. Adjust the ownerships and permissions of the agent files:

    chown root:root /opt/VRTSvcs/bin/Cache/offline
    chown root:root /opt/VRTSvcs/bin/Cache/online
    chown root:root /opt/VRTSvcs/bin/Cache/monitor
    chown root:root /opt/VRTSvcs/bin/Cache/clean
    
    chmod 750 /opt/VRTSvcs/bin/Cache/offline
    chmod 750 /opt/VRTSvcs/bin/Cache/online
    chmod 750 /opt/VRTSvcs/bin/Cache/monitor
    chmod 750 /opt/VRTSvcs/bin/Cache/clean
    
    
  5. Copy the Caché agent type definition to the VCS configuration directory and adjust ownerships and permissions:

    cp <installdir>/dev/cache/HAcluster/VCS/Linux/CacheTypes.cf /etc/VRTSvcs/conf/config/
    chmod 600 /etc/VRTSvcs/conf/config/CacheTypes.cf
    chown root:root /etc/VRTSvcs/conf/config/CacheTypes.cf
    
    
  6. Edit your main.cf file and add the following include line at the top of the file:

    include "CacheTypes.cf"
    

You are now ready to install Caché in the cluster and configure VCS to control your Caché instance(s) using the Caché agent.
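
Before proceeding, you can optionally verify that the configuration still parses after adding the include line; this assumes the standard VCS configuration directory:

    # Verifies the syntax of main.cf and the included CacheTypes.cf
    hacf -verify /etc/VRTSvcs/conf/config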

Installing Caché in the Cluster

After a service group has been created and configured, install Caché in the cluster using the procedures outlined below.

These instructions assume that the VCS scripts have been placed in /opt/VRTSvcs/ and the configuration information in /etc/VRTSvcs/ as described in Installing the VCS Caché Agent, earlier in this appendix.

There are different procedures depending on whether you are installing only one instance of Caché or multiple instances of Caché. Installing a single instance of Caché in the cluster is common in production clusters. In development and test clusters it is common to have multiple instances of Caché controlled by the cluster software. If it is possible that you will install multiple instances of Caché in the future, follow the procedure for multiple instances.

Note:

For information about upgrading Caché in an existing failover cluster, see Upgrading a Cluster in the “Upgrading Caché” chapter of the Caché Installation Guide.

Installing a Single Instance of Caché

Use the following procedure to install and configure a single instance of Caché in the VCS cluster.

Note:

If any Caché instance that is part of a failover cluster is to be added to a Caché mirror, you must use the procedure described in Installing Multiple Instances of Caché, rather than the procedure in this section.

  1. Bring the service group online on one node. This should mount all required disks and allow for the proper installation of Caché.

    1. Check the file and directory ownerships and permissions on all mount points and subdirectories.

    2. Prepare to install Caché by reviewing the “Installing Caché on UNIX® and Linux” chapter of the Caché Installation Guide.

  2. Create a link from /usr/local/etc/cachesys to the shared disk. This forces the Caché registry and all supporting files to be stored on the shared disk resource you have configured as part of the service group.

    A good choice is to use a usr/local/etc/cachesys/ subdirectory under your installation directory.

    For example, assuming Caché is to be installed in /cacheprod/cachesys/, specify the following:

    mkdir -p /cacheprod/cachesys/usr/local/etc/cachesys
    mkdir -p /usr/local/etc/
    ln -s /cacheprod/cachesys/usr/local/etc/cachesys /usr/local/etc/cachesys
    
    
  3. Run Caché cinstall on the node with the mounted disks. Be sure the users and groups (either default or custom) have already been created on all nodes in the cluster, and that they all have the same UIDs and GIDs.

  4. Stop Caché and relocate the service group to the other node. Note that the service group does not yet control Caché.

  5. On the second node in the cluster, create the link in /usr/local/etc/ and the links in /usr/bin for ccontrol and csession:

    mkdir -p /usr/local/etc/
    ln -s /cacheprod/cachesys/usr/local/etc/cachesys /usr/local/etc/cachesys
    ln -s /usr/local/etc/cachesys/ccontrol /usr/bin/ccontrol
    ln -s /usr/local/etc/cachesys/csession /usr/bin/csession
    
    
  6. Manually start Caché using ccontrol start. Test connectivity to the cluster through the virtual IP address (VIP). Be sure the application, all interfaces, any ECP clients, and so on connect to Caché using the VIP as configured here.

  7. Be certain Caché is stopped on all nodes, then shut down VCS in preparation for adding the Caché agent to the configuration. Verify that the agent is installed in /opt/VRTSvcs/bin/Cache/, that the CacheTypes.cf configuration file is in /etc/VRTSvcs/conf/config/, and that ownerships and permissions match VCS requirements. See the Installing the VCS Caché Agent section of this appendix for more information about these requirements.

  8. Add the Caché agent configured to control your new instance to your cluster service group, as follows. This example assumes the instance being controlled is named CACHEPROD. See the Understanding the VCS Caché Agent Options section of this appendix for information about the Inst and CleanStop options.

    Cache cacheprod (
          Inst = CACHEPROD
          CleanStop = 0
    )
    
  9. The Caché resource must be configured to require the disk (mount) resource and, optionally, the IP resource; a hypothetical example follows this procedure.

  10. Start VCS and verify that Caché starts on the primary node.
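
As a hypothetical illustration of steps 9 and 10, the dependency lines added to the service group in main.cf might look like the following, where cacheprod_mnt and cacheprod_vip are placeholders for the mount and VIP resources already defined in your service group:

    // Placeholder resource names; substitute the names used in your main.cf.
    cacheprod requires cacheprod_mnt
    cacheprod requires cacheprod_vip

After saving the configuration, start VCS with hastart on each node and confirm the state of the group and its resources with hastatus -sum or hagrp -state.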

Installing Multiple Instances of Caché

To install multiple instances of Caché, use the following procedure.

Note:

If any Caché instance that is part of a failover cluster is to be added to a Caché mirror, the ISCAgent (which is installed with Caché) must be properly configured; see Configuring the ISCAgent in the “Mirroring” chapter of this guide for more information.

  1. Bring the service group online on one node. This should mount all required disks and allow for the proper installation of Caché.

    1. Check the file and directory ownerships and permissions on all mount points and subdirectories.

    2. Prepare to install Caché by reviewing the “Installing Caché on UNIX® and Linux” chapter of the Caché Installation Guide.

  2. Run Caché cinstall on the node with the mounted disks. Be sure the users and groups (either default or custom) are already created on all nodes in the cluster, and that they all have the same UIDs and GIDs.

  3. The /usr/local/etc/cachesys directory and all its files must be available to all nodes at all times. To enable this, copy /usr/local/etc/cachesys from the first node you install to each node in the cluster. The following method preserves symbolic links during the copy process:

    cd /usr/local/
    rsync -av -e ssh etc root@node2:/usr/local/
    

    Verify that ownerships and permissions on the cachesys directory and its files are identical on all nodes.

    Note:

    In the future, keep the Caché registries on all nodes in sync using ccontrol create or ccontrol update or by copying the directory again; for example:

    ccontrol create CSHAD directory=/myshadow/ versionid=2013.1.475
    
  4. Stop Caché and relocate the service group to the other node. Note that the service group does not yet control Caché.

  5. On the second node in the cluster, create the links in /usr/bin for ccontrol and csession, as follows:

    ln -s /usr/local/etc/cachesys/ccontrol /usr/bin/ccontrol
    ln -s /usr/local/etc/cachesys/csession /usr/bin/csession
    
    
  6. Manually start Caché using ccontrol start. Test connectivity to the cluster through the VIP. Be sure the application, all interfaces, any ECP clients, and so on connect to Caché using the VIP as configured here.

  7. Be certain Caché is stopped on all nodes, then shut down VCS in preparation for adding the Caché agent to the configuration. Verify that the agent is installed in /opt/VRTSvcs/bin/Cache/, that the CacheTypes.cf configuration file is in /etc/VRTSvcs/conf/config/, and that ownerships and permissions match VCS requirements. See the Installing the VCS Caché Agent section of this appendix for more information about these requirements.

  8. Add the Caché agent configured to control your new instance to your cluster service group, as follows. This example assumes the instance being controlled is named CACHEPROD. See the Understanding the VCS Caché Agent Options section of this appendix for information about the Inst and CleanStop options.

    Cache cacheprod (
          Inst = CACHEPROD
          CleanStop = 0
    )
    
  9. The Caché resource must be configured to require the disk resource and optionally the IP resource.

  10. Start VCS and verify that Caché starts on the primary node.

Notes on Adding the Second Instance of Caché to the Cluster

When you are ready to install a second instance of Caché within the same cluster, follow these additional steps:

  1. Configure VCS to add the disk and IP resources associated with the second instance of Caché.

  2. Bring VCS online so the disks are mounted on one of the nodes.

  3. Be sure the users and groups to be associated with the new instance are created and synchronized between nodes.

  4. On the node with the mounted disk, run cinstall following the procedures outlined in the “Installing Caché on UNIX® and Linux” chapter of the Caché Installation Guide.

  5. Stop Caché.

  6. Synchronize the Caché registry using the following steps:

    1. On the install node, run:

      ccontrol list
      
    2. Record the instance name, version ID and installation directory of the instance you just installed.

    3. On the other node, run the following command to create the registry entry, using the information you recorded from the recently installed instance:

      ccontrol create <instance_name> versionid=<version_ID> directory=<instance_directory>
      
  7. Add a Caché resource configured to control this instance to your main.cf file.

  8. The Caché resource must be configured to require the disk resource and optionally the IP resource.

  9. Start the cluster service group and verify that Caché starts.

Understanding the VCS Caché Agent Options

The VCS Caché agent has two options that can be configured as part of the resource:

Inst

Set to the name of the instance being controlled by this resource (there is no default).

CleanStop

Set to 1 for a ccontrol stop or 0 for an immediate ccontrol force.

CleanStop determines the behavior of Caché when VCS attempts to offline the resource. When CleanStop is set to 1, Caché first uses ccontrol stop. When CleanStop is set to 0, Caché immediately uses ccontrol force. Consider the following consequences when deciding about this option:

ccontrol stop (CleanStop = 1)

Waits for processes to end cleanly, potentially delaying the stop, especially when some processes are unresponsive due to a hardware failure or fault. This setting can significantly lengthen time-to-recovery.

ccontrol force (CleanStop = 0)

Because it does not wait for processes to end, this setting dramatically decreases time-to-recovery in most failovers caused by hardware failures or faults. However, while ccontrol force fully protects the structural integrity of the databases, it may result in transaction rollbacks at startup. This may lengthen the time required to restart Caché, especially if long transactions are involved.

If a controlled failover is to occur, such as during routine maintenance, follow these steps:

  1. Notify the user community and remove their connections, and stop batch and background jobs.

  2. Stop Caché from the command line using ccontrol stop <instance_name>.

  3. Fail over the cluster service.

Even if CleanStop is set to 0, the ccontrol force command issued during the stop of the cluster service has no effect, since Caché is already cleanly stopped, with all transactions rolled back by the command-line ccontrol stop before processes are halted.
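
For example, a controlled relocation during maintenance might look like the following once user connections and background jobs have been stopped; the instance, service group, and node names are placeholders:

    ccontrol stop CACHEPROD                # clean shutdown; transactions are rolled back here
    hagrp -switch cacheprod_sg -to node2   # relocate the service group to the other node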

Application Considerations

Consider the following for your applications:

  • Ensure that all network ports required for interfaces, user connectivity and monitoring are open on all nodes in the cluster.

  • Connect all interfaces, web servers, ECP clients and users to the database using the VIP over the public network as configured in the main.cf file.

  • Ensure that application daemons, Ensemble productions, and so on are set to autostart so the application is fully available to users after unscheduled failovers.

  • Consider carefully any code that is part of %ZSTART or that otherwise runs as part of Caché startup. To minimize recovery time, avoid placing heavy cleanup or query code in the startup sequence; otherwise, VCS may time out before the custom code completes.

  • Other applications, web servers, and so on can also be configured in the cluster, but these examples assume that only Caché is installed under cluster control. Contact the InterSystems Worldwide Response Center (WRC) to consult about customizing your cluster.

Testing and Maintenance

Upon first setting up the cluster, be sure to test that failover works as planned. This also applies any time changes are made to the operating system, its installed packages, the disk, the network, Caché, or your application.

In addition to the topics described in this section, contact the InterSystems Worldwide Response Center (WRC) for assistance when planning and configuring your Veritas Cluster Server resources to control Caché. The WRC can check for updates to the Caché agent and discuss failover and HA strategies with you.

Failure Testing

Typical full-scale testing must go beyond a controlled service relocation. While relocation testing is necessary to validate that the service group configuration and the agent scripts are functioning properly, you should also test responses to simulated failures. Be sure to test failures such as:

  • Loss of public and private network connectivity to the active node

  • Loss of disk connectivity

  • Hard crash of active node

Testing should include a simulated or real application load. Testing with an application load builds confidence that the application will recover in the event of an actual failure.

If possible, test with a heavy disk write load; the database is at its most vulnerable during heavy disk writes. Caché handles all recovery automatically using its CACHE.WIJ and journal files, but testing a crash during active disk writes ensures that all file systems and disk devices fail over properly.
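
On a test cluster only (never on a production node), some of these failures can be induced as sketched below; this assumes a Red Hat-style system with the sysrq facility enabled, and the interface name is a placeholder. Coordinate any such testing with your hardware and operating system vendors.

    ifdown bond0                  # simulate loss of public network connectivity
    echo c > /proc/sysrq-trigger  # force an immediate kernel crash of the active node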

Software and Firmware Updates

Keep software patches and firmware revisions up to date. Avoid known problems by adhering to a patch and update schedule.

Monitor Logs

Keep an eye on the VCS logs in /var/VRTSvcs/log/. The Caché agent logs time-stamped information to the “engine” log during cluster events. To troubleshoot any problems, search for the Caché agent error code 60022.
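
For example, assuming the default engine log name, you can search for agent errors as follows:

    # engine_A.log is the usual VCS engine log name; adjust if your site differs.
    grep 60022 /var/VRTSvcs/log/engine_A.log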

Use the Caché console log, the Caché Monitor and the Caché System Monitor to be alerted to problems with the database that may not be caught by the cluster software. (See the chapters “Monitoring Caché Using the Management Portal”, “Using the Caché Monitor” and “Using the Caché System Monitor” in the Caché Monitoring Guide for information about these tools.)
