
Monitoring a Production

The Management Portal provides pages that let you monitor a single production more closely (in contrast to the previous chapter, which describes how to monitor all namespaces). This chapter describes how to use these pages.

General Notes

For background information, see the chapter “Concepts.”

For information on starting and stopping productions, see “Managing Ensemble.” Note that for a live, deployed production, InterSystems recommends that you use the auto-start option, which is described in that book.

If a production is Suspended or Troubled, see “Correcting Production Problem States.”

Using the Production Monitor

The Production Monitor page displays real-time status information about the currently running production in a condensed, one-page format, with links for further details. To display this page in the Management Portal, select Ensemble > Monitor > Production Monitor.

You can use this page to monitor the general health of the production in the selected namespace. The following is a partial example of what this page displays:

generated description: production monitor

By default, this page is automatically updated at frequent intervals. In the left area, you can clear the Auto update check box to disable these updates.

The Production Monitor page displays real-time information provided by the Monitor Service. The Monitor Service is a business service that is implicitly included in every Ensemble production (not visible as part of its configuration). The Monitor Service continually monitors the activities of Ensemble items while a production is running, and records data about them at frequent intervals.

Input Connections

The Input Connections table (upper left) lists all incoming connections from external systems. Each entry indicates the following:

  1. Business service status

  2. Business service connection status

  3. Business service name

  4. Number of messages processed since the production started

The statuses are indicated by the cell color. The item status and the connection status cells have the following meaning:

  • Business Service status (first cell).

    • Green • (dot) — Active and OK.

    • Yellow - (hyphen) — Currently inactive, but otherwise OK.

    • Red ! (exclamation mark) — Error.

    • Gray X (letter X) — Disabled.

  • Connection status (second cell). The connection status is meaningful for TCP, HTTP, FTP, and ODBC connections.

    • Green + (plus sign) — Connected.

    • Yellow o (letter o) — Listening.

    • Red x (letter x) — Disconnected.

    • Gray - (hyphen) — Not applicable, disabled, unscheduled, or not connecting.

If you hover over the name of the service, the hover text provides additional information. If you select the name of the service, the left area is updated with details and also displays the following associated links:

  • Event Log — Click to view the Event Log entries for the selected configuration item. For information, see “Viewing the Event Log.”

  • Queue Contents — Click to view the production queues. For information, see “Monitoring Production Queues,” later in this chapter.

Output Connections

The Output Connections table (upper right) lists all outgoing connections to external systems. Each entry indicates the following:

  1. Business operation status

  2. Business operation connection status

  3. Business operation name

  4. Number of messages processed since the production started

The statuses are indicated by the cell color. The item status and the connection status cells have the following meaning:

  • Business Operation status (first cell).

    • Green • (dot) — Active and OK.

    • Yellow - (hyphen) — Currently inactive, but otherwise OK.

    • Red ! (exclamation mark) — Error.

    • Gray X (letter X) — Disabled.

    • Gray • (dot) — Retry. The business operation connection failed and the operation is retrying the connection.

  • Connection status (second cell). The connection status is meaningful for TCP, HTTP, FTP, and ODBC connections.

    • Green + (plus sign) — Connected.

    • Yellow o (letter o) — Listening.

    • Red x (letter x) — Disconnected.

    • Gray - (hyphen) — Not applicable, disabled, unscheduled, or not connecting.
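The legends for the Input Connections and Output Connections tables can be captured as small lookup tables, for example when transcribing monitor states into a report. The following is a minimal sketch in Python. The names `ITEM_STATUS`, `CONNECTION_STATUS`, and `describe_cells` are hypothetical, and nothing here is part of an Ensemble API; the sketch only restates the color and symbol meanings listed above.

```python
# Illustrative lookup of the Production Monitor legend described above.
# All names are hypothetical; Ensemble does not expose this as an API.

# First cell: business service or business operation status.
ITEM_STATUS = {
    ("green", "•"): "Active and OK",
    ("yellow", "-"): "Currently inactive, but otherwise OK",
    ("red", "!"): "Error",
    ("gray", "X"): "Disabled",
    ("gray", "•"): "Retry (business operations only)",
}

# Second cell: connection status (TCP, HTTP, FTP, and ODBC connections).
CONNECTION_STATUS = {
    ("green", "+"): "Connected",
    ("yellow", "o"): "Listening",
    ("red", "x"): "Disconnected",
    ("gray", "-"): "Not applicable, disabled, unscheduled, or not connecting",
}


def describe_cells(item_cell, connection_cell):
    """Return a readable description of the two status cells of one row."""
    item = ITEM_STATUS.get(item_cell, "Unknown status")
    connection = CONNECTION_STATUS.get(connection_cell, "Unknown status")
    return f"item: {item}; connection: {connection}"


print(describe_cells(("green", "•"), ("yellow", "o")))
```

Keying on both color and symbol matters because gray is used for more than one status; the symbol is what distinguishes them.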

If you select the name of the operation, the left area is updated with details and the same links as for the Input Connections table.

Queues

The Queues table (lower left) lists the status of Ensemble internal message queues and how many messages are currently waiting in each queue.

This table uses the same icons and color-coding as the Input Connections table. If you click an item in this table, the left area is updated with details and the Queue Contents link.

Event Log

The Event Log (lower right) summarizes recent entries in the Event Log.

Each entry provides an icon and color to indicate the item’s status, as follows:

  • Red ! — Error.

  • Orange W — Warning.

  • Yellow A — Alert.

If you select an item in this table, the left area is updated to show details of that Event Log entry. It also displays the Event Log link, which you can use to see the entire Event Log.

Activity Graph

The activity graph shows the message activity for the production or for a selected incoming or outgoing connection. The graph can show the message activity over a time period ranging from the previous 7 days to the previous 5 minutes. The following displays the activity graph or history of the production monitor:

generated description: activity monitor

You can specify the following for the activity graph:

  • Component to monitor—when you start the Production Monitor, the Activity Graph shows the messages for all incoming and outgoing connections. If you select an incoming or an outgoing connection on the Production Monitor, the Activity Graph shows the activity for the selected component only. To return to the activity of the entire production, select the currently selected connection again to deselect it.

  • Auto update—if this check box is selected, the Production Monitor regularly updates the Activity Graph.

  • Time period to display—select one of the following:

    • Last week—display the activity for the previous 7 days. The vertical axis specifies the number of messages per hour.

    • Last day—display the activity for the previous 24 hours. The vertical axis specifies the number of messages per 15-minute interval.

    • Last hour—display the activity for the previous 60 minutes. The vertical axis specifies the number of messages per minute.

    • Last 5 minutes—display the activity for the previous 5 minutes. The vertical axis specifies the number of messages per 15-second interval.
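As a worked example of the arithmetic behind these options, the following Python sketch divides each time period by the interval that its vertical axis represents. The resulting counts are derived only from the figures listed above; whether the graph actually plots exactly that many points is an assumption, and the names `PERIODS` and `interval_count` are hypothetical.

```python
from datetime import timedelta

# (time period shown, interval represented by each value), as listed above.
PERIODS = {
    "Last week": (timedelta(days=7), timedelta(hours=1)),
    "Last day": (timedelta(hours=24), timedelta(minutes=15)),
    "Last hour": (timedelta(minutes=60), timedelta(minutes=1)),
    "Last 5 minutes": (timedelta(minutes=5), timedelta(seconds=15)),
}


def interval_count(period_name):
    """Number of whole intervals that fit in the selected time period."""
    span, interval = PERIODS[period_name]
    return span // interval  # timedelta // timedelta yields an int


for name in PERIODS:
    print(name, interval_count(name))
# Last week 168
# Last day 96
# Last hour 60
# Last 5 minutes 20
```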

Custom Metrics

The bottom area of the page might display one or more tables of custom metrics added by your Ensemble developers. For example:

generated description: production monitor custom addition

See “Adding Business Metrics to the Production Monitor,” in Developing Ensemble Productions.

Monitoring Production Queues

The Queues page shows the current state of all the message queues being used by the running Ensemble production in the selected namespace.

To display this page in the Management Portal, select Ensemble > Monitor > Queues.

generated description: queues

The table on this page has one row for each queue. The columns in this table are as follows:

  • Name — The name of the configuration item that has the queue. It may be different from the host class name.

  • Count — How many messages are on the queue. This value is a snapshot and may change when you refresh the page.

  • Active — The number of active messages.

  • Creation Time — The date and time when the queue was first created.

To see the contents of any given queue, select the row for that queue. The active messages and queue contents for that queue are displayed. If you select an entry in the queue contents or active messages, information about the message is displayed.

You can refresh the list of queues and contents by clicking the refresh arrow. You can also specify the time period to automatically refresh the list of queues and the active messages and queue contents tables.

The Active Messages table is displayed when there are active messages in the selected queue. It has one row for each active message, which identifies the message and its state. If you select one or more messages by checking the check box, you can abort or suspend the selected messages.

In the Active Messages table, you can select a message row to view the details of the selected message. The details are displayed to the right in the Header, Body, Contents, and Trace tabs. These tabs are the same as in the Message Viewer page; see “Viewing, Searching, and Managing Messages.”

The Queue Contents table on this page is displayed if there are messages in the selected queue. It has one row for each message in the given queue. The columns in this table are as follows:

  • Index — This integer value starts at 1 for the first message placed on the queue after the production starts, and increments by 1 for each successive message. A message has the same Index value for the entire time it is on the queue. Index values are never reused.

  • Priority — The priority of the message. See “Message Priority” in the first chapter.

  • MessageId — The object identifier for the message.
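The behavior of the Index column can be illustrated with a toy model. The sketch below is not Ensemble's implementation; the class `ModelQueue` and the message IDs are hypothetical. It only demonstrates the rules stated above: the counter starts at 1, increments by 1 for each message, stays fixed while the message is on the queue, and is never reused.

```python
from collections import deque


class ModelQueue:
    """Toy model of the Index column; not Ensemble's implementation."""

    def __init__(self):
        self._next_index = 1      # first message after production start gets 1
        self._entries = deque()   # (index, message_id) pairs, oldest first

    def put(self, message_id):
        """Enqueue a message and return the Index value assigned to it."""
        index = self._next_index
        self._next_index += 1     # never decremented, so values are not reused
        self._entries.append((index, message_id))
        return index

    def take(self):
        """Remove and return the oldest (index, message_id) pair."""
        return self._entries.popleft()

    def contents(self):
        """Snapshot of the queue, oldest message first."""
        return list(self._entries)


queue = ModelQueue()
queue.put(5001)
queue.put(5002)
queue.take()              # the first message leaves the queue
queue.put(5003)
print(queue.contents())   # [(2, 5002), (3, 5003)]
```

In this model, a queue whose lowest visible Index is well above 1 has simply processed earlier messages; the gap itself does not indicate a problem.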

In the Queue Contents table, you can perform the following tasks:

  • Select a message row to view the details of the selected message. The details are displayed to the right in the Header, Body, Contents, and Trace tabs. These tabs are the same as in the Message Viewer page; see “Viewing, Searching, and Managing Messages.”

  • Select messages by checking the check box for the messages.

  • Click Abort to abandon any ongoing attempts to send one or more messages selected with the check box. Click OK to verify the operation.

  • Click Abort All to abandon ongoing attempts to send all the messages in the queue. You must then click OK to verify the operation.

  • Select a page number to view that page in the list. Selecting |< displays the first page, << displays the previous page, >> displays the next page, and >| displays the last page.

Diagnosing Problems with Queues

By looking at queues and jobs, you can often quickly spot a problem in the system.

When there is buildup on a queue, it usually means something needs to be repaired. Usually the most important information about queues is the destination, or “target,” of any message that has been too long on a queue. In general, when a queued message is not being sent, it is because it cannot get to its target. If you can find out what is causing a problem with the target, when you solve that problem, the queue buildup will generally disappear. For example:

  • For a business service or business operation, if a queue is suddenly longer, this generally means that there is a problem communicating with an external system. An external connection may be down, or there may be a peak-hour effect that is affecting throughput on your external connections.

  • For a business service or business operation, if a queue is consistently long, this generally means that there is a consistent delay in sending messages. You should probably examine the external connection to see if there is a performance problem that you can solve. If that is not possible, you could increase the appropriate pool size (unless you need to ensure first-in-first-out processing).

    See “Pool Size and Actor Pool Size” in Configuring Ensemble Productions.

  • For a business process that uses a private pool, if a queue is consistently long, you could increase the appropriate pool size (unless you need to ensure first-in-first-out processing).

  • If the actor queue is suddenly longer, a business process may have experienced an error that has caused it to become “stuck” in some way.

  • If the actor queue is consistently long, the actor pool for the production may need to be larger.

  • If many queues have a consistently large buildup, there may be a general capacity issue on the host computer, the Ensemble production (in its role as a CSP application) may need more resources, or the underlying Caché installation may need to be tuned. For suggestions, see the Caché System Administration Guide and Using Caché Server Pages (CSP). In general, however, you will be able to keep queues moving with the simpler adjustments listed in this topic.
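The guidance in this list can be summarized as a decision table. The following Python sketch is one hypothetical way to encode it. The function name, the category labels, and the distinction between a "sudden" and a "consistent" buildup are assumptions supplied by the caller; the text above does not define thresholds for either.

```python
# Hypothetical encoding of the guidance above; the labels are illustrative only.
GUIDANCE = {
    ("external", "sudden"):
        "Possible problem communicating with an external system "
        "(connection down or peak-hour load).",
    ("external", "consistent"):
        "Consistent delay in sending messages: examine the external "
        "connection, or increase the pool size unless FIFO is required.",
    ("private-pool process", "consistent"):
        "Consider increasing the pool size unless FIFO is required.",
    ("actor", "sudden"):
        "A business process may be stuck after an error.",
    ("actor", "consistent"):
        "The actor pool for the production may need to be larger.",
    ("many queues", "consistent"):
        "Possible general capacity issue on the host, in CSP resources, "
        "or in Caché tuning.",
}


def suggest_queue_cause(queue_kind, pattern):
    """Map a queue category and buildup pattern to the likely cause.

    queue_kind: 'service', 'operation', 'private-pool process',
                'actor', or 'many queues'
    pattern:    'sudden' or 'consistent'
    """
    # Services and operations share the same guidance in the text above.
    if queue_kind in ("service", "operation"):
        queue_kind = "external"
    return GUIDANCE.get((queue_kind, pattern),
                        "No specific guidance in this topic.")


print(suggest_queue_cause("operation", "sudden"))
print(suggest_queue_cause("actor", "consistent"))
```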

Monitoring Active Jobs

The Currently Active Jobs page shows the currently active jobs for the production in the selected namespace.

To display this page in the Management Portal, select Ensemble > Monitor > Jobs.

The table on this page has one row for each active job. The columns in this table are as follows:

  • Job — Internal numeric identifier of the job.

  • Configuration Name — Configuration name of the business host for which this job was started.

    Each time a business service, business process, or business operation needs to do work, it starts a system job in which to complete its tasks. This job comes either from a private pool of jobs belonging to the business service, business process, or business operation, or (in the case of a business process) it may come from the public actor pool for the production. When the task is done, the job returns itself to the pool of jobs from which it came.

    A production might need to start and stop several different jobs to complete a single request. The details depend (in part) on whether requests are made synchronously or asynchronously. For example, if a job is required to wait, the job returns itself to its pool during the wait time to free up that resource.

  • Mode — Either Background or Foreground.

  • Status — Typically, this is running or dequeuing.

  • Detail — Any additional detail that is available for the job.

  • Active Message — ID of the message currently being processed, if any.

  • State — Typically, this is active.

Diagnosing Problems with Jobs

By looking at jobs and queues, you can often quickly spot a problem in the system.

Most jobs spend most of their time in a dequeuing state while they wait for messages. During shutdown they should become quiescent. If the job does not become quiescent during shutdown, that likely indicates a problem. If the job is constantly in a running state, that also indicates a problem, unless you expect the component to be doing a lot of processing (and it is actually completing this processing).

A job is marked as dead when it has been terminated for some reason and Ensemble has detected that it is no longer present on the system. This is normally an indication of a serious problem and should not occur. When Ensemble detects a dead job, it also writes an error to the Event Log.
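The checks described in this section can be sketched as a small triage function applied to a series of status values noted from the Jobs page over time. This is a hypothetical illustration in Python: the function name is invented, and the lowercase status strings (`running`, `dequeuing`, `quiescent`, `dead`) are assumed spellings of the states named above rather than values confirmed to appear in the portal.

```python
def triage_job(samples, shutting_down=False, heavy_processing_expected=False):
    """Classify one job from status values sampled over time (oldest first).

    The status strings are assumed spellings of the states named in the text.
    """
    if not samples:
        return "no data"
    statuses = [s.lower() for s in samples]
    if "dead" in statuses:
        return "serious: dead job; look for the related error in the Event Log"
    if shutting_down and statuses[-1] != "quiescent":
        return "likely problem: job did not become quiescent during shutdown"
    if all(s == "running" for s in statuses) and not heavy_processing_expected:
        return "possible problem: job is constantly running"
    return "no problem indicated"


print(triage_job(["dequeuing", "running", "dequeuing"]))
print(triage_job(["running", "running", "running"]))
print(triage_job(["dequeuing", "running"], shutting_down=True))
```

A single observation cannot show that a job is constantly running, which is why the sketch takes several samples rather than one status value.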

Using the Production Configuration Page

Ensemble provides another way to view a production, the Production Configuration page. To access this page, select Ensemble > Configure > Production.

This page displays the business hosts in the production, with useful color coding as in the following example:

generated description: production view list

This page displays a circular status indicator next to each business host. If you click Legend to see the meaning of this indicator, Ensemble displays the following:

generated description: production diagram legend

Note that the primary purpose of this page is for configuring productions as described in Configuring Ensemble Productions.

Correcting Production Problem States

If a production is Suspended or Troubled, read this section.

If the state of a production is Running, then a production has been started and is operating normally. This is an acceptable state.

If the state of a production is Stopped, it is not running and all of its queues are free of synchronous messages. This is also an acceptable state.

In some cases (usually during development), you might see the Update button on this page for a production that is Running. Click this, and Ensemble updates the production to resolve the discrepancy. For an explanation, see “The Update Button” in Configuring Ensemble Productions.

Recovering a Suspended Production

A production acquires the Suspended status when, at the end of the shutdown sequence, some queues still contain synchronous messages.

You can start the Suspended production again to permit these messages to be processed. However, if the underlying problem is not resolved, you might acquire more synchronous messages in the queue without processing the previous messages.

Therefore, if a live, deployed Ensemble production goes into a Suspended state, contact the InterSystems Worldwide Response Center (WRC) for assistance.

If a production becomes Suspended during development, see “Correcting Production Problem States” in Developing Ensemble Productions. In this case, you can use a procedure that discards the messages.

Recovering a Troubled Production

A production acquires a status of Troubled if Ensemble is stopped but the production did not shut down properly. This can happen if you restarted Ensemble or rebooted the machine without first stopping the production. In this case, click the Recover button.
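The states and responses described in this section can be summarized as a lookup. The following Python sketch is only a restatement of the text above for quick reference; `recommended_action` and its `live` parameter are hypothetical names, not part of Ensemble.

```python
def recommended_action(state, live=True):
    """Return the response this section recommends for a production state.

    live: True for a live, deployed production; False during development.
    """
    if state in ("Running", "Stopped"):
        return "No action needed; this is an acceptable state."
    if state == "Suspended":
        if live:
            return "Contact the InterSystems Worldwide Response Center (WRC)."
        return ('See "Correcting Production Problem States" in '
                "Developing Ensemble Productions.")
    if state == "Troubled":
        return "Click the Recover button."
    return "State not covered in this section."


for state in ("Running", "Stopped", "Suspended", "Troubled"):
    print(state, "->", recommended_action(state))
```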
