Monday, September 22, 2014

PatrolCli - Part 5: Remotely control PATROL agent recovery actions from BPPM cell

A PATROL agent recovery action is a script that runs when a parameter changes state.  It is often used to restart a process that is down, or to dump diagnostic messages when a performance indicator shows an abnormal value.

Recovery actions are a powerful feature, but they have a drawback: the value of the triggering parameter can only be set by the local PATROL agent.  In today's complex enterprise IT environments, deciding whether a recovery action is necessary sometimes requires data or events from multiple servers or from multiple monitoring tools.  At other times, it takes BPPM Analytics to determine that a performance indicator is out of its normal range.

When multiple servers, multiple monitoring tools, or BPPM Analytics are involved, the BPPM cell is the only component that can know a PATROL agent recovery action is needed.

So how can the BPPM cell communicate back to the PATROL agent and trigger a recovery action?

Since a PATROL agent is normally installed on the BPPM cell server, PatrolCli is available there, and you can use it to change the state of a PATROL parameter on a remote PATROL agent from the BPPM cell server.  The state change triggers the attached recovery action immediately.

For example, if you have a recovery action attached to the parameter /NT_OS/NT_OS/_CollectionStatus, you can change its state to ALARM by including the following PatrolCli command in an OS script called trigger_recovery.cmd on the BPPM cell server:

execpsl "set(\"/NT_OS/NT_OS/_CollectionStatus/status\", ALARM);"
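If you prefer to generate the PatrolCli input for trigger_recovery.cmd rather than write it by hand, the sketch below shows one way to do it. It assumes that PatrolCli reads commands from standard input and accepts `user`, `connect` and `exit` commands in the form shown; verify these against the PatrolCli documentation for your version. The host name, port 3181, user name and password are placeholders, and the helper names (`execpsl_line`, `patrolcli_script`, `trigger_recovery`) are hypothetical. The escaping step reproduces the `\"` quoting used in the execpsl command above.

```python
import subprocess

# Parameter states this sketch accepts (PATROL OK/WARN/ALARM states).
VALID_STATES = {"OK", "WARN", "ALARM"}


def execpsl_line(param_path: str, state: str) -> str:
    """Build a PatrolCli execpsl command that sets a parameter's status."""
    if state not in VALID_STATES:
        raise ValueError(f"unsupported state: {state}")
    psl = f'set("{param_path}/status", {state});'
    # Escape backslashes, then double quotes, so the PSL survives inside execpsl "..."
    escaped = psl.replace("\\", "\\\\").replace('"', '\\"')
    return f'execpsl "{escaped}"'


def patrolcli_script(host: str, port: int, user: str, password: str,
                     param_path: str, state: str) -> str:
    """Assemble PatrolCli input: log in, connect to the remote agent,
    set the parameter state, then exit.

    ASSUMPTION: the user/connect/exit command syntax below; check it
    against the PatrolCli documentation for your version.
    """
    lines = [
        f"user {user} {password}",
        f"connect {host} {port}",
        execpsl_line(param_path, state),
        "exit",
    ]
    return "\n".join(lines) + "\n"


def trigger_recovery(script: str, patrolcli: str = "PatrolCli") -> int:
    """ASSUMPTION: PatrolCli reads commands from stdin. Runs it and
    returns the process exit code."""
    result = subprocess.run([patrolcli], input=script, text=True,
                            capture_output=True)
    return result.returncode


if __name__ == "__main__":
    # Placeholder host and credentials; only prints the generated input.
    print(patrolcli_script("agenthost01", 3181, "patrol", "secret",
                           "/NT_OS/NT_OS/_CollectionStatus", "ALARM"))
```

Keeping the escaping in one helper means any parameter path can be passed in without hand-quoting the PSL string.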

Then simply invoke trigger_recovery.cmd from the execute() function of an MRL rule on the BPPM cell when the cell determines that it is time to trigger a PATROL recovery action.  The cell can reach this decision by correlating multiple events sent from multiple servers by different monitoring tools, or by receiving an intelligent event generated by BPPM Analytics.
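On the cell, the correlation itself is written in MRL. Purely to illustrate the kind of decision such a rule makes, here is a Python sketch (not MRL) that fires only when critical events from at least two distinct server/tool pairs arrive within a five-minute window, and then runs trigger_recovery.cmd. The `Event` fields, the window, the threshold and the `cmd /c` invocation are all assumptions for illustration, not part of BPPM.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    host: str         # server the event came from
    source: str       # monitoring tool that sent it (hypothetical field)
    severity: str     # e.g. "CRITICAL"
    timestamp: float  # seconds since epoch


def should_trigger(events, now, window=300.0, min_sources=2):
    """Return True when CRITICAL events from at least `min_sources`
    distinct (host, source) pairs arrived within the last `window` seconds."""
    recent = {
        (e.host, e.source)
        for e in events
        if e.severity == "CRITICAL" and 0 <= now - e.timestamp <= window
    }
    return len(recent) >= min_sources


def run_trigger_script(path="trigger_recovery.cmd"):
    """Run the OS script that calls PatrolCli (Windows cmd assumed)."""
    return subprocess.run(["cmd", "/c", path], check=False).returncode


if __name__ == "__main__":
    events = [
        Event("web01", "PATROL", "CRITICAL", 1000.0),
        Event("db01", "other_tool", "CRITICAL", 1100.0),
    ]
    # Both events fall inside the 300-second window at now=1200.
    print(should_trigger(events, now=1200.0))
```

Counting distinct (host, source) pairs rather than raw events keeps a single noisy server from triggering the recovery action on its own.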
