Monday, January 12, 2015

Total Cost of Ownership of BPPM - Part 14: Best practice - Capture PATROL agent up/down events

One important way to reduce the total cost of ownership of BPPM is to have it monitor its own components.  One of the most important components in the BPPM framework is the PATROL agent.  If a PATROL agent goes down, you want to be alerted immediately.

There are many ways to monitor PATROL agents, both inside and outside the BPPM framework.  Inside the BPPM framework, an event is generated automatically whenever a PATROL agent goes down or starts up.

The good news is that this is out-of-box behavior, so you don't have to perform any extra configuration.  The bad news is that the generated events differ considerably from one BPPM version to the next.  To make it worse, in BPPM 9.0 and 9.5/9.6 both the severity and the message need some modification before they make sense in an alert.

Let's take a close look at the PATROL agent up/down events in each BPPM version and at how we should modify them.

If you are still using the old bii4P3 to forward your PATROL agent events to a BEM/BPPM cell, then regardless of which version of the cell you use, you can send the alert directly without any modification, unless you want to add the port number to the message.  In addition, the default cell policy closes the corresponding agent down event when it receives an agent up event.

Your PATROL agent down event with bii4P3:
MC_ADAPTER_CONTROL;
      severity=WARNING;
      mc_object='server1:3181';
      msg='Agent Connection -server1- down.';
END

Your PATROL agent up event with bii4P3:
MC_ADAPTER_CONTROL;
      severity=OK;
      mc_object='server1:3181';
      msg='Agent Connection -server1- open.';
END
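
If you do want the port number in the bii4P3 message, the change is a simple string substitution. The real change would be made in the cell itself; the Python below is only a sketch of the logic. It assumes the event is represented as a plain dictionary, and the function name add_port_to_msg is hypothetical.

```python
def add_port_to_msg(event):
    """Return a copy of a bii4P3 agent event with host:port in the message.

    Assumption: mc_object has the form 'host:port' and the message wraps
    the host name in dashes, as in the samples above.
    """
    host, _, port = event["mc_object"].partition(":")
    updated = dict(event)
    if port and f"-{host}-" in event["msg"]:
        updated["msg"] = event["msg"].replace(f"-{host}-", f"-{host}:{port}-", 1)
    return updated


down = {
    "class": "MC_ADAPTER_CONTROL",
    "severity": "WARNING",
    "mc_object": "server1:3181",
    "msg": "Agent Connection -server1- down.",
}
print(add_port_to_msg(down)["msg"])  # Agent Connection -server1:3181- down.
```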

If you are running BPPM 9.0 and sending PATROL events directly to the BPPM cell, you will need to modify the severity and message.  In addition, there is no default cell rule or policy that closes the corresponding agent down event when an agent up event arrives. You will need to write your own.

Your PATROL agent down event in BPPM 9.0:
MC_CELL_HEARTBEAT_FAILURE;
      severity=WARNING;
      cell='PatrolAgent@server1@192.168.2.12:3181';
      msg='Monitored Cell is no longer responding';
END

You need to change the msg to 'PatrolAgent@server1@192.168.2.12:3181 is no longer responding' because it is the PATROL agent that is not responding, not the cell.

Your PATROL agent up event in BPPM 9.0:
MC_CELL_HEARTBEAT_ON;
      severity=INFO;
      cell='PatrolAgent@server1@192.168.2.12:3181';
      msg='Monitored Cell is up again';
END

You need to change the msg to 'PatrolAgent@server1@192.168.2.12:3181 is up again' because it is the PATROL agent that is up again, not the cell.  In addition, you need to change the severity to OK so that you can write your own 'up event closes down event' rule in the BPPM cell.
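
The two BPPM 9.0 changes described above can be sketched in a few lines. In practice you would implement them as rules in the cell; this Python version only illustrates the logic. It assumes the event is a plain dictionary keyed by slot name, and the function name normalize_heartbeat_event is hypothetical.

```python
def normalize_heartbeat_event(event):
    """Return a copy of a BPPM 9.0 agent heartbeat event with a clearer
    message and, for the up event, severity OK."""
    updated = dict(event)
    # Name the PATROL agent instead of the generic 'Monitored Cell'.
    updated["msg"] = event["msg"].replace("Monitored Cell", event["cell"], 1)
    # The up event arrives as INFO; OK lets a closing rule match on it.
    if event["class"] == "MC_CELL_HEARTBEAT_ON":
        updated["severity"] = "OK"
    return updated


down = {
    "class": "MC_CELL_HEARTBEAT_FAILURE",
    "severity": "WARNING",
    "cell": "PatrolAgent@server1@192.168.2.12:3181",
    "msg": "Monitored Cell is no longer responding",
}
up = {
    "class": "MC_CELL_HEARTBEAT_ON",
    "severity": "INFO",
    "cell": "PatrolAgent@server1@192.168.2.12:3181",
    "msg": "Monitored Cell is up again",
}
print(normalize_heartbeat_event(down)["msg"])
print(normalize_heartbeat_event(up)["severity"], normalize_heartbeat_event(up)["msg"])
```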

If you are running BPPM 9.5/9.6 and sending PATROL events either through the integration service or directly to the BPPM cell, you will need to modify the severity and message.  In addition, there is no default cell rule or policy that closes the corresponding agent down event when an agent up event arrives. You will need to write your own.

Your PATROL agent down event in BPPM 9.5/9.6:
PATROL_EV;
      severity=INFO;
      mc_origin='server1:3181';
      msg='Start/stop status of agent 'server1' is '0'. Restart flag (0)';
END

This looks worse than in previous versions. :-( You need to change the msg to 'PATROL agent on server1:3181 stopped'.  You also need to change the severity to CRITICAL or WARNING.

Your PATROL agent up event in BPPM 9.5/9.6:
PATROL_EV;
      severity=INFO;
      mc_origin='server1:3181';
      msg='Start/stop status of agent 'server1' is '1'. Restart flag (0)';
END

This looks worse than in previous versions too. :-( You need to change the msg to 'PATROL agent on server1:3181 started'.  In addition, you need to change the severity to OK so that you can write your own 'up event closes down event' rule in the BPPM cell.
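
To make the BPPM 9.5/9.6 changes concrete, here is a rough Python sketch of the logic: read the start/stop status out of the message, rewrite the severity and message, and close the matching down event when an up event arrives. In a real deployment this belongs in cell rules, not in Python. The dictionary representation, the function names, the 'status' key and the choice of CRITICAL as the default down severity are all assumptions made for illustration.

```python
import re

# Status '0' means the agent stopped, '1' means it started (per the samples above).
STATUS_RE = re.compile(r"Start/stop status of agent '.*?' is '([01])'")


def normalize_patrol_ev(event, down_severity="CRITICAL"):
    """Return a copy of a PATROL_EV agent start/stop event with a readable
    message and a meaningful severity. Other events are returned unchanged."""
    match = STATUS_RE.search(event["msg"])
    if event["class"] != "PATROL_EV" or match is None:
        return event
    updated = dict(event)
    if match.group(1) == "0":
        updated["severity"] = down_severity
        updated["msg"] = f"PATROL agent on {event['mc_origin']} stopped"
    else:
        updated["severity"] = "OK"
        updated["msg"] = f"PATROL agent on {event['mc_origin']} started"
    return updated


def close_down_on_up(open_events, up_event):
    """Mark the normalized down events for the same agent as CLOSED
    when a normalized up event (severity OK) arrives."""
    if up_event["severity"] != "OK":
        return
    down_msg = f"PATROL agent on {up_event['mc_origin']} stopped"
    for ev in open_events:
        if ev["mc_origin"] == up_event["mc_origin"] and ev["msg"] == down_msg:
            ev["status"] = "CLOSED"


raw_down = {
    "class": "PATROL_EV",
    "severity": "INFO",
    "mc_origin": "server1:3181",
    "msg": "Start/stop status of agent 'server1' is '0'. Restart flag (0)",
}
raw_up = dict(raw_down, msg="Start/stop status of agent 'server1' is '1'. Restart flag (0)")

open_events = [normalize_patrol_ev(raw_down)]
close_down_on_up(open_events, normalize_patrol_ev(raw_up))
print(open_events[0]["msg"], open_events[0]["status"])
```

Matching on both mc_origin and the rewritten message keeps the closing logic from touching unrelated events that come from the same agent.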
