Monday, December 9, 2013

BPPM Implementation Considerations - Part 4: Monitor the monitors

The purpose of BPPM is to monitor your IT infrastructure.  It is important that the monitors themselves are up and running all the time.

A good BPPM implementation monitors not only your IT infrastructure but also each and every BPPM component, including the BPPM server, BPPM agent, BPPM cell, PATROL agent, PATROL adapter service/process, SNMP adapter service/process, IIWS service/process, IBRSD service/process, and so on. The self-monitoring metrics include component status and connection status.

The events alerting you that a BPPM component or a BPPM connection is down are mostly sent automatically to the BPPM cell that the component is connected to.  Some of the self-monitoring events require a quick activation step before they are generated. You need to identify those events, as they have different event classes and message formats, and you need to notify the right people about them.

Some components can be monitored in more than one way, and you just need to pick the one that works best in your environment.  For example, when a PATROL agent loses its connection with PATROL Integration Service, you can see one event sent directly from the PATROL agent, another event from PATROL LOG KM if you configured it to monitor the log entry for the IS connection going down, and a third event from PATROL Integration Service if you activated it in the BPPM GUI.

You may need to reword the message of a self-monitoring event for better readability, as some messages are not clear at all.  For example, by default, the PATROL agent connection-down event contains the following slots:

  cell='PatrolAgent@server1@172.118.2.12:3181';
  msg='Monitored Cell is no longer responding';

You may want to reword the message to look like this:

  msg='PatrolAgent@server1@172.118.2.12:3181 is no longer responding';

because it is the PATROL agent that is no longer responding, not the cell.
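
The rewording above can be sketched in a few lines. This is only an illustration in Python, not cell rule code: the function name reword_agent_down and the dictionary representation of an event are assumptions made for the example, and only the two slots shown above (cell and msg) come from the original event.

```python
# Illustrative sketch only: an event is modeled as a plain dict of slots.
# The function name and the dict layout are hypothetical, not a BPPM API.

GENERIC_MSG = "Monitored Cell is no longer responding"


def reword_agent_down(event: dict) -> dict:
    """Return a copy of the event with a clearer msg slot.

    Events that do not carry the generic message, or that have no
    cell slot to substitute, are returned unchanged.
    """
    reworded = dict(event)  # never modify the caller's event in place
    if event.get("msg") == GENERIC_MSG and "cell" in event:
        # Name the component that actually stopped responding.
        reworded["msg"] = f"{event['cell']} is no longer responding"
    return reworded


event = {
    "cell": "PatrolAgent@server1@172.118.2.12:3181",
    "msg": "Monitored Cell is no longer responding",
}
print(reword_agent_down(event)["msg"])
```

In a real deployment the same substitution would be done where the cell processes the incoming event, so that every notification downstream carries the clearer message.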

For notification, the most reliable method is a local email fired from the cell that receives the self-monitoring events. Because your path to the ticketing system may be down when your BPPM components are experiencing problems, your back-end ticketing system should not be the only notification channel for your self-monitoring alerts.  Use it in addition to your local email notification.
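
As a rough sketch of what a local email notification involves, the Python below builds a plain-text alert from an event's slots and hands it to a mail relay on the same host. It is not how the cell itself sends mail; the function names, the addresses under example.com, and the assumption that an SMTP relay listens on localhost port 25 are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: function names, addresses, and the local relay
# on localhost:25 are assumptions, not part of BPPM.
import smtplib
from email.message import EmailMessage


def build_alert_email(event: dict, sender: str, recipients: list) -> EmailMessage:
    """Build a plain-text alert email from an event's slots."""
    message = EmailMessage()
    message["Subject"] = f"[BPPM self-monitoring] {event['msg']}"
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    # List every slot as name=value so the recipient sees the full event.
    body = "\n".join(f"{name}={value}" for name, value in sorted(event.items()))
    message.set_content(body)
    return message


def send_via_local_relay(message: EmailMessage, host: str = "localhost", port: int = 25) -> None:
    """Send through a relay on the same host, avoiding any remote dependency."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(message)


alert = build_alert_email(
    {
        "cell": "PatrolAgent@server1@172.118.2.12:3181",
        "msg": "PatrolAgent@server1@172.118.2.12:3181 is no longer responding",
    },
    sender="bppm@example.com",            # hypothetical sender address
    recipients=["oncall@example.com"],    # hypothetical recipient
)
print(alert["Subject"])
# send_via_local_relay(alert)  # uncomment where a local relay is available
```

The point of the design is that nothing between the event and the mailbox depends on the ticketing integration, which is the path most likely to be down when BPPM itself is in trouble.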
