Monday, September 23, 2013

Monitoring PATROL Agent 9.x Status: Do I need AS_AVAILABILITY KM?

If you use PATROL to monitor your IT infrastructure, you will want to monitor the status of all your PATROL agents to make sure they are up and running. Although BMC recommends using PATROL AS_AVAILABILITY KM to monitor PATROL agent status, there is a much simpler way: BPPM cell heartbeat events.

So the answer here is no. You don't need AS_AVAILABILITY KM to monitor PATROL agent 9.x status. As you are about to see, BPPM cell heartbeat events are fully automatic, come with built-in high availability, and require no extra PATROL agent configuration. AS_AVAILABILITY KM was developed before BMC acquired the BPPM cell; it is still a great option if you have a 'PATROL only' environment without BPPM/BEM.

To use AS_AVAILABILITY KM, you need to configure the KM by selecting one PATROL agent as the 'pinger' and adding other PATROL agents as 'pingees'. Every time you deploy a new PATROL agent or decommission an existing PATROL agent, you would need to change AS_AVAILABILITY KM configuration. On the other hand, to use BPPM cell heartbeat events, you don't need to go through extra steps to register each PATROL agent with BPPM cell. As long as you set pconfig variable "/EventSetup/Configuration/EventCells" in your PATROL agent 9.x to send PATROL events to a BPPM cell, that BPPM cell will automatically monitor the status of the PATROL agent.

If the 'pinger' in your AS_AVAILABILITY KM goes down, you won't be able to monitor the status of the other PATROL agents. To make AS_AVAILABILITY KM more robust, you would have to set up a second 'pinger' and add complicated logic to coordinate the two 'pingers' and avoid duplicate alerts. On the other hand, as long as your BPPM cell is set up for high availability, you don't need any extra steps to make BPPM cell heartbeat events highly available. Your PATROL agent status will always be monitored by the active cell of the H/A BPPM cell pair.
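To illustrate why no per-agent registration list is needed, here is a conceptual Python sketch of heartbeat monitoring. It is not BMC's implementation: the class name, the 60-second interval, and the 3-missed-heartbeats limit are hypothetical values chosen for illustration. The point it shows is that any agent that starts sending heartbeats is tracked automatically, and a failure or recovery event is raised only on a state change.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    # Hypothetical settings, not actual BPPM cell parameters.
    interval: int = 60      # seconds between expected heartbeats
    missed_limit: int = 3   # missed heartbeats before declaring a failure
    last_seen: dict = field(default_factory=dict)
    down: set = field(default_factory=set)

    def heartbeat(self, cell: str, now: int) -> list:
        """Record a heartbeat. The first one registers the agent automatically;
        a heartbeat from an agent marked down produces a recovery event."""
        self.last_seen[cell] = now
        if cell in self.down:
            self.down.discard(cell)
            return [("MC_CELL_HEARTBEAT_ON", cell)]
        return []

    def check(self, now: int) -> list:
        """Raise one failure event per agent whose heartbeat is overdue."""
        events = []
        for cell, seen in self.last_seen.items():
            overdue = now - seen > self.interval * self.missed_limit
            if overdue and cell not in self.down:
                self.down.add(cell)
                events.append(("MC_CELL_HEARTBEAT_FAILURE", cell))
        return events


mon = HeartbeatMonitor()
agent = "PatrolAgent@server1@172.118.2.12:3181"
mon.heartbeat(agent, now=0)  # agent is now tracked, no configuration step
```

With these settings, a check at 200 seconds raises one failure event for the agent, later checks stay quiet, and the next heartbeat produces the matching recovery event.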

To get the most out of BPPM cell heartbeat events, I recommend rewording the event message because the out-of-the-box message doesn't contain enough information. When a PATROL agent goes down, you will receive an event with out-of-the-box slots like this:
MC_CELL_HEARTBEAT_FAILURE;
  cell='PatrolAgent@server1@172.118.2.12:3181';
  msg='Monitored Cell is no longer responding';
  ...
END
You may want to reword the msg to 'PatrolAgent@server1@172.118.2.12:3181 is no longer responding'. For its reciprocal MC_CELL_HEARTBEAT_ON event, you may want to reword the message in a similar way.
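Inside the cell, the rewording would be done with cell rules; the Python below only sketches the string transformation itself. It assumes the cell slot always follows the PatrolAgent@<host>@<ip>:<port> pattern seen in the example event. The function names and the wording of the ON message are hypothetical.

```python
def parse_agent_cell(cell: str) -> dict:
    """Split a heartbeat 'cell' slot into its parts.

    Assumed layout, inferred from the example event:
    PatrolAgent@<host>@<ip>:<port>
    """
    kind, host, endpoint = cell.split("@", 2)
    ip, port = endpoint.rsplit(":", 1)
    return {"kind": kind, "host": host, "ip": ip, "port": int(port)}


# Hypothetical message templates; the ON wording is only a suggestion.
TEMPLATES = {
    "MC_CELL_HEARTBEAT_FAILURE": "{cell} is no longer responding",
    "MC_CELL_HEARTBEAT_ON": "{cell} is responding again",
}


def reword_msg(event_class: str, cell: str) -> str:
    """Return a msg that names the PATROL agent instead of the generic text."""
    return TEMPLATES[event_class].format(cell=cell)
```

The parsed host or IP can also be copied into other slots if your operators filter events by server name.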


Monday, September 16, 2013

Parameter (Metrics) Thresholds: Do I still need to set them in PATROL?

On BPPM server, you can view all the data sent from each PATROL agent. You can set parameter (metrics) thresholds there including absolute thresholds such as 95% for file system utilization. Now you may wonder if you still need to set parameter thresholds in each PATROL agent.

In theory, if you send all PATROL data to BPPM server, it seems like a good idea to set all parameter thresholds on BPPM server only. Imagine how much time you could save by not having to set parameter thresholds in each PATROL agent, and how much network bandwidth you could save by not having to send PATROL events to BPPM cells when those thresholds are violated.

In reality, the answer is yes. You still need to set parameter thresholds in each PATROL agent and let the PATROL agent (not BPPM server) generate the events for absolute threshold violations. You still need to send those PATROL events to BPPM cells. Let BPPM server generate intelligent events only, and don't set absolute thresholds in BPPM server. The reason is that not all PATROL data reach BPPM server.

First of all, PATROL agent does not buffer and resend data if its first attempt to send data to BPPM server fails. This can happen during a brief network outage, such as while a router is being rebooted. If the first try fails, the data are lost forever. On the other hand, as you may already know, the connection between PATROL agent and BPPM cell is more robust because PATROL agent does buffer and resend events to BPPM cell with guaranteed delivery. BPPM server and BPPM cell were acquired by BMC Software separately from two different vendors, and they use different communication protocols with different levels of robustness.
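The difference between the two behaviors can be shown with a toy simulation. This is a hypothetical model, not either BMC protocol: one item is produced per tick, the link is either up or down at each tick, and the only difference is whether a failed send is dropped (like the agent-to-server data path described above) or queued and retried (like the agent-to-cell event path).

```python
from collections import deque


def deliver(items, link_up, buffered):
    """Simulate sending one item per tick over a link that may be down.

    items:    payload produced at each tick
    link_up:  link_up[t] is True if the network works at tick t
    buffered: True  -> store-and-forward: failed items wait and are retried
              False -> fire-and-forget: a failed send is simply dropped
    Items still waiting when the simulation ends are not counted as received.
    """
    received = []
    queue = deque()
    for item, up in zip(items, link_up):
        queue.append(item)
        if up:
            # Flush everything waiting, oldest first.
            while queue:
                received.append(queue.popleft())
        elif not buffered:
            queue.clear()  # no retry: the item is lost for good
    return received


outage = [True, False, False, True, True]  # e.g. a router reboot at ticks 1-2
lost = deliver([1, 2, 3, 4, 5], outage, buffered=False)
kept = deliver([1, 2, 3, 4, 5], outage, buffered=True)
```

Here the unbuffered path ends up with [1, 4, 5], while the buffered path delivers all five items once the link comes back.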

Second, PATROL agent sends only numerical data to BPPM server, not text data such as text parameter values and annotated data point values. Those text data are often needed as additional information in the events raised when numerical parameter thresholds are violated. For example, when using PATROL LOG KM, you may need to include information from a text parameter in the event to show the matched string. The only way to include information from a text parameter in events is to let PATROL agent (not BPPM server) generate the events. In addition, some PATROL KMs (e.g. LOG KM with the custom events option, older versions of Control-M KM, etc.) call event_trigger() to generate events without using parameter thresholds.

Last but not least, PATROL agent sends data to BPPM server every 5 minutes, even though it may collect data more frequently. For example, CPU utilization is collected by PATROL agent every minute, which means that only every 5th value of CPU utilization is sent to BPPM server. Relying solely on absolute thresholds in BPPM server could delay alerts by up to 5 minutes or miss short-lived threshold violations altogether.
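A small worked example makes the effect concrete. The CPU values below are made up, the threshold is 95%, and the sketch assumes the forwarded samples are the ones at minutes 0, 5, 10, and so on. The function returns the minutes at which a threshold check would see a violation.

```python
def alert_minutes(samples, threshold, forward_every=1):
    """Minutes at which a value above `threshold` is visible to the evaluator.

    forward_every=1 models the PATROL agent checking every collected sample;
    forward_every=5 models a server that only receives every 5th sample
    (assumed here to be the samples at minutes 0, 5, 10, ...).
    """
    return [m for m, v in enumerate(samples)
            if m % forward_every == 0 and v > threshold]


# Hypothetical CPU utilization (%), one sample per minute.
cpu = [40, 97, 98, 45, 41, 43, 96, 97, 97, 97, 97, 40]

agent_view = alert_minutes(cpu, 95)                    # every sample
server_view = alert_minutes(cpu, 95, forward_every=5)  # every 5th sample
```

The agent sees violations at minutes 1, 2, and 6 through 10. The server-side check sees only minute 10: the two-minute spike at minutes 1 and 2 is missed entirely, and the sustained spike that began at minute 6 is detected 4 minutes late.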

Monday, September 9, 2013

BMC Impact Integration for PATROL (bii4P): Is it no longer needed?

I have received this question from several people: "I have heard that BMC has eliminated bii4P in BPPM 9.0. Is it true? If it is true, why do I still see bii4P in some BPPM 9.0 architecture diagrams? And what can I use instead to send PATROL events to a BPPM cell configured as high availability?"

The simple answer is yes: bii4P has been eliminated. However, the elimination of bii4P depends only on the PATROL agent version, regardless of the version of BPPM server, BPPM agent, and BPPM cell. Starting with PATROL agent version 9.0, bii4P is no longer required for a PATROL agent to send its events to a cell. The cell can be a BEM 7.x cell, a BPPM 8.x cell, or a BPPM 9.x cell. If your PATROL agent version is older than 9.0, you still need bii4P even if you are running BPPM cell 9.0. That is why you may still see bii4P in some BPPM 9.0 architecture diagrams.
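The rule can be written as a tiny decision function. This is a hypothetical helper for illustration only; the version strings are examples, and the `target_cells` parameter encodes the point made later in this post that sending to more than one cell still requires bii4P.

```python
def needs_bii4p(agent_version: str, target_cells: int = 1) -> bool:
    """Return True if bii4P is needed to get this agent's events to its cell(s).

    agent_version: PATROL agent version string, e.g. "3.9.00" or "9.0.00".
    target_cells:  number of independent cells that must receive the events.
    The cell's own version (BEM 7.x, BPPM 8.x, BPPM 9.x) plays no part.
    """
    major = int(agent_version.split(".")[0])
    if major < 9:
        return True  # pre-9.0 agents cannot connect to a cell directly
    # 9.x agents push directly to one cell (with H/A failover); a second,
    # independent destination still needs a bii4P instance.
    return target_cells > 1
```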

To send PATROL events from a PATROL agent version 9.x to a BPPM/BEM cell configured as high availability, you need to have the following pconfig variables set: "/EventSetup/Configuration/EventCells", "/EventSetup/Configuration/Format", and "/EventSetup/Configuration/Key". For example:

"/EventSetup/Configuration/EventCells" = { REPLACE = "server1/1828,server2/1828" },
"/EventSetup/Configuration/Format" = { REPLACE = "BiiP3" },
"/EventSetup/Configuration/Key" = { REPLACE = "mc" }

*** where server1 is your primary cell server and server2 is your secondary cell server. If you have a standalone cell, you only need to specify server1/1828. ***
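If you manage many agents, a small helper can render these three entries from a list of cell servers. The sketch below is hypothetical: the function name is made up, the defaults simply mirror the example values above (port 1828, format BiiP3, key mc), and it emits only the entries themselves, not any header or wrapper your pconfig tooling may expect.

```python
def event_cell_pconfig(cells, port=1828, fmt="BiiP3", key="mc"):
    """Render the pconfig entries for a direct agent-to-cell connection.

    cells: cell servers in failover order, primary first;
           a single entry describes a standalone cell.
    """
    event_cells = ",".join(f"{host}/{port}" for host in cells)
    entries = [
        ("/EventSetup/Configuration/EventCells", event_cells),
        ("/EventSetup/Configuration/Format", fmt),
        ("/EventSetup/Configuration/Key", key),
    ]
    # One entry per line, comma-separated, as in the example above.
    return ",\n".join(f'"{var}" = {{ REPLACE = "{value}" }}'
                      for var, value in entries)


ha_config = event_cell_pconfig(["server1", "server2"])
standalone_config = event_cell_pconfig(["server1"])
```

For the H/A pair, `ha_config` reproduces the three lines shown above; for a standalone cell, only the EventCells value changes, to "server1/1828".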

However, bii4P is still supported in PATROL agent 9.x. If you would like to send PATROL events to multiple cells (e.g. a production cell and a testing cell for troubleshooting purposes), bii4P is still the only option. In addition, bii4P and the PATROLAgent-to-cell direct connection can co-exist for the same PATROL agent.

In the new PATROLAgent-to-cell direct connection, PATROL agent initiates the connection with a cell and pushes events to the cell. PATROL agent does not have the capability to push events to two different cells at the same time.

bii4P is a standalone adapter. There are two versions of bii4P: bii4P3 and bii4P7. bii4P3 connects to PATROL agents directly, while bii4P7 connects to PATROL agents through the PATROL console server. bii4P3 is more commonly used nowadays because its connection with PATROL agents is more stable. At one end, bii4P initiates the connection with PATROL agents to receive events; at the other end, it pushes the received events to a cell.

To send PATROL events to two cells, you can configure two instances of bii4P, or you can configure PATROLAgent-to-cell direct connection for production cell and configure bii4P for testing cell.


Monday, September 2, 2013

Lessons Learned from Migrating BEM 7.4 to BPPM 9.0 - Part 10: Summary

This is the last post in the "Lessons Learned from Migrating BEM 7.4 to BPPM 9.0" series. As a summary, here is the architecture diagram of our BPPM 9.0 high availability implementation.

This architecture varies slightly from BMC's standard recommendation because we keep BPPM cells and BPPM agents on separate servers. In a real enterprise IT environment, where data flow is steady but event flow is unpredictable, our architecture offers better resource utilization, more flexibility, and more robust high availability.

<This architecture diagram has been deleted>