Tuesday, May 5, 2015

Understand BPPM As A Decision Maker - Part 6: Implementation - static thresholds

In the previous post, we discussed static thresholds and dynamic thresholds in general.  Because static thresholds come in many variations, this post looks at them in detail.

A static threshold can have three different scopes: global, local, and instance.

A static threshold with global scope applies to all servers and all instances in your environment.  For example, a global critical threshold with service status = 3 means that if the parameter status is equal to 3 for any service running on any server, a critical alert will be raised.

A static threshold with local scope applies to one particular server.  For example, a local critical threshold with free disk space percentage <15 means that if the parameter 'free disk space percentage' is below 15% for any disk on this particular server, a critical alert will be raised.  A local threshold will always override the global threshold.  In this example, the global threshold could be free disk space percentage <10.  But because the applications running on this particular server tend to fill up disk space much faster than those on other servers, you may want to use a more conservative local threshold.
 
A static threshold with instance scope applies to one particular instance.  For example, an instance critical threshold with free disk space percentage <20 means that if the parameter 'free disk space percentage' is below 20% for one particular disk (e.g. C drive) on any server, a critical alert will be raised.  An instance threshold will always override the global threshold.  In this example, the global threshold could be free disk space percentage <10.  But because the C drive is usually smaller than other drives and more critical to keeping the server up, you may want to use a more conservative instance threshold.
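To make the three scopes concrete, here is a minimal Python sketch of how an effective threshold could be picked for one disk on one server.  It is only an illustration, not BPPM code.  The names Threshold, effective_threshold, appsrv01 and web01 are made up for this example.  The sketch also assumes that an instance threshold wins over a local one; this post only states that local and instance thresholds each override the global threshold.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Threshold:
    scope: str                      # "global", "local", or "instance"
    limit: float                    # alert when free disk space % is below this
    server: Optional[str] = None    # only used by local scope
    instance: Optional[str] = None  # only used by instance scope


def effective_threshold(thresholds: List[Threshold], server: str,
                        instance: str) -> Optional[Threshold]:
    """Return the threshold that applies to one instance on one server."""
    matches = {}
    for t in thresholds:
        if t.scope == "instance" and t.instance == instance:
            matches["instance"] = t
        elif t.scope == "local" and t.server == server:
            matches["local"] = t
        elif t.scope == "global":
            matches["global"] = t
    # Assumed precedence: instance, then local, then global.
    for scope in ("instance", "local", "global"):
        if scope in matches:
            return matches[scope]
    return None


def is_critical(free_pct: float, thresholds: List[Threshold],
                server: str, instance: str) -> bool:
    """True if the free disk space % violates the effective threshold."""
    t = effective_threshold(thresholds, server, instance)
    return t is not None and free_pct < t.limit


# The limits used in the examples above: global <10, local <15, instance <20.
THRESHOLDS = [
    Threshold("global", 10.0),
    Threshold("local", 15.0, server="appsrv01"),
    Threshold("instance", 20.0, instance="C:"),
]

# 12% free on drive D: is critical on appsrv01 (local <15) but not on web01 (global <10).
print(is_critical(12.0, THRESHOLDS, "appsrv01", "D:"))
print(is_critical(12.0, THRESHOLDS, "web01", "D:"))
```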
 
As we mentioned in the previous post, a static threshold can be configured at each PATROL agent or at BPPM server or at both places.  And BPPM does not relate the static thresholds configured at each PATROL agent with the ones at BPPM server.  If you decide to configure static thresholds at both PATROL agents and BPPM server, you need to keep track of them manually so that there are no gaps or overlaps.

You may want to ask: Why not just configure all static thresholds in BPPM server?   There are two major limitations for this approach.

The first limitation is that each BPPM server can only store 1,700,000 attributes/parameters in its database.  If you have a large environment, you can only store a small subset of your parameters in BPPM server database.  In order to configure a static threshold for a parameter in BPPM server, this parameter must be stored in BPPM server database. 
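A quick back-of-the-envelope check can show whether this limit matters for you.  The sketch below is plain arithmetic in Python, not a BPPM tool.  The 1,700,000 figure comes from the paragraph above; the server count and the attributes per server are made-up numbers that you would replace with your own estimates.

```python
SERVER_ATTRIBUTE_LIMIT = 1_700_000  # per BPPM server, as stated above


def estimated_attributes(server_count: int, attributes_per_server: int) -> int:
    """Rough total of attributes/parameters across the environment."""
    return server_count * attributes_per_server


def storable_fraction(total_attributes: int,
                      limit: int = SERVER_ATTRIBUTE_LIMIT) -> float:
    """Share of the attributes that one BPPM server could store (at most 1.0)."""
    if total_attributes <= 0:
        return 1.0
    return min(1.0, limit / total_attributes)


# Hypothetical environment: 5,000 servers with about 500 attributes each.
total = estimated_attributes(5000, 500)
print(total, total <= SERVER_ATTRIBUTE_LIMIT, round(storable_fraction(total), 2))
```

In this made-up case the environment has more attributes than one BPPM server can store, so only some of the parameters could get thresholds at BPPM server.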

The second limitation is that BPPM server still doesn't have an application-level quick fail-over architecture. If BPPM server becomes unavailable, no threshold can be applied and thus no alert can be raised until the OS-based secondary BPPM server is up - which usually takes 10 minutes or longer.

Some BMC customers with small environments in non-critical businesses did choose to configure all static thresholds in BPPM server.  So if that is doable in your environment, you can absolutely configure all static thresholds in BPPM server.

There is another aspect of static thresholds that you can set: duration - how long the threshold has to be violated before raising the alert.

If you set static thresholds at each PATROL agent, the duration is represented by the number of polling cycles.  To set your desired duration for a particular parameter, you must know the default polling cycle for that parameter and reset the polling cycle if the default one does not meet your needs.  The polling cycle for a parameter determines how often (in seconds) the parameter value will be collected.  The combination of polling cycle and the number of polling cycles determines the threshold duration in seconds.
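The arithmetic can be sketched in a few lines of Python.  This assumes the duration is simply the polling cycle multiplied by the number of polling cycles, which is the simplest reading of the paragraph above; the exact timing in a real PATROL agent may differ slightly.  The function names are made up for this example.

```python
def threshold_duration_seconds(polling_cycle_seconds: int, cycles: int) -> int:
    """Duration in seconds = polling cycle (seconds) x number of polling cycles."""
    if polling_cycle_seconds <= 0 or cycles <= 0:
        raise ValueError("polling cycle and number of cycles must be positive")
    return polling_cycle_seconds * cycles


def cycles_for_duration(polling_cycle_seconds: int, desired_seconds: int) -> int:
    """Smallest number of polling cycles that covers the desired duration."""
    if polling_cycle_seconds <= 0 or desired_seconds <= 0:
        raise ValueError("polling cycle and desired duration must be positive")
    # Integer division rounded up.
    return (desired_seconds + polling_cycle_seconds - 1) // polling_cycle_seconds


# A parameter polled every 60 seconds with a threshold of 5 polling cycles.
print(threshold_duration_seconds(60, 5))   # 300 seconds, i.e. 5 minutes
# Cycles needed to wait at least 10 minutes with a 120-second polling cycle.
print(cycles_for_duration(120, 600))       # 5
```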

If you set static thresholds at BPPM server, the duration is specified as a number of minutes, so you do not need to know the polling cycle.

Finally, if you set static thresholds at each PATROL agent, you can choose to use either the pconfig method or the CMA method.  In the pconfig method, you use either PCM (PATROL Configuration Manager) or pconfig scripts.  In the CMA method, you use CMA policies.  If you use both, the thresholds will be combined.  In case of conflict, the thresholds set by the CMA method will override the thresholds set by the pconfig method.
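The combine-and-override behavior described above can be pictured as a dictionary merge.  This is only a conceptual sketch in Python, not how a PATROL agent stores its configuration.  The parameter names and limits are made up for this example.

```python
from typing import Dict


def combine_thresholds(pconfig: Dict[str, float],
                       cma: Dict[str, float]) -> Dict[str, float]:
    """Combine both sets of thresholds; on conflict the CMA value wins."""
    combined = dict(pconfig)   # start with everything set by the pconfig method
    combined.update(cma)       # CMA entries are added and override conflicts
    return combined


# Hypothetical thresholds keyed by parameter name.
pconfig_thresholds = {"disk_free_pct": 10.0, "cpu_util_pct": 95.0}
cma_thresholds = {"disk_free_pct": 15.0, "mem_used_pct": 90.0}

print(combine_thresholds(pconfig_thresholds, cma_thresholds))
```

Here disk_free_pct is set by both methods, so the CMA value of 15.0 is the one that takes effect, while the other two parameters keep the only value they have.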

If you set dynamic thresholds at BPPM server, you can choose to use either BPPM operations console or CMA.  In BPPM operations console, you can use either options menu or tools menu.  In CMA, you can use global thresholds method or CMA policies.  If you use both BPPM operations console and CMA, the thresholds will be combined.  In case of conflict, the thresholds set by CMA will override the thresholds set by BPPM operations console. 

As a decision maker, you can tell by now that there are a lot more decisions to make after you decide on using static thresholds for some data.  You will need to decide if you need local or instance thresholds in addition to global thresholds.  You will need to decide where you want to set them - at each PATROL agent or at BPPM server.  You will need to decide threshold durations.  To set static thresholds in PATROL agents, you will need to decide which method to use - pconfig or CMA. To set dynamic thresholds in BPPM server, you will need to decide which method to use - BPPM operations console or CMA.
