Willa Ou's BMC TrueSight/BPPM Blog: May 2015

Monday, May 18, 2015

Understand BPPM As A Decision Maker - Part 8: Implementation - combination of static and dynamic thresholds

We have gone through the details of static and dynamic thresholds in the last two posts. In addition to setting static thresholds and dynamic thresholds separately, you can also combine them on a BPPM server to add more flexibility to your threshold settings.

The first option is to add a dynamic adjustment to a static threshold. To do that, you must set your static threshold at the BPPM server, not at each PATROL agent. In addition to the severity, duration, and threshold value that are included in a normal static threshold, you can add a dynamic adjustment by specifying whether the threshold violation also has to be outside a baseline. You can select the auto, hourly, daily, weekly, hourly+daily, or all baselines.

An example of the first option would be to set up a threshold for the number of login errors in the last data collection. If you set a static threshold of 3, you may want to add 'outside auto baseline' as a dynamic adjustment so that no alert is raised when the baseline for that time of day (such as 9am) is already 4.
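To make the login-error example concrete, here is a minimal Python sketch of how a static threshold with a dynamic adjustment could be evaluated. It is only an illustration, not BPPM's actual implementation: the function name, the Baseline class, and the idea of modelling the baseline as a low/high band are my own assumptions.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    """Hypothetical model of a baseline: the normal band for one time of day."""
    low: float
    high: float


def static_with_dynamic_adjustment(value, static_threshold, baseline):
    """Alert only if the value violates the static threshold AND is outside the baseline.

    Assumption: this is an 'above' threshold, so 'outside the baseline'
    means above the baseline's high value.
    """
    violates_static = value > static_threshold
    outside_baseline = value > baseline.high
    return violates_static and outside_baseline


# Illustrative numbers: at 9am, up to 4 login errors per collection is normal.
baseline_9am = Baseline(low=0, high=4)

print(static_with_dynamic_adjustment(4, 3, baseline_9am))  # False: above 3, but within the baseline
print(static_with_dynamic_adjustment(6, 3, baseline_9am))  # True: above 3 and outside the baseline
```

The static threshold still does the main work here. The baseline only suppresses alerts for values that are normal at that time of day.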

The second option is to add a static adjustment to a dynamic threshold. In addition to severity, duration, baseline, sampling window, absolute deviation, and percent deviation, you can add a static adjustment by specifying whether the threshold violation also has to violate a static threshold value.

An example of the second option would be to set up a threshold for CPU utilization. If you set a dynamic threshold as outside of the auto baseline for 10 minutes with a percent deviation of 15%, you may want to add a threshold value of 50 as a static adjustment so that no alert is raised when the CPU utilization is 45% for 10 minutes, even though the baseline is 35%.
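The CPU example can be sketched in Python as follows. This is a simplified illustration under my own assumptions, not how BPPM computes it internally: the function name is made up, the percent deviation is taken relative to the baseline value, only the 'above baseline' direction is handled, and the 10-minute duration is left out so that the check looks at a single value.

```python
def dynamic_with_static_adjustment(value, baseline, percent_deviation, static_value):
    """Alert only if the value is outside the baseline (plus deviation) AND above the static value.

    Assumption: percent deviation is measured relative to the baseline value.
    """
    dynamic_limit = baseline + baseline * percent_deviation / 100
    violates_dynamic = value > dynamic_limit
    violates_static = value > static_value
    return violates_dynamic and violates_static


# Baseline 35%, percent deviation 15%  ->  dynamic limit is 40.25%.
print(dynamic_with_static_adjustment(45, 35, 15, 50))  # False: outside the baseline, but below 50
print(dynamic_with_static_adjustment(55, 35, 15, 50))  # True: outside the baseline and above 50
```

Without the static adjustment, the 45% reading would have raised an alert even though a CPU at 45% is rarely a problem.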

You may wonder what the difference is between the first option and the second option. When should you use static threshold with dynamic adjustment and when should you use dynamic threshold with static adjustment?

Dynamic threshold with static adjustment contains deviation in absolute value and in percent value. This feature is not available with static thresholds. Using deviation in a dynamic threshold gives you a cushion or buffer when comparing to a baseline. I personally find this feature very useful and I use deviation in most of my dynamic thresholds with or without static adjustments.

Static threshold with dynamic adjustment contains a 'predict' feature. This feature is not available with dynamic thresholds. Using the 'predict' feature in a static threshold allows you to receive a predictive alert when an attribute with a fixed capacity is approaching its limit. This is very useful for attributes such as disk space utilization.

As a decision maker, you will need to determine if you need to combine static thresholds and dynamic thresholds to add more flexibility to your thresholds. If so, you will also need to decide which way to go: to add dynamic adjustment to a static threshold, or to add static adjustment to a dynamic threshold.

Tuesday, May 12, 2015

Understand BPPM As A Decision Maker - Part 7: Implementation - dynamic thresholds

As mentioned previously, a dynamic threshold doesn't have an absolute value by itself. The threshold value is calculated on the fly based on historical data values from a specified time period (also called baseline). A dynamic threshold needs to contain the following details:

1) Duration: How long does the threshold need to be violated before an alert will be raised? By default, the duration is 0, meaning as soon as the threshold is violated an alert will be raised immediately.

2) Baseline: You can choose the hourly, daily, weekly, hourly & daily, or all baselines. The default is auto baseline, meaning that the BPPM server will automatically choose the best baseline for you.

3) Sampling Window: How long must a parameter/attribute value be collected before an alert can be raised? The default is 10 minutes or 5 data points, whichever is longer.

4) Absolute Deviation: By how much, in absolute value, must the parameter/attribute value be above or below the threshold before an alert can be raised? The default is 1.

5) Percent Deviation: By how much, in percentage, must the parameter/attribute value be above or below the threshold before an alert can be raised? The default is 5%.

For example, you may want to set a dynamic threshold for your web transaction response time as follows: 1) Duration = 5 minutes; 2) Auto baseline; 3) Sampling window = 10 minutes; 4) Absolute Deviation = 1; 5) Percent Deviation = 40%. If it normally takes 5 seconds to complete a web transaction during the same time of the day, but now it takes 7 seconds (40% more than 5 seconds) consistently for the last 5 minutes, an alert will be raised.
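The response time example above can be walked through with a small Python sketch. Treat it as a rough model only: the function names and the (minute, value) sample format are hypothetical, I assume that both the absolute and the percent deviation must be met, only the 'above baseline' direction is handled, and the sampling window is ignored.

```python
def exceeds_dynamic_threshold(value, baseline, abs_dev, pct_dev):
    """True if the value is above the baseline by at least both deviations (assumption)."""
    deviation = value - baseline
    return deviation >= abs_dev and deviation >= baseline * pct_dev / 100


def should_alert(samples, baseline, abs_dev, pct_dev, duration_min):
    """samples: list of (minute, value) pairs ordered by time.

    An alert is raised once the threshold has been violated without
    interruption for at least duration_min minutes.
    """
    violation_start = None
    for minute, value in samples:
        if exceeds_dynamic_threshold(value, baseline, abs_dev, pct_dev):
            if violation_start is None:
                violation_start = minute
            if minute - violation_start >= duration_min:
                return True
        else:
            violation_start = None  # the violation was interrupted, start over
    return False


# Baseline of 5 seconds, absolute deviation 1, percent deviation 40%, duration 5 minutes.
steady = [(m, 7.0) for m in range(6)]  # 7 seconds from minute 0 to minute 5
interrupted = [(0, 7.0), (1, 7.0), (2, 5.0), (3, 7.0), (4, 7.0), (5, 7.0)]

print(should_alert(steady, 5.0, 1, 40, 5))       # True
print(should_alert(interrupted, 5.0, 1, 40, 5))  # False
```

The second call shows why duration matters: a single normal reading in the middle resets the clock, so a brief spike does not page anyone.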

As with a static threshold, a dynamic threshold can also have three different scopes: global, local, and instance.

Dynamic thresholds can only be set at BPPM server. You can choose to use either BPPM operations console or CMA to set a dynamic threshold. In BPPM operations console, you can use either options menu or tools menu. In CMA, you can use global thresholds method or CMA policies. If you use both BPPM operations console and CMA, the thresholds will be combined. In case of conflict, the thresholds set by CMA will override the thresholds set by BPPM operations console.

In order to set a dynamic threshold in the BPPM server, the parameter/attribute value must be stored in the BPPM server database, meaning that the data must be streamed. By default, all PATROL data is streamed to the BPPM server database. But you may want to filter out some data so that you do not exceed the capacity of 1.7 million attributes per BPPM server.

As a decision maker, you will need to come up with a detailed specification (duration, baseline, sampling window, absolute and percent deviation) after you decide on using dynamic thresholds for some data. You will need to decide if you need local or instance thresholds in addition to global thresholds. You will also need to decide which method to use - BPPM operations console or CMA.

Tuesday, May 5, 2015

Understand BPPM As A Decision Maker - Part 6: Implementation - static thresholds

In the previous post, we discussed static thresholds and dynamic thresholds in general. Since there are many different variations of static thresholds, we are going to look into the details.

A static threshold can have three different scopes: global, local, and instance.

A static threshold with global scope applies to all servers and all instances in your environment. For example, a global critical threshold with service status = 3 means that if the parameter status is equal to 3 for any service running on any server, a critical alert will be raised.

A static threshold with local scope applies to one particular server. For example, a local critical threshold with free disk space percentage <15 means that if the parameter 'free disk space percentage' is below 15% for any disk running on this particular server, a critical alert will be raised. A local threshold will always override the global threshold. In this example, the global threshold could be free disk space percentage <10. But because the applications running on this particular server tend to fill up disk space much faster than other servers, you may want to use a more conservative local threshold.

A static threshold with instance scope applies to one particular instance. For example, an instance critical threshold with free disk space percentage <20 means that if the parameter 'free disk space percentage' is below 20% for one particular disk (e.g. C drive) on any server, a critical alert will be raised. An instance threshold will always override the global threshold. In this example, the global threshold could be free disk space percentage <10. But because the C drive is usually smaller and more critical to keeping the server up than other drives, you may want to use a more conservative instance threshold.
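The three scopes can be pictured as a simple lookup. The Python sketch below is only a mental model, not BPPM code: the function name, the server names, and the dictionaries are hypothetical. I also assume that an instance threshold wins over a local one when both apply, which is an assumption on my part because the rule stated above only covers overriding the global threshold.

```python
def resolve_threshold(server, instance, global_threshold, local_thresholds, instance_thresholds):
    """Return the 'free disk space percentage' threshold that applies.

    Assumed precedence: instance, then local, then global.
    """
    if instance in instance_thresholds:
        return instance_thresholds[instance]
    if server in local_thresholds:
        return local_thresholds[server]
    return global_threshold


GLOBAL = 10                 # alert when free disk space is below 10%
LOCAL = {"app01": 15}       # hypothetical server that fills up its disks quickly
INSTANCE = {"C:": 20}       # C drive on any server

print(resolve_threshold("web01", "D:", GLOBAL, LOCAL, INSTANCE))  # 10
print(resolve_threshold("app01", "D:", GLOBAL, LOCAL, INSTANCE))  # 15
print(resolve_threshold("web01", "C:", GLOBAL, LOCAL, INSTANCE))  # 20
```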

As we mentioned in the previous post, a static threshold can be configured at each PATROL agent, at the BPPM server, or at both places. BPPM does not relate the static thresholds configured at each PATROL agent to the ones at the BPPM server. If you decide to configure static thresholds at both PATROL agents and the BPPM server, you need to keep track of them manually so that there are no gaps or overlaps.

You may want to ask: Why not just configure all static thresholds in BPPM server? There are two major limitations for this approach.

The first limitation is that each BPPM server can only store 1,700,000 attributes/parameters in its database. If you have a large environment, you can only store a small subset of your parameters in BPPM server database. In order to configure a static threshold for a parameter in BPPM server, this parameter must be stored in BPPM server database.
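A quick back-of-the-envelope calculation shows how fast this limit is reached. The Python sketch below uses the 1,700,000 figure from above. The function name and the environment size (5,000 servers with 1,000 parameters each) are made-up numbers for illustration.

```python
BPPM_ATTRIBUTE_CAPACITY = 1_700_000  # per BPPM server, as stated above


def fraction_storable(servers, attributes_per_server, capacity=BPPM_ATTRIBUTE_CAPACITY):
    """Fraction of all collected attributes that one BPPM server can store."""
    total = servers * attributes_per_server
    if total == 0:
        return 1.0
    return min(1.0, capacity / total)


# Hypothetical large environment: 5,000 servers, 1,000 parameters each.
print(fraction_storable(5000, 1000))  # 0.34
# Hypothetical small environment: everything fits.
print(fraction_storable(100, 1000))   # 1.0
```

In the large case only about a third of the parameters could be stored, so only those could get thresholds at the BPPM server.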

The second limitation is that BPPM server still doesn't have an application-level quick fail-over architecture. If BPPM server becomes unavailable, no threshold can be applied and thus no alert can be raised until the OS-based secondary BPPM server is up - which usually takes 10 minutes or longer.

Some BMC customers with small environments in non-critical businesses did choose to configure all static thresholds in the BPPM server. So if that is doable in your environment, you can absolutely configure all static thresholds in the BPPM server.

There is another aspect of static thresholds that you can set: duration - how long the threshold has to be violated before raising the alert.

If you set static thresholds at each PATROL agent, the duration is represented by the number of polling cycles. To set your desired duration for a particular parameter, you must know the default polling cycle for that parameter and reset the polling cycle if the default one does not meet your needs. The polling cycle for a parameter determines how often (in seconds) the parameter value will be collected. The polling cycle multiplied by the number of polling cycles gives the threshold duration in seconds.
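The arithmetic is simple, but it is easy to get wrong when polling cycles differ between parameters. Here is a small Python sketch. The function names and the sample numbers are mine, not PATROL settings.

```python
import math


def agent_duration_seconds(polling_cycle_s, cycles):
    """Threshold duration at a PATROL agent: polling cycle times number of cycles."""
    return polling_cycle_s * cycles


def cycles_for_duration(desired_duration_s, polling_cycle_s):
    """Smallest number of polling cycles that covers the desired duration."""
    return math.ceil(desired_duration_s / polling_cycle_s)


# A parameter collected every 60 seconds, threshold violated for 5 cycles in a row.
print(agent_duration_seconds(60, 5))   # 300 seconds, i.e. 5 minutes

# You want roughly 5 minutes, but the parameter is only collected every 120 seconds.
print(cycles_for_duration(300, 120))   # 3 cycles, which is really 6 minutes
```

The second case shows why you may need to reset the polling cycle: with a 120-second cycle, you cannot get exactly 5 minutes.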

If you set static thresholds at the BPPM server, the duration is specified in minutes, so the polling cycle is not needed.

Finally if you set static thresholds at each PATROL agent, you can choose to use either pconfig method or CMA method. In pconfig method, you use either PCM (PATROL Configuration Manager) or pconfig scripts. In CMA method, you use CMA policies. If you use both, the thresholds will be combined. In case of conflict, the thresholds set by CMA method will override the thresholds set by pconfig method.
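The 'combined, with CMA winning on conflict' behavior can be pictured as a dictionary merge. This Python sketch is only a mental model: the function name, the parameter names, and the values are hypothetical, and real PATROL configuration is not stored this way.

```python
def merge_thresholds(pconfig_thresholds, cma_thresholds):
    """Combine both sets; on conflict the CMA value overrides the pconfig value."""
    merged = dict(pconfig_thresholds)
    merged.update(cma_thresholds)
    return merged


# Hypothetical thresholds keyed by parameter name.
pconfig = {"disk_free_pct": 10, "cpu_util_pct": 90}
cma = {"disk_free_pct": 15, "mem_util_pct": 95}

print(merge_thresholds(pconfig, cma))
# {'disk_free_pct': 15, 'cpu_util_pct': 90, 'mem_util_pct': 95}
```

Thresholds that exist in only one place survive the merge. Only the conflicting one (disk_free_pct here) takes the CMA value.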

If you set dynamic thresholds at BPPM server, you can choose to use either BPPM operations console or CMA. In BPPM operations console, you can use either options menu or tools menu. In CMA, you can use global thresholds method or CMA policies. If you use both BPPM operations console and CMA, the thresholds will be combined. In case of conflict, the thresholds set by CMA will override the thresholds set by BPPM operations console.

As a decision maker, you can tell by now that there are a lot more decisions to make after you decide on using static thresholds for some data. You will need to decide if you need local or instance thresholds in addition to global thresholds. You will need to decide where you want to set them - at each PATROL agent or at BPPM server. You will need to decide threshold durations. To set static thresholds in PATROL agents, you will need to decide which method to use - pconfig or CMA. To set dynamic thresholds in BPPM server, you will need to decide which method to use - BPPM operations console or CMA.