Monday, November 25, 2013

BPPM Implementation Considerations - Part 2: Keep the total cost of ownership in mind

When you build a house for yourself, you don't just consider the cost of building it; you also consider the cost of maintenance and the utility bills once you live there.  Similarly, when you implement BPPM, you need to keep the total cost of ownership in mind, not just the implementation cost.

After talking to several BPPM customers, I noticed that their operations teams are all at least twice the size of the teams at my clients, just to keep BPPM operations going.  What is worse, their operations teams also need the implementation skill set to constantly patch up the implementation.

Before you even start implementation, consider the following aspects:

1) Scalability: When your environment grows with more servers, more applications, or more integrations, will your architecture still work?  How easy would it be to split horizontally (based on processing steps) and vertically (based on incoming traffic)?

2) Upgrade: What can you do right now to make future upgrades easier?  You may want to consider adopting a naming convention, saving configuration in a separate repository, and documenting everything consistently.

3) High Availability: High availability not only helps with business continuity, it also keeps your team from constantly fighting fires. You have several high availability options: application-level failover, OS-based failover, active/active load balancing, or duplication. Which option best fits your needs for each BPPM component, and how much would it cost?  For example, native application-level failover might be your best choice for BPPM cells if your business cannot afford to miss a server-down alert.  But simply duplicating the PATROL 7 console is probably sufficient compared with OS-based failover, which would cost nearly twice as much.

4) Implementation Repeatability: Do you keep an accurate implementation document so that the installation and configuration of each BPPM component is repeatable? You need to implement everything on a test system first and carefully document everything as you go. Production deployment should then be a straightforward 'follow the doc' process. The production deployment also gives you a perfect opportunity to update the implementation document with anything you missed.

A common mistake I have seen is starting the implementation directly on a production system.  After several months of figuring things out, the system finally goes live with many junk files sitting under the implementation directory.  Then you realize that you actually need a test system, because you cannot make and test changes otherwise.  By now you don't know how to configure the test system to be identical to production, since you have lost track of what made the production system work and what did not.

5) Operations Standardization: Do you have a standard operations procedure document? For example, if a new server is added to your PeopleSoft Payroll application, do you have a document containing the steps for the operations team to add that server to PATROL, the BPPM integration service, the BPPM cell, the BPPM server, the BPPM GUI, and automated Remedy ticketing?


Monday, November 18, 2013

BPPM Implementation Considerations - Part 1: Meet your business requirements

Three years after BMC ProactiveNet Performance Management (BPPM) was released, most BPPM customers have concluded that a BPPM implementation is more than just software installation. But what makes a BPPM implementation successful? What do you need to consider before diving into installation details?

The "BPPM Implementation Considerations" blog series will try to address several important considerations at the requirement level and the architecture level.  Implementing BPPM is a lot like building a house. Many considerations at the requirement level and the architecture level are like the foundation of the house.  They need to be determined at the very beginning.

The most important consideration in a BPPM implementation is your business requirements. The management of your organization, your entire implementation team, and other stakeholders should have a clear understanding of the list of business requirements that your BPPM implementation is expected to meet.  Then you will need to translate this list of business requirements into a list of technical requirements, each assigned a category such as mandatory, strategic, cost-saver, or nice-to-have.

Only then can you map each technical requirement to a list of detailed BPPM features and prioritize the implementation of each feature.  This will become your project scope.  Based on your project scope, you can plan your project timeline and budget.  If you outsource your BPPM implementation to a consulting company, it is critical that you do your homework on your business requirements and technical requirements first. Then work closely with the architect (not just the project manager) of the consulting company to determine the project scope.

However, many new BPPM customers I have talked to seem to do it backwards.  They came up with a budget first, without knowing exactly which BPPM features to implement or how long the implementation would take.  Then they picked a list of BPPM features to implement from the product datasheet without knowing how each feature relates to their business bottom line.

As an example, here is the process taken at one of my past clients.  One of the top business requirements was to cut the cost of Remedy Gateway licenses from multiple monitoring software vendors.  This was translated into a technical requirement like this: Alerts from multiple monitoring tools must be integrated into one alert management tool that communicates with Remedy for ticket creation. This requirement was categorized as cost-saver.  The technical requirement was mapped to these BPPM features: event to BPPM cell integration through API and SNMP traps, msend API installation, SNMP trap adapter high-availability implementation, custom BPPM cell MRL rules to process events from multiple vendors, IBRSD high-availability implementation, and event-to-ticket categorization in the BPPM cell.  The return was a 6-figure license saving year after year on an investment of a 5-figure consulting fee.  This ROI went straight to the business bottom line.
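The chain from business requirement to technical requirement to feature list can be kept as a small traceability record. The following Python sketch is only an illustration, not part of BPPM: the class name, the function name, and the numeric ranking of the categories are assumptions made for this example, and the feature names are shortened from the client example above.

```python
from dataclasses import dataclass, field

# Categories named in the post. Ranking them in this order (lower number is
# implemented first) is an assumption made for this sketch.
CATEGORY_RANK = {"mandatory": 0, "strategic": 1, "cost-saver": 2, "nice-to-have": 3}


@dataclass
class TechnicalRequirement:
    business_requirement: str  # the business need this requirement traces back to
    statement: str             # the technical requirement itself
    category: str              # one of the keys in CATEGORY_RANK
    features: list = field(default_factory=list)


def project_scope(requirements):
    """Return (category, feature) pairs ordered by category rank."""
    ordered = sorted(requirements, key=lambda r: CATEGORY_RANK[r.category])
    return [(r.category, feature) for r in ordered for feature in r.features]


remedy = TechnicalRequirement(
    business_requirement="Cut the cost of Remedy Gateway licenses",
    statement="Integrate alerts from all monitoring tools into one alert manager",
    category="cost-saver",
    features=[
        "Event integration through API and SNMP traps",
        "msend API installation",
        "SNMP trap adapter high availability",
        "Custom cell MRL rules",
        "IBRSD high availability",
        "Event-to-ticket categorization",
    ],
)

scope = project_scope([remedy])
for category, feature in scope:
    print(f"[{category}] {feature}")
```

Keeping the business requirement on every record makes it easy to answer, for any feature in scope, which business need pays for it.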

Monday, November 11, 2013

PATROL LOG KM Examples - Part 5: Parsing script output instead of log file

In the previous four posts, we discussed various ways to parse a log file using BMC PATROL LOG KM. Did you know that you can also use LOG KM to parse the output of a script?

Normally, when you write your own script to collect data, you need to write a custom KM to parse the result and send out alerts.  Although LOG KM doesn't provide the flexibility offered by a custom KM, it saves a tremendous amount of development and maintenance effort compared with writing a custom KM. All features available for parsing a log file work the same way when parsing the output of a script.

For example, if you want to check the availability of a website, you could write a script that pings the website, run it periodically, and get an alert when the website is unreachable.  Using www.bmc.com as our example, the script would look like:

ping www.bmc.com

First, save this script as C:\scripts\ping_bmc.bat.

In your LOG KM configuration screen, enter C:\scripts\ping_bmc.bat as your log file name and 'PING_BMC' as the logical name for the instance.  Then select 'Script' as your file type.  The default file type is 'Text File'. Please see the screenshot included in the 'PATROL LOG KM Examples - Part 2' post for the locations of these selections.

In the 'Default Settings for Search Criteria' section, you have two ways to send alerts to the BPPM/BEM cell: 1) Use a recovery action to send the parsing result, as discussed in the 'PATROL LOG KM Examples - Part 1' post; or 2) Use 'Custom Event Message' and 'Custom Event Origin', as discussed in the 'PATROL LOG KM Examples - Part 2' post.

For this particular example, I found that option 2) works better because I can simply put "Unable to reach www.bmc.com." in my 'Custom Event Message' instead of the raw output from the script. I can also put '%APPCLASS%.%FILENAME%.%LOGICALNAME%' as my 'Custom Event Origin'.

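In your search criteria configuration screen, use '0% loss' as your search string and check the 'NOT' box next to it, because we only want to be alerted when there is packet loss. Be aware that '0% loss' is also a substring of '100% loss'. If the search matches substrings, a complete outage would go unnoticed, so test against a host that is down, or use a tighter string such as 'Lost = 0', which appears in the summary line of the Windows ping output only when no packet is lost.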

When there is packet loss, or when the script output states "Ping request could not find host www.bmc.com.", you will receive an event in the BPPM/BEM cell as follows:

mc_object_class='LOGMON';
mc_object='C:\scripts\ping_bmc.bat';
mc_parameter='PING_BMC';
msg='Unable to reach www.bmc.com.'
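The effect of the 'NOT' box can be pictured with a few lines of Python. This is only a sketch: treating the search as a plain substring test is an assumption about how LOG KM evaluates the search string, the function name is made up for the illustration, and the sample lines are modeled on the summary line of the Windows ping command rather than captured from a real run.

```python
def should_alert(script_output: str, search_string: str) -> bool:
    """NOT search: alert only when the search string is absent from the output."""
    return search_string not in script_output


# Illustrative samples modeled on Windows ping output.
healthy = "Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),"
partial = "Packets: Sent = 4, Received = 3, Lost = 1 (25% loss),"
down = "Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),"
no_dns = "Ping request could not find host www.bmc.com."

# Compare two candidate search strings against every sample.
for sample in (healthy, partial, down, no_dns):
    print(should_alert(sample, "0% loss"), should_alert(sample, "Lost = 0"), sample)
```

Under this substring assumption, the 'down' sample raises no alert with '0% loss', because '100% loss' contains that string, while 'Lost = 0' alerts on all three failure samples and stays quiet on the healthy one.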

Monday, November 4, 2013

PATROL LOG KM Examples - Part 4: A not so simple case of multiple-line search

Last week I discussed a simple case of multiple-line search in PATROL LOG KM to include additional lines after the line that matches your search string pattern. But what if the additional lines you want to include are before the line that matches the search string pattern?  We will need to use an advanced feature of PATROL LOG KM called 'Multiline Search'.

For example, suppose you want to capture the following two lines in your log file and send out an alert message like "User: root password will expire in 3 days":

root 21292 c Mon Oct 28 08:00:00 2013
! Your password will expire in 3 days.


Before activating the multiline search feature, configure LOG KM normally as shown in the 'PATROL LOG KM Examples - Part 1' post.  Let's set up a log instance called 'Test_Log'.  Threshold #1 under State Change Options would be set to "1", and the state would be set to "ALARM". The search pattern in this example would be "! Your password will expire in 3 days".

Now we are going to activate multiline search for LOG KM.  From the PATROL console, right-click on <host> -> OS KM -> LOG -> Test_Log -> KM Commands -> Advanced Feature -> Multiline Search

In the pop-up box, enter ':' as the Start Delimiter and 'password' as the End Delimiter.  Regular expressions don't work here. These delimiters define the start and the end of the block that LOG KM will capture.

Now we need to configure the recovery action.  Let's create a file called LOGKM_RecoveryAction_multiline.cfg as follows:

PATROL_CONFIG
"/AS/EVENTSPRING/LOGMON/Test_LogPN0/LOGErrorLvl/arsAction" = { REPLACE = "6" },
"/AS/EVENTSPRING/LOGMON/Test_LogPN0/LOGErrorLvl/arsCmdType" = { REPLACE = "PSL" },
"/AS/EVENTSPRING/LOGMON/Test_LogPN0/LOGErrorLvl/arsCommand" = { REPLACE = "/opt/bmc/LOGKM_RecoveryAction_multiline.psl" }

Then create /opt/bmc/LOGKM_RecoveryAction_multiline.psl as follows:

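# Brief pause, presumably so that LOGMatchString is populated before it is read.
sleep(1);
# The captured multiline block for this LOG KM instance.
match_str = get("/LOGMON/".__instance__."/LOGMatchString/value");
# Find the expiry lines; each result is prefixed with "<line number>:".
expire_line = grep("! Your password will expire in 3 days.", match_str, "n");
account_list = "";
foreach lin (expire_line) {
  line_num = nthargf(lin, 1, ":");
  # The account name is the first word of the line just before the expiry line.
  account_line = nthlinef(match_str, line_num-1);
  account = nthargf(account_line, 1);
  account_list = account_list." ".account;
}
msg = "User:".account_list." password will expire in 3 days";

# Send the event with the current status of LOGErrorLvl.
status = get("/LOGMON/".__instance__."/LOGErrorLvl/status");
origin = "LOGMON.".__instance__.".PasswordExpire";
event_trigger2(origin, "STD", "41", status, "4", msg);
set("/LOGMON/".__instance__."/LOGErrorLvl/value", 1);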

Run 'pconfig LOGKM_RecoveryAction_multiline.cfg' to push the configuration, and then restart the PATROL agent.
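If you want to check the parsing logic without a PATROL agent, the core of the PSL recovery action can be sketched in Python. This is an illustration only: the function name is made up, and it uses a plain substring test where the PSL grep call uses a regular expression.

```python
def expiring_accounts(block, marker="! Your password will expire in 3 days."):
    """Return the first word of the line preceding each line that contains marker."""
    lines = block.splitlines()
    accounts = []
    for index, line in enumerate(lines):
        # Skip a marker on the very first line: there is no account line before it.
        if marker in line and index > 0:
            previous_words = lines[index - 1].split()
            if previous_words:
                accounts.append(previous_words[0])
    return accounts


# The two log lines from the example in this post.
sample = (
    "root 21292 c Mon Oct 28 08:00:00 2013\n"
    "! Your password will expire in 3 days.\n"
)

accounts = expiring_accounts(sample)
message = "User: " + " ".join(accounts) + " password will expire in 3 days"
print(message)  # prints "User: root password will expire in 3 days"
```

As in the PSL version, several expiry warnings in one captured block produce a single message that lists every affected account.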