Monday, January 13, 2014

bii4P3 vs bii4P7 - Part 2: How to set up high availability for bii4P

In my last post, I mentioned that neither bii4P3 nor bii4P7 has built-in high availability. If system monitoring is a critical component of your overall IT infrastructure, bii4P without high availability can become a single point of failure.

For example, one of my previous clients manages the IT infrastructure for a group of hospitals. In their environment, every component is required to be set up for high availability. Missing a critical "application down" alert could mean the difference between life and death in a hospital.

There are two methods you can use to set up your own high availability for bii4P, whether you choose bii4P3 or bii4P7. However, as I explained in my last post, bii4P3 is the more stable solution.

The first method is simple redundancy. For each bii4P instance you have configured, you configure an identical bii4P instance on another server, and you run both instances at the same time. In the BPPM/BEM cell, you use a de-dup rule to drop the duplicate events.

The advantage of the first method is simplicity. The only MRL rule you need to pay attention to is the de-dup rule for PATROL events. If you read BMC's out-of-box de-dup rule for PATROL events closely, you will find that it needs some modification. Otherwise, some events that are not duplicates could be incorrectly de-duplicated, and you could miss a critical alert as a result. The disadvantage is that the network traffic doubles.
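To illustrate why the matching condition of a de-dup rule matters, here is a minimal Python sketch of the logic. It is not MRL and not BMC's out-of-box rule. The event fields (`relay`, `host`, `app_class`, `instance`, `parameter`, `severity`) and the 60-second window are hypothetical assumptions chosen for illustration. The key deliberately leaves out which bii4P instance relayed the event, so the same alert arriving through both redundant instances is treated as one event. It still keeps enough fields that genuinely different alerts are not collapsed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    relay: str        # hypothetical: which bii4P instance forwarded the event
    host: str
    app_class: str
    instance: str
    parameter: str
    severity: str
    timestamp: float  # seconds


def dedup_key(e):
    # Excludes `relay`, so copies of one alert from both bii4P instances match.
    return (e.host, e.app_class, e.instance, e.parameter, e.severity)


def deduplicate(events, key=dedup_key, window=60.0):
    """Keep the first event per key; drop same-key events within `window` seconds."""
    last_kept = {}
    kept = []
    for e in sorted(events, key=lambda ev: ev.timestamp):
        k = key(e)
        prev = last_kept.get(k)
        if prev is not None and e.timestamp - prev <= window:
            continue  # treated as a duplicate from the other instance: drop it
        last_kept[k] = e.timestamp
        kept.append(e)
    return kept
```

With the default key, a `/var` alert relayed by both instances is kept once, and a separate `/opt` alert is kept too. With an overly coarse key such as `lambda e: (e.host, e.parameter)`, the `/opt` alert would be dropped as a "duplicate". This is the kind of silently missed alert that a careless de-dup rule can cause.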

The second method is to write an MRL rule that coordinates between two instances of bii4P. For each bii4P instance you have configured, you still configure an identical bii4P instance on another server, but you run only one instance at a time. When the first instance goes down, a 'bii4P down' event appears in the BPPM/BEM cell and triggers the MRL rule. The rule starts the second instance of bii4P and notifies the BPPM administrator.
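The coordination logic of the second method can be sketched in Python as a small state machine. This is only an illustration of what the MRL rule has to decide, not MRL code. The event class name `BII4P_DOWN`, the instance names, and the `start_instance`/`notify_admin` hooks are hypothetical. In practice, starting the instance would be an OS script and the notification would be whatever channel your BPPM administrators use. The sketch fails over once, from primary to standby, and ignores repeated 'down' events after that.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FailoverCoordinator:
    primary: str
    standby: str
    start_instance: Callable[[str], None]  # hypothetical hook, e.g. runs an OS script
    notify_admin: Callable[[str], None]    # hypothetical hook, e.g. sends an email
    active: str = field(default="", init=False)

    def __post_init__(self):
        self.active = self.primary  # only the primary runs at first

    def on_event(self, event_class: str, source: str) -> bool:
        """Handle one cell event; return True if a failover was performed."""
        if event_class != "BII4P_DOWN":
            return False  # not a 'bii4P down' event
        if source != self.primary or self.active != self.primary:
            return False  # about another instance, or already failed over
        self.start_instance(self.standby)
        self.active = self.standby
        self.notify_admin(f"{self.primary} is down; started {self.standby}")
        return True
```

Passing the start and notify actions in as callables keeps the decision logic separate from the OS scripting, so each can be tested on its own.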

The advantage of the second method is that it works as an application-level failover without increasing network traffic. The disadvantage is that you do need experience in MRL programming and OS scripting. I found that running each bii4P instance on the same server as the cell simplifies the OS scripting work.
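One plausible reason colocation helps is the following. Suppose the rule triggers a script on the cell server. If the standby bii4P runs on that same server, the script can start it directly. If it does not, the script must reach another machine remotely, which adds credentials and network failure modes. The Python sketch below only builds the command line and does not run it. The start-script path is a hypothetical placeholder, and `ssh` is just one assumed way to run a remote command.

```python
import shlex
from typing import List


def build_start_command(instance_host: str, cell_host: str, start_script: str) -> List[str]:
    """Build the argv that would start a standby bii4P instance.

    `start_script` is a hypothetical path to whatever starts bii4P on that host.
    """
    if instance_host == cell_host:
        # Same server as the cell: run the start script directly.
        return [start_script]
    # Different server: a remote call (here via ssh) is needed, which brings
    # key management and network failures into the failover path.
    return ["ssh", instance_host, shlex.quote(start_script)]
```

The returned list could be handed to `subprocess.run`. The colocated case has no remote step to fail.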
