Monday, June 16, 2014

BPPM 9.5 Quick Start - Part 11: High availability consideration

As the final post in the 'BPPM 9.5 Quick Start' series, let's look at high availability. Without a doubt, BPPM 9.5 has made great progress in high availability by introducing integration service clusters. A BPPM 9.5 integration service node (ISN) can fail over to the other ISN in the cluster seamlessly with no data loss. PATROL agents connected to the ISN cluster can buffer and resend data for up to 30 minutes.

However, no progress has been made in the high availability of the BPPM server itself. Your only option is still a disk-based cluster provided by the operating system, such as Microsoft Windows Cluster. A disk-based BPPM cluster has a couple of drawbacks: 1) Up to 10 minutes of downtime after the primary server goes down and before the secondary server comes up; 2) High cost - the cost of the two servers in the cluster plus the software is usually about the cost of three servers.

If your business cannot justify implementing a disk-based BPPM server cluster, especially if you are also required to implement a DR BPPM server in another data center, you may wonder whether there is something else you can do to improve the high availability of your overall solution without incurring the cost and complexity of a disk-based server cluster.

In a business, the most critical incidents that violate service level agreements are availability alerts. Without high availability of the BPPM server, you will need to use a pair of high-availability remote BPPM cells, instead of the embedded cells on the BPPM server, to send notifications and initiate incident ticket creation. In fact, as long as those availability alerts do not come out of service models, a pair of H/A BPPM remote cells usually works better than the embedded cells in a disk-based BPPM server cluster because the cell pair is a native application cluster.

To initiate incident ticket creation on BPPM remote cells, you will need to install IBRSD on these cells. This step is in addition to installing the CMDB integration module on the BPPM server; the CMDB integration module automatically includes IBRSD. Now you can let the BPPM server initiate incident ticket creation related to service models and let the BPPM remote cells initiate incident ticket creation not related to service models.
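The split of responsibility described above can be sketched in a few lines of Python. This is only an illustration of the decision, not BPPM or IBRSD configuration: the `Event` class, the `from_service_model` flag, and the two label constants are hypothetical names made up for this example.

```python
from dataclasses import dataclass

# Hypothetical labels for the two ticket initiators discussed in this post.
BPPM_SERVER = "bppm-server"
REMOTE_CELL_PAIR = "remote-cell-pair"


@dataclass
class Event:
    message: str
    from_service_model: bool  # hypothetical flag: did a service model produce this event?


def ticket_initiator(event: Event) -> str:
    """Pick which component should initiate the incident ticket.

    Service-model events stay with the BPPM server; everything else
    (for example plain availability alerts) goes to the H/A remote cells.
    """
    return BPPM_SERVER if event.from_service_model else REMOTE_CELL_PAIR


if __name__ == "__main__":
    for e in (Event("host unreachable", False), Event("service impacted", True)):
        print(e.message, "->", ticket_initiator(e))
```

The point of the split is that availability alerts not tied to a service model no longer depend on the BPPM server being up.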

To address the lack of GUI access when the BPPM server goes down, you can install a BEM version 7.x login server (also called admin server) and BMC Impact Explorer (IX) as an emergency GUI. Register all your BPPM cells with the BEM login server. When the BPPM server goes down, you can still see all of your BPPM remote cells from IX. All BPPM 9.x cells display well in BEM 7.x IX.

In BPPM 9.5, the integration service has been made totally stateless so that PATROL data travels through it to the BPPM server without stopping. If the BPPM server goes down, PATROL data will be buffered at the PATROL agent for up to 30 minutes. This means no data loss as long as the BPPM server is up again within 30 minutes. If the outage lasts longer and data is lost, you will also lose the intelligent events that depend on that data, such as anomaly and trend prediction. If they are not critical to your business for a short period of time, you can save some cost right now by postponing BPPM server H/A implementation until BMC comes up with an application-level BPPM server H/A solution.
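To make the 30-minute window concrete, here is a rough Python sketch of the arithmetic. It assumes the buffer covers exactly 30 minutes and that an outage of exactly 30 minutes is still fully covered; the function names are made up for this example and are not part of any BMC tool.

```python
from datetime import timedelta

# Buffer window at the PATROL agent, taken from the 30-minute figure in this post.
AGENT_BUFFER = timedelta(minutes=30)


def unbuffered_gap(outage: timedelta, buffer: timedelta = AGENT_BUFFER) -> timedelta:
    """Return the part of a BPPM server outage that exceeds the agent buffer.

    A zero result means the agents could hold all data until the server
    came back (assumption: an outage equal to the buffer is still covered).
    """
    return max(outage - buffer, timedelta(0))


def data_lost(outage: timedelta, buffer: timedelta = AGENT_BUFFER) -> bool:
    """True if some PATROL data would fall outside the buffer window."""
    return unbuffered_gap(outage, buffer) > timedelta(0)


if __name__ == "__main__":
    for minutes in (10, 30, 45):
        outage = timedelta(minutes=minutes)
        print(f"{minutes} min outage: lost={data_lost(outage)}, gap={unbuffered_gap(outage)}")
```

Under these assumptions, a 10-minute failover of a disk-based cluster fits well inside the buffer, while a 45-minute outage leaves a 15-minute gap in the data.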
