In a previous post (Part 2), I mentioned that all BEM-level emails, tickets, and actions need to take place on remote cells in order to meet the no-downtime requirement for critical events in a hospital environment. This means that the IBRSD for BEM-level events should not be located on the BPPM server, which can be down for up to 10 minutes during failover. We also wanted to offload as many components as possible from the BPPM server to improve its performance. In our BEM 7.4 implementation, we had two instances of IBRSD installed on two of our cell servers to achieve active/active high availability and load balancing. They handled all of our ticket creation and updates well, so we decided to keep the same architecture in BPPM 9.0.
However, in BPPM 9.0 IBRSD is available only as part of the BPPM server installation package, while we needed it as part of the BPPM agent installation package. We contacted BMC support but were told that they were not able to help. They did enter an enhancement request, so hopefully we will see it packaged with the BPPM agent in a future release. Meanwhile, we had to come up with a different way to install IBRSD on our cell servers. We copied the entire IBRSD installation directory from the BPPM server, added a few environment variables, and configured a new IBRSD instance in the copied directory. Fortunately, the instances on both cell servers worked well.
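The copy-and-configure workaround can be sketched roughly as follows. This is only an illustration of the general idea, not the procedure we ran: the function name clone_install is made up, and the post does not list the actual environment variables, so any variable names passed in are placeholders.

```python
import os
import shutil
from pathlib import Path


def clone_install(src: Path, dst: Path, extra_env: dict) -> dict:
    """Copy an installation tree and build the environment for the copy.

    `extra_env` holds the additional environment variables the copied
    instance needs. The real IBRSD variable names are not given in the
    post, so callers supply their own (hypothetical) names here.
    """
    # copytree raises FileExistsError if dst already exists, which keeps
    # an already configured instance from being overwritten by accident.
    shutil.copytree(src, dst)

    # Start from the current process environment and layer the extra
    # variables on top, leaving os.environ itself unchanged.
    env = dict(os.environ)
    env.update(extra_env)
    return env
```

The returned dictionary could then be passed as the env argument of subprocess.run when starting the copied instance, so the extra variables apply only to that process rather than to the whole server.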
So far I have talked about how we architected the BPPM server, BPPM agents, BPPM cells, and IBRSD in our environment for high availability, scalability, and performance. We used Microsoft Windows Clusters for the BPPM server and BPPM agents, and native application clusters for BPPM cells. We installed the BPPM agent and integration service on the integration service node. We installed BPPM cells, BMC Event Adapter, BMC Event Log Adapter, and IBRSD on the cell server. By keeping BPPM cells completely separate from BPPM agents, we not only eliminated downtime for BPPM cell failover but also minimized the downtime for BPPM agent failover. In addition, this design offers better BPPM cell data protection because the event repositories are duplicated. As an added bonus, it cost less since we needed fewer Microsoft Windows Cluster licenses.
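To make the separation concrete, the placement described above can be written down as data and checked. This is just a sketch: the dictionary keys and the colocated helper are illustrative names, not anything from the BMC products.

```python
# Component placement as described in this post. The node labels are
# illustrative, not hostnames or product identifiers.
PLACEMENT = {
    "integration service node": {"BPPM agent", "integration service"},
    "cell server": {
        "BPPM cell",
        "BMC Event Adapter",
        "BMC Event Log Adapter",
        "IBRSD",
    },
}


def colocated(a: str, b: str, placement: dict = PLACEMENT) -> bool:
    """Return True if components a and b are installed on the same node type."""
    return any(a in comps and b in comps for comps in placement.values())
```

With this layout, colocated("BPPM cell", "BPPM agent") is False, which is the property that lets cells and agents fail over independently, while colocated("BPPM cell", "IBRSD") is True.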
Our implementation is somewhat different from what BMC recommends. In various documents and best-practice webinars, BMC recommends co-locating BPPM agents and BPPM cells on the same server and using disk-level OS clusters to achieve high availability. Had we gone with that recommendation, we would have experienced not only longer downtime during failover, but also an increased risk that another cell might fail on the secondary node.
Here are the lessons learned so far: To realize the highest ROI on a BMC Software investment, business requirements should drive technical design. It is important to evaluate all options through due diligence. Performing due diligence requires support from the organization's management and a systematic approach to testing and verifying the proposed model. Sometimes we need to think outside the box, as shown in the IBRSD example.
BPPM (BMC ProactiveNet Performance Management) or TrueSight Operations Management (the rebranded name) suite is the latest solution from BMC Software for enterprise system management. It combines the data analytic engine from ProactiveNet, the event processing engine from BMC Event Manager (BEM), and the server/application monitor from PATROL into one product. This blog is intended to share information and experience on TrueSight/BPPM implementation, customization, and integration.