Monday, August 5, 2013

Lessons Learned from Migrating BEM 7.4 to BPPM 9.0 - Part 6: BMC event adapter mcxa

BMC Event Adapter (mcxa) is an adapter BMC provides to integrate SNMP traps into BPPM/BEM cells. It was developed in Perl and can be installed anywhere. Although most vendors nowadays can send SNMP traps when alerts are raised, we prefer to use OS scripts to integrate events from non-BMC monitoring tools into BPPM cells.

While OS scripts can be logged, buffered, and retried with seamless failover, SNMP traps usually cannot, meaning that the slightest network instability could result in trap loss. Because SNMP trap based event integration is less reliable and more difficult to troubleshoot, we use it only when the monitoring tool provides no way to execute OS scripts when alerts are raised. In addition, SNMP trap based event integration requires an adapter, while OS script based event integration connects directly to a BPPM cell.
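To illustrate what "logged, buffered, and retried with failover" can look like on the script side, here is a minimal Python sketch. It is not our production script: the class name EventForwarder and the sender callables are hypothetical, and each sender stands in for whatever mechanism actually delivers an event to a primary or secondary cell.

```python
import logging
from collections import deque

logger = logging.getLogger("event_forwarder")


class EventForwarder:
    """Hypothetical helper: buffers events and retries delivery with failover."""

    def __init__(self, senders, max_attempts=3):
        # senders: ordered list of callables (primary first), each taking one
        # event and raising OSError on a delivery failure (assumption).
        self.senders = senders
        self.max_attempts = max_attempts
        self.buffer = deque()

    def _try_send(self, event):
        # Each attempt tries the primary first, then fails over to the next sender.
        for attempt in range(1, self.max_attempts + 1):
            for index, sender in enumerate(self.senders):
                try:
                    sender(event)
                    return True
                except OSError as exc:
                    logger.warning(
                        "attempt %d, sender %d failed: %s", attempt, index, exc
                    )
        return False

    def submit(self, event):
        # Buffer first, so the event survives if every destination is down.
        self.buffer.append(event)
        return self.flush()

    def flush(self):
        # Deliver buffered events in order; stop at the first one that cannot
        # be delivered and keep it (and everything behind it) for a later flush.
        sent = 0
        while self.buffer:
            if not self._try_send(self.buffer[0]):
                break
            self.buffer.popleft()
            sent += 1
        return sent
```

An event that cannot be delivered stays in the buffer and goes out on the next flush() call. A plain SNMP trap sender has no equivalent of this.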

Of the five non-BMC monitoring tools we have, only one is integrated into a BPPM cell using BMC Event Adapter (mcxa), because it cannot execute an OS script when an alert is raised.

Very little changed in BMC Event Adapter (mcxa) from BEM 7.4 to BPPM 9.0. We first converted its MIB file to a map file, then configured BMC Event Adapter (mcxa). We had to change the default settings of the PollInterval, ReadsPerEngine, and SnmpRcvbuf parameters to maximize the capacity of mcxa and accommodate the large volume of incoming SNMP traps. We also had to double the default value of the SnmpTrapLength parameter to accommodate the large size of the incoming SNMP traps.

To increase reliability, we installed two instances of BMC Event Adapter (mcxa), one on each cell server. On the non-BMC monitoring tool, we configured SNMP traps to be sent to both mcxa instances simultaneously. This dual configuration helps minimize SNMP trap loss in case of a network connection failure. It also compensates for the lack of an out-of-the-box high-availability feature in BMC Event Adapter (mcxa).

For the cell knowledge base, we made a minor change in the auto-generated mcsnmptrapdmib.baroc file so that we could write one rule instead of 50+ rules for all 50+ OIDs. We also added a de-duplication rule to remove the duplicate SNMP traps arriving from the second mcxa instance.
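The de-duplication idea can be sketched outside the cell as well. The Python below is only an illustration of the logic, not the actual rule in our knowledge base: the class name TrapDeduplicator, the choice of key (source, OID, varbind values), and the 60-second window are all assumptions made for the example.

```python
class TrapDeduplicator:
    """Hypothetical sketch: drop the second copy of a trap seen within a time window."""

    def __init__(self, window_seconds=60.0):
        self.window = window_seconds
        self.seen = {}  # key -> timestamp of the first copy

    def is_duplicate(self, source, oid, varbinds, now):
        # The same trap arriving through both adapters has an identical
        # source, OID, and varbind values (assumed key for this example).
        key = (source, oid, tuple(varbinds))
        # Forget entries older than the window so the table stays small.
        self.seen = {k: t for k, t in self.seen.items() if now - t <= self.window}
        if key in self.seen:
            return True
        self.seen[key] = now
        return False


dedup = TrapDeduplicator(window_seconds=60.0)
first = dedup.is_duplicate("10.0.0.5", "1.3.6.1.4.1.99999.1", ["disk full"], now=0.0)
second = dedup.is_duplicate("10.0.0.5", "1.3.6.1.4.1.99999.1", ["disk full"], now=1.0)
print(first, second)  # False True
```

The copy that arrives first, from either mcxa instance, is kept. The copy from the other instance is dropped as long as it arrives inside the window.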

For the rest of the cell knowledge base, we followed our standard procedures to map, convert, filter, correlate, update, define actions, execute actions, send email, and create tickets. In a later post, I will go into more detail on the standard procedures in our cell knowledge base that apply universally to events from all event sources.



