Willa Ou's BMC TrueSight/BPPM Blog

The BPPM (BMC ProactiveNet Performance Management) suite, rebranded as TrueSight Operations Management, is the latest enterprise system management solution from BMC Software. It combines the data analytics engine from ProactiveNet, the event processing engine from BMC Event Manager (BEM), and the server/application monitoring from PATROL into one product. This blog shares information and experience on TrueSight/BPPM implementation, customization, and integration.

Monday, December 30, 2013

New Year's Resolutions

As we say goodbye to 2013 and hello to 2014, it's time to reflect on 2013 and make resolutions for 2014.

2013 has been a significant year. We created our own version of the BPPM architecture by modifying what BMC recommended, and we proved that it works better in real enterprise IT environments. We standardized our BPPM/BEM extension software at our client site, and now the client is able to maintain, upgrade, and extend it on their own.

We started moving in the direction of making our business more scalable by developing software and training courses in addition to providing consulting services. We signed an agreement with our partner to jointly market our training courses once they become available.

I became much more involved in the BMC online community. By sharing experience with other BMC users, I not only learned many technical details but also gained tremendous insight into what our training courses should focus on.

I started blogging on BPPM implementation in the summer. I want to take this opportunity to thank my readers for their generous feedback and continuous support. My blog was recently included on www.itCentralStation.com, a new review site for enterprise IT products.

In 2014, I have so much to look forward to. We will continue providing consulting services to our clients with our rare expertise in customization and integration. We will complete the standardization of our BPPM extension software and increase our customer base. We will complete the development of our training courses to provide BMC customers a convenient and affordable option to learn practical skills on BPPM implementation based on our field experience instead of textbook theories.

I am so excited for 2014. I wish all of you a happy and prosperous new year.

Monday, December 23, 2013

Merry Christmas from World Opus Technologies

As the founder of World Opus Technologies, I would like to take this opportunity to thank you for your ongoing encouragement and generous support. I wish all of you a Merry Christmas and Happy New Year!

Here is a beautiful Christmas photo from Austin. Enjoy!

Monday, December 16, 2013

BPPM Implementation Considerations - Part 5: Customize at the right place

Unless you are a very small business, you will need to customize BMC out-of-box solutions to address the particular issues in your IT environment. It is unrealistic to expect a one-size-fits-all solution from BMC. Fortunately, BPPM was developed with customization in mind. It provides extensive tools to help you develop your own solutions that seamlessly extend BMC out-of-box solutions.

The BPPM suite has three major components: BMC ProactiveNet, BPPM Cell (BEM), and PATROL. Both BPPM Cell and PATROL are more than 10 years old. One of the primary reasons they are still going strong today is that they both allow you to add your own solutions seamlessly.

Before you start developing your own custom solutions, take a step back and think about what options you have and where you should place your customization. What would be the impact on accessibility and resource consumption on the underlying servers? What would be the impact on deployment of your custom solutions? What would be the impact on future maintenance and upgrades?

In PATROL, you can develop custom knowledge modules and you can also plug in your own PSL code as a recovery action into a parameter. In BPPM Cell, you can develop your own event classes, MRL code, dynamic tables, and action scripts to extend the out-of-box knowledge base.

In general, if you have a choice between customizing PATROL and customizing BPPM Cell to manage events, customizing BPPM Cell would require less effort and result in less impact to the servers that are being monitored. Here are a few reasons:

1) PATROL runs on servers that you don't own, have limited access to, and may not be familiar with. For example, I was recently helping a client debug a custom KM running on AS400. I had to get help from the AS400 sysadmin just to add one line to its PSL code.

2) PATROL often shares the server with mission-critical applications. Poorly written PSL code could negatively impact those mission-critical applications.

3) The same custom knowledge module may need to be running on more than one server, thus requiring more time to deploy and upgrade.

4) BPPM Cell runs on your own infrastructure server. Because cells form a peer-to-peer architecture, you can scale out simply by adding cells. If resources ever become an issue, you can add more cells either on the same server or on a different server (even one with a different operating system). You can split a cell horizontally by processing phases, or vertically by event sources.
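The two ways of splitting cells can be pictured as two routing decisions. The Python sketch below is an illustration only. In a real deployment, forwarding between cells is configured in the cells themselves. All cell names and source keys here are hypothetical.

```python
from typing import Dict, Optional

# Vertical split: hypothetical mapping from event source to the cell that receives it.
SOURCE_CELLS: Dict[str, str] = {
    "patrol": "cell_patrol",   # events from PATROL agents
    "snmp": "cell_snmp",       # events from the SNMP trap adapter
    "script": "cell_scripts",  # events from home-grown scripts
}
DEFAULT_CELL = "cell_misc"

# Horizontal split: hypothetical chain of cells, one per processing phase, in order.
PHASE_CELLS = ["cell_collect", "cell_correlate", "cell_notify"]


def cell_for_source(event: Dict[str, str]) -> str:
    """Vertical split: choose the receiving cell from the event's source."""
    return SOURCE_CELLS.get(event.get("source", ""), DEFAULT_CELL)


def next_phase_cell(current: str) -> Optional[str]:
    """Horizontal split: return the cell for the next phase, or None after the last phase."""
    position = PHASE_CELLS.index(current)
    if position + 1 < len(PHASE_CELLS):
        return PHASE_CELLS[position + 1]
    return None


if __name__ == "__main__":
    print(cell_for_source({"source": "snmp"}))  # cell_snmp
    print(next_phase_cell("cell_collect"))      # cell_correlate
```

Either split can be added later without touching the monitored servers, which is the point of the comparison with PATROL above.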

Monday, December 9, 2013

BPPM Implementation Considerations - Part 4: Monitor the monitors

The purpose of BPPM is to monitor your IT infrastructure. It is important that the monitors themselves are up and running all the time.

A good BPPM implementation not only monitors your IT infrastructure but also monitors each and every BPPM component, including BPPM server, BPPM agent, BPPM cell, PATROL agent, PATROL adapter service/process, SNMP adapter service/process, IIWS service/process, IBRSD service/process, and so on. The self-monitoring metrics include component status and connection status.

Events alerting that a BPPM component or a BPPM connection is down are mostly sent automatically to the connected BPPM cell. Some of these self-monitoring events require a quick activation step. You need to identify these events, because they have different event classes and message formats, and you need to notify the right people about them.

Some components can be monitored in multiple ways, and you just need to pick the one that works best in your environment. For example, when a PATROL agent loses its connection with PATROL Integration Service, you can see an event sent directly from the PATROL agent. You can see another event from PATROL LOG KM if you configured it to monitor the IS connection-down log entry. A third event can come from PATROL Integration Service if you activated it in the BPPM GUI.

You may need to reword the message of a self-monitoring event for better readability, as some messages are not clear at all. For example, by default, the PATROL agent connection-down event contains the following slots:

cell='PatrolAgent@server1@172.118.2.12:3181';
msg='Monitored Cell is no longer responding';

You may want to reword the message to look like this:

msg='PatrolAgent@server1@172.118.2.12:3181 is no longer responding';

because it is the PATROL agent that is no longer responding, not the cell.
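Inside the cell, this rewording would be done in the knowledge base, for example with an MRL rule. The Python sketch below only illustrates the slot transformation. It treats the event as a dictionary and uses the slot values from the example above. The function name and the exact-match test on the default message text are assumptions made for illustration.

```python
from typing import Dict

DEFAULT_MSG = "Monitored Cell is no longer responding"


def reword_agent_down(event: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the event whose msg names the PATROL agent taken from the cell slot."""
    reworded = dict(event)
    if event.get("msg") == DEFAULT_MSG and event.get("cell"):
        reworded["msg"] = f"{event['cell']} is no longer responding"
    return reworded


event = {
    "cell": "PatrolAgent@server1@172.118.2.12:3181",
    "msg": "Monitored Cell is no longer responding",
}
print(reword_agent_down(event)["msg"])
# PatrolAgent@server1@172.118.2.12:3181 is no longer responding
```

Events with any other message pass through unchanged, so the rule is safe to apply broadly.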

For the notification method, the most reliable way is local email sent from the cell that receives the self-monitoring events. Your path to the ticketing system may be down when your BPPM components are experiencing problems. For that reason, your back-end ticketing system should not be the only way to send notifications for your self-monitoring alerts. Use it in addition to local email notification.
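Local email works because it needs nothing beyond the cell host and its mail server. As a sketch of that path, the Python below composes a plain-text email from an event's slots and hands it to an SMTP server on the same host. Several details are assumptions: the sender and recipient addresses, the subject prefix, and an MTA listening on localhost port 25. How the cell actually fires the email (for example, through an action script) is not shown here.

```python
import smtplib
from email.message import EmailMessage
from typing import Dict, List


def build_alert_email(event: Dict[str, str], sender: str, recipients: List[str]) -> EmailMessage:
    """Turn a self-monitoring event into a plain-text email."""
    mail = EmailMessage()
    mail["Subject"] = f"[BPPM self-monitoring] {event.get('msg', '(no message)')}"
    mail["From"] = sender
    mail["To"] = ", ".join(recipients)
    # One slot per line so the whole event is readable in the email body.
    mail.set_content("\n".join(f"{slot}={value}" for slot, value in sorted(event.items())))
    return mail


def send_via_local_mta(mail: EmailMessage, host: str = "localhost") -> None:
    """Hand the message to an SMTP server on the cell host (assumed to listen on port 25)."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(mail)
```

Because nothing here touches the ticketing system, this path still works when the integration to Remedy is broken.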

Monday, December 2, 2013

BPPM Implementation Considerations - Part 3: Achieve the highest ROI through integration

In addition to monitoring solutions from BMC, most enterprises nowadays also use monitoring software from other vendors, open source tools, and even home-grown scripts scheduled by cron jobs. Having a group of NOC operators watching the GUIs of all these monitoring tools in a NASA-like environment is simply not efficient. Worse still, you may have to pay a license fee for each monitoring tool to connect it to the back-end ticketing system.

BPPM/BEM cell provides a flexible and robust API and adapters to integrate with just about any monitoring software out there. You may run monitoring tools from other commercial vendors such as IBM and Microsoft, or open source tools like Nagios. Either way, it is fairly straightforward to integrate alerts from these tools into BPPM/BEM cell using either its OS API or its SNMP adapter. If you use home-grown scripts, all you need to do is add an API call at the end.
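For a home-grown script, "an API call at the end" usually means running msend with the result of the check. The Python sketch below builds and runs such a command line. The option letters are from my recollection of msend usage, so verify them against the msend documentation for your version: -n for the cell, -a for the event class, -m for the message, -r for the severity, and -b for extra slots. The cell name, class, and slot values in the usage lines are hypothetical.

```python
import shutil
import subprocess
from typing import Dict, List, Optional


def build_msend_cmd(cell: str, event_class: str, message: str,
                    severity: str = "WARNING",
                    slots: Optional[Dict[str, str]] = None) -> List[str]:
    """Build an msend command line (option letters assumed; verify for your version)."""
    cmd = ["msend", "-n", cell, "-a", event_class, "-m", message, "-r", severity]
    if slots:
        # Extra slots as "slot=value;slot=value". Values containing ';' or quotes
        # would need escaping, which this sketch does not handle.
        cmd += ["-b", ";".join(f"{name}={value}" for name, value in slots.items())]
    return cmd


def send_event(cmd: List[str]) -> None:
    """Run msend, failing loudly if it is missing or exits with a non-zero code."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical values; a real script would call send_event(cmd) after its check.
    cmd = build_msend_cmd("cell_scripts", "EVENT", "Backup job failed",
                          severity="CRITICAL", slots={"mc_host": "server1"})
    print(" ".join(cmd))
```

Passing the arguments as a list, rather than one shell string, keeps messages with spaces intact without extra quoting.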

If your back-end ticketing system is Remedy, the out-of-box two-way integration (IBRSD) between BPPM/BEM cell and Remedy is more efficient than Remedy gateways for other monitoring tools. It is fairly straightforward to configure two instances of IBRSD as active/active failover, so your chance of waking up at 3am to fight fires is very slim. The IBRSD license is included in the price of BPPM/BEM, so you cut costs as soon as you stop paying for Remedy gateway licenses for other monitoring tools.

Other added benefits include reduced maintenance effort for other monitoring software, less customization in Remedy, consistent ticket information for all monitoring tools, and possible event correlation between events from different monitoring tools. You will also make your NOC team's job easier.

I understand that it is not always easy to convince the people who own other monitoring software to integrate it into BPPM/BEM, due to organizational silos and technical complexity. It is important to pick the right candidate for the first BPPM/BEM integration. Once the ROI is obvious, people will become more supportive of BPPM/BEM integration. It is also important to set up a consistent framework for all integrations, since BMC does not provide a standard for integration. Once you have a consistent framework for one-way and two-way integration, each subsequent integration becomes much easier.

At one of my past clients, it took our BPPM/BEM team three months to work with the other team to finish our first integration, because the integration project had the lowest priority with the other team. Once everyone saw how well the integration worked and how much license fee it saved, our second integration took only four weeks to finish. Our third integration took only three days.

Monday, November 25, 2013

BPPM Implementation Considerations - Part 2: Keep the total cost of ownership in mind

When you build a house for yourself, you don't just consider the cost of building it. You also consider the cost of maintaining the house and the utility bills once you live there. Similarly, when you implement BPPM, you need to keep the total cost of ownership in mind in addition to the implementation cost.

After talking to several BPPM customers, I noticed that they all have operations teams at least twice the size of those at my clients, just to keep BPPM operations going. Worse, their operations teams also need implementation skills to constantly patch up the implementation.

Before you even start implementation, consider the following aspects:

1) Scalability: When your environment grows with more servers, more applications, or more integration, will your architecture still work? How easy would it be to split horizontally (based on processing steps) and vertically (based on incoming traffic)?

2) Upgrade: What can you do right now to make future upgrades easier? You may want to consider adopting a naming convention, saving configuration in a separate repository, and documenting everything consistently.

3) High Availability: High availability not only helps with business continuity but also keeps your team from constantly fighting fires. You have several high-availability options: application-level failover, OS-based failover, active/active load balancing, or duplication. Which option best fits your needs for each BPPM component, and how much would it cost? For example, native application-level failover might be your best choice for BPPM cells if your business cannot afford to miss a server-down alert. For the PATROL 7 console, however, a simple duplicate is probably sufficient. OS-based failover would cost nearly twice as much.

4) Implementation Repeatability: Do you keep an accurate implementation document so that installation and configuration of each BPPM component is repeatable? You need to implement everything on a test system first and carefully document everything as you go. Production deployment should be a straightforward 'follow the doc' process. It also gives you a perfect opportunity to update the implementation document for anything you have missed.

A common mistake I have seen is starting the implementation directly on a production system. After several months of figuring things out, the system finally goes live with many junk files sitting under the implementation directory. Then you realize that you actually need a test system, because otherwise you can't make and test changes. Now you don't know how to configure your test system to match your production system. You have lost track of what made the production system work and what did not.

5) Operations Standardization: Do you have a standard operations procedure document? For example, if a new server is added into your PeopleSoft Payroll application, do you have a document containing the steps for the operations team to add that server to PATROL, BPPM integration service, BPPM cell, BPPM server, BPPM GUI, and automated Remedy ticketing?

Monday, November 18, 2013

BPPM Implementation Considerations - Part 1: Meet your business requirements

Three years after BMC ProactiveNet Performance Management (BPPM) was released, most BPPM customers have concluded that BPPM implementation is more than just software installation. But what makes a BPPM implementation successful? What do you need to consider before diving into installation details?

The "BPPM Implementation Considerations" blog series will address several important considerations at the requirement level and the architecture level. Implementing BPPM is a lot like building a house. Many considerations at the requirement and architecture levels are like the foundation of the house: they need to be determined at the very beginning.

The most important consideration in BPPM implementation is your business requirements. The management of your organization, your entire implementation team, and other stakeholders should all clearly understand the business requirements that your BPPM implementation is expected to meet. Then you will need to translate these business requirements into a list of technical requirements. Assign each one a category such as mandatory, strategic, cost-saver, or nice-to-have.

Only then can you map each technical requirement to a list of detailed BPPM features and prioritize the implementation of each feature. This will become your project scope. Based on your project scope, you can plan your project timeline and budget. If you outsource your BPPM implementation to a consulting company, it is critical that you do your homework on your business requirements and technical requirements first. Then work closely with the architect (not just the project manager) of the consulting company to determine the project scope.

However, many new BPPM customers I have talked to seem to do it backwards. They come up with a budget first, without knowing exactly which BPPM features to implement or how long the implementation will take. Then they pick a list of BPPM features from the product datasheet without knowing how each feature relates to their business bottom line.

As an example, here is the process followed at one of my past clients. One of the top business requirements was to cut the cost of Remedy Gateway licenses for multiple monitoring software vendors. This was translated into the following technical requirement: alerts from multiple monitoring tools must be integrated into one alert management tool that communicates with Remedy for ticket creation. This requirement was categorized as a cost-saver. It was mapped to these BPPM features:
- event integration into BPPM cell through API and SNMP traps
- msend API installation
- SNMP trap adapter high-availability implementation
- custom BPPM cell MRL rules to process events from multiple vendors
- IBRSD high-availability implementation
- event-to-ticket categorization in BPPM cell

The return was a six-figure annual license saving, year after year, for a five-figure consulting investment. This ROI went straight to the business bottom line.

Monday, November 11, 2013

PATROL LOG KM Examples - Part 5: Parsing script output instead of log file

In the previous 4 posts, we have discussed various ways to parse a log file using BMC PATROL LOG KM. Did you know that you can also use LOG KM to parse the output of a script?

Normally, when you write your own script to collect data, you also need to write a custom KM to parse the result and send out alerts. Although LOG KM doesn't offer the flexibility of a custom KM, it saves a tremendous amount of development and maintenance effort compared to writing a custom KM. All features available for parsing a log file work the same way when parsing the output of a script.

For example, suppose you want to check the availability of a website. You can write a script that pings the website periodically and get an alert when the website is unreachable. Using www.bmc.com as the example, your script would look like this:

ping www.bmc.com

First, save this script as C:\scripts\ping_bmc.bat.

In your LOG KM configuration screen, put C:\scripts\ping_bmc.bat as your log file name and 'PING_BMC' as the logical name for the instance. Then select 'Script' as your file type. The default file type is 'Text File'. Please see the screen shot included in 'PATROL LOG KM Examples - Part 2' post for the locations of these selections.

In the 'Default Settings for Search Criteria' section, you have two ways to send alerts to BPPM/BEM cell: 1) Use recovery action to send parsing result as discussed in 'PATROL LOG KM Examples - Part 1' post; or 2) Use 'Custom Event Message' and 'Custom Event Origin' as discussed in 'PATROL LOG KM Examples - Part 2' post.

For this particular example, I found that using option 2) would work better because I can simply put "Unable to reach www.bmc.com." in my 'Custom Event Message' instead of the raw output from the script. I can also put '%APPCLASS%.%FILENAME%.%LOGICALNAME%' as my 'Custom Event Origin'.

In your search criteria configuration screen, use '0% loss' as your search string and check the 'NOT' box next to it, because we only want to be alerted when there is packet loss.

When there is packet loss, or when the script output states "Ping request could not find host www.bmc.com.", you will receive an event in BPPM/BEM cell as follows:

mc_object_class='LOGMON';
mc_object='C:\scripts\ping_bmc.bat';
mc_parameter='PING_BMC';
msg='Unable to reach www.bmc.com.';
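Before relying on the NOT match, it helps to check which script outputs will actually raise an alert. The Python sketch below emulates the check outside PATROL on sample lines modeled on Windows ping output. The samples are illustrative and were not captured from a real run. The sketch matches "(0% loss)" with the parenthesis, because the bare text "0% loss" also appears inside "(100% loss)". That refinement is my suggestion, not the author's configuration. Before changing the LOG KM search string, confirm whether LOG KM treats it as plain text or as a regular expression, since a regular expression would need the parentheses escaped.

```python
from typing import Dict


def should_alert(ping_output: str) -> bool:
    """Alert unless the output reports exactly zero packet loss.

    "(0% loss)" includes the parenthesis because "0% loss" alone
    is also a substring of "(100% loss)".
    """
    return "(0% loss)" not in ping_output


# Illustrative Windows-style ping summaries (not captured from a real run).
SAMPLES: Dict[str, str] = {
    "all replies": "Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),",
    "no replies": "Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),",
    "unknown host": "Ping request could not find host www.bmc.com. Please check the name and try again.",
}

if __name__ == "__main__":
    for label, text in SAMPLES.items():
        print(label, "->", "alert" if should_alert(text) else "ok")
```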

Monday, November 4, 2013

PATROL LOG KM Examples - Part 4: A not so simple case of multiple-line search

Last week I discussed a simple case of multiple-line search in PATROL LOG KM to include additional lines after the line that matches your search string pattern. But what if the additional lines you want to include are before the line that matches the search string pattern? We will need to use an advanced feature of PATROL LOG KM called 'Multiline Search'.

For example, suppose you want to capture the following two lines from your log file and send out an alert message such as "User: root password will expire in 3 days.":

root 21292 c Mon Oct 28 08:00:00 2013
! Your password will expire in 3 days.

Before activating the multiline search feature, configure LOG KM normally as shown in the 'PATROL LOG KM Examples - Part 1' post. Let's set up a log instance called 'Test_Log'. Set threshold #1 in State Change Options to "1" and its state to "ALARM". The search pattern in this example is "! Your password will expire in 3 days".

Now we are going to activate multiline search for LOG KM. From PATROL console, right click on <host> -> OS KM -> LOG -> Test_Log -> KM Commands -> Advanced Feature -> Multiline Search

In the pop-up box, enter ':' in Start Delimiter and 'password' in End Delimiter. Regular expressions don't work here. These two delimiters define the start and the end of the block that LOG KM will capture.

Now we need to configure recovery action. Let's create a file called LOGKM_RecoveryAction_multiline.cfg as follows:

PATROL_CONFIG
"/AS/EVENTSPRING/LOGMON/Test_LogPN0/LOGErrorLvl/arsAction" = { REPLACE = "6" },
"/AS/EVENTSPRING/LOGMON/Test_LogPN0/LOGErrorLvl/arsCmdType" = { REPLACE = "PSL" },
"/AS/EVENTSPRING/LOGMON/Test_LogPN0/LOGErrorLvl/arsCommand" = { REPLACE = "/opt/bmc/LOGKM_RecoveryAction_multiline.psl" }

Then create /opt/bmc/LOGKM_RecoveryAction_multiline.psl as follows:

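# Give the agent time to finish writing the captured block into LOGMatchString
sleep(1);
# Text captured by the multiline search in this polling cycle
match_str = get("/LOGMON/".__instance__."/LOGMatchString/value");
# Expiry lines, each prefixed with its line number ("n" option)
expire_line = grep("! Your password will expire in 3 days.", match_str, "n");
account_list = "";
foreach lin (expire_line) {
    # Line number of this expiry line
    line_num = nthargf(lin, 1, ":");
    # The account name is the first field of the line just before it
    account_line = nthlinef(match_str, line_num-1);
    account = nthargf(account_line, 1);
    account_list = account_list." ".account;
}
msg = "User:".account_list." password will expire in 3 days.";
status = get("/LOGMON/".__instance__."/LOGErrorLvl/status");
origin = "LOGMON.".__instance__.".PasswordExpire";
event_trigger2(origin, "STD", "41", status, "4", msg);
# Reset LOGErrorLvl to OK so the next match triggers this recovery action again
set("/LOGMON/".__instance__."/LOGErrorLvl/value", 1);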

Run 'pconfig LOGKM_RecoveryAction_multiline.cfg' to push the configuration, and then restart the PATROL agent.
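The parsing logic of this recovery action is easier to test outside PATROL. The Python sketch below mirrors the PSL steps on a captured block: find each expiry line, take the first field of the line before it, and join the account names into one message. It is a translation for illustration only; the PSL above is what runs in the agent. One difference: Python's `in` is a literal match, while PSL's grep() treats its pattern as a regular expression.

```python
from typing import List

EXPIRY_TEXT = "! Your password will expire in 3 days."


def expiring_accounts(block: str) -> List[str]:
    """Return the first field of the line preceding each expiry line."""
    lines = block.splitlines()
    accounts = []
    for index, line in enumerate(lines):
        # Literal match anywhere in the line (PSL grep() would use a regular expression).
        if EXPIRY_TEXT in line and index > 0:
            fields = lines[index - 1].split()
            if fields:
                accounts.append(fields[0])
    return accounts


def build_message(accounts: List[str]) -> str:
    """Build the same alert text as the PSL msg variable."""
    return "User: " + " ".join(accounts) + " password will expire in 3 days."


block = "root 21292 c Mon Oct 28 08:00:00 2013\n! Your password will expire in 3 days."
print(build_message(expiring_accounts(block)))
# User: root password will expire in 3 days.
```

With several accounts in one block, the names are joined with spaces, just as the PSL builds account_list.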

Monday, October 28, 2013

PATROL LOG KM Examples - Part 3: A simple case of multiple-line search

In the last two PATROL LOG KM posts, I have discussed two different ways to send out alerts. In those alerts, each matched log entry contains one single line from the log file. What if you want each matched log entry to contain more than one line? This happens when some critical information is actually contained in the lines before or after the line that matches the search string pattern. Including those additional lines in your alert emails or trouble tickets would definitely help to speed up the troubleshooting process.

If the additional lines you want to include are after the line that matches the search string pattern, the solution is simple. For example, if you would like to have the following two lines included in your matched log entry:

031605: Error: Disc Full
/hd001 mounted as /opt

You can use 'Disc Full' as your search string pattern. To make the matched log entry contain one additional line after the line that matched the search string pattern, you simply put '2' in 'Number of Lines in Log Entry' field in LOG KM instance configuration screen. (Please see the location of this field from the LOG KM instance configuration screen displayed in 'PATROL LOG KM Examples - Part 2' post.) And you can configure the rest of LOG KM as usual. You can send one alert per polling cycle as described in 'PATROL LOG KM Examples - Part 1' post or send one alert per matched log entry as described in 'PATROL LOG KM Examples - Part 2' post.
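The effect of 'Number of Lines in Log Entry' can be emulated in a few lines of Python. The matching line and the lines after it form one entry, up to the configured total. This sketch is my own emulation of the behavior described above, not LOG KM code. The first sample log line is made up. The post does not say how LOG KM handles a match on the last line of the file, so the sketch simply stops at the end of the list.

```python
from typing import List


def matched_entries(lines: List[str], pattern: str, lines_per_entry: int = 2) -> List[str]:
    """Join each matching line with the lines that follow it, up to lines_per_entry lines in total."""
    entries = []
    for index, line in enumerate(lines):
        if pattern in line:
            entries.append("\n".join(lines[index:index + lines_per_entry]))
    return entries


log = [
    "031604: Info: Backup started",  # made-up preceding line
    "031605: Error: Disc Full",
    "/hd001 mounted as /opt",
]
print(matched_entries(log, "Disc Full"))
```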

However, if the additional lines you want to include are before the line that matches the search string pattern, the solution is not so simple. For example, if you would like to have the following two lines included in your matched log entry:

root 21292 c Mon Oct 28 08:00:00 2013
! Your password will expire in 3 days.

Here you would need to use some strings from the second line as your search string pattern because nothing from the first line is unique enough as a search pattern. Then how can we include information from the line before the line that matches the search string pattern? In the next post, I will discuss a solution to this example by using an advanced feature of PATROL LOG KM called 'Multiline Search'. Stay tuned.

Monday, October 21, 2013

PATROL LOG KM Examples - Part 2: Sending one alert per matched log entry

In the last post, "PATROL LOG KM Examples - Part 1", I discussed how to configure PATROL LOG KM to send one alert per polling cycle, regardless of how many matched log entries were found in that cycle. But what if some matched log entries are database-related? Those need to be emailed and ticketed to the database group. Others may be operating-system-related and need to be emailed and ticketed to the UNIX sysadmin group.

There is another way to configure PATROL LOG KM: send one alert per matched log entry. This option is less well known, but it is more flexible than sending one alert per polling cycle because you can specify the alert severity separately for each string pattern. For example, you can specify severity ALARM for each log entry that matches the string pattern "fatal", and severity WARNING for each log entry that matches the string pattern "retry".

To send one alert per matched log entry, configure the "Default Settings for Search Criteria" section as shown in the following example:

"Custom Event Origin" should contain three strings separated by '.':
- The first string, before the first '.' (%APPCLASS% in the above example), goes to the mc_object_class slot in your event.
- The second string, between the two '.'s (%FILENAME% in the above example), goes to the mc_object slot.
- The third string, after the second '.' (%LOGICALNAME% in the above example), goes to the mc_parameter slot.

In the above example, you will get an event with

mc_object_class='LOGMON';
mc_object='C:\BMC\Patrol3\log\PatrolAgent-Sophie-3181.errs';
mc_parameter='PATROL_AGENT_LOG';

"Custom Event Message" should contain whatever you want to show in the msg slot of your event. In the above example, I put "%SEARCHID%:%1-". Suppose you specify "FATAL" as the search ID for the string pattern "fatal", and the log entry that matches that pattern is "Fatal error. Application exit.". The msg slot in your event will then be:

msg='FATAL:Fatal error. Application exit.';

This is the only screen you need to configure to make PATROL LOG KM send one alert per matched log entry. Unlike the approach in the previous post, you don't need any pconfig rules or PSL coding at all.
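Once %APPCLASS%, %FILENAME%, and %LOGICALNAME% are expanded, the file name itself can contain dots, as PatrolAgent-Sophie-3181.errs does in the example. A naive split on every '.' therefore would not reproduce the three slots shown. The Python sketch below assumes the split happens at the first and the last '.'. That rule is inferred from the example, not taken from BMC documentation, and it would break if a logical name contained a dot.

```python
from typing import Dict


def split_origin(origin: str) -> Dict[str, str]:
    """Split an expanded Custom Event Origin into the three mc_* slots.

    Assumption: the class is everything before the first '.', the parameter is
    everything after the last '.', and the object is whatever lies in between.
    """
    first = origin.find(".")
    last = origin.rfind(".")
    if first == -1 or first == last:
        raise ValueError("origin needs at least two '.' separators")
    return {
        "mc_object_class": origin[:first],
        "mc_object": origin[first + 1:last],
        "mc_parameter": origin[last + 1:],
    }


origin = r"LOGMON.C:\BMC\Patrol3\log\PatrolAgent-Sophie-3181.errs.PATROL_AGENT_LOG"
for slot, value in split_origin(origin).items():
    print(f"{slot}='{value}';")
```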

Monday, October 14, 2013

PATROL LOG KM Examples - Part 1: Sending one alert per polling cycle

Sorry for not posting for two weeks. I was out of the country, where I was not able to access this blog.

PATROL LOG KM is one of the most commonly used KMs. By design, each LOG KM instance monitors one log file. Two important parameters in LOG KMs are LOGErrorLvl and LOGMatchString. When a string pattern is found in the log file, LOGErrorLvl will go to alarm and the matched log entry will be saved in LOGMatchString. Since you can configure LOG KM to search for multiple string patterns in each log file, all matched log entries are saved together in one LOGMatchString parameter.

If you would like to send all matched log entries as one alert, you can use a recovery action to generate an event and send it to BPPM/BEM cell. I have seen many examples that use the variable '__udefvar__' in pconfig rules, but '__udefvar__' only works with PATROL Notification Server. PATROL Notification Server is optional before PATROL agent 9.x and unnecessary with PATROL agent 9.x. If you don't use it, you can use the event_trigger2() PSL call instead. Here is an example pconfig rule set and PSL code.

Pconfig rule:

"/AS/EVENTSPRING/LOGMON/__ANYINST__/LOGErrorLvl/arsAction" = { REPLACE = "6" },
"/AS/EVENTSPRING/LOGMON/__ANYINST__/LOGErrorLvl/arsCmdType" = { REPLACE = "PSL"},
"/AS/EVENTSPRING/LOGMON/__ANYINST__/LOGErrorLvl/arsCommand" = { REPLACE = "C:\\BMC\\Patrol3\\lib\\psl\\LOGKM_RecoveryAction.psl" }

LOGKM_RecoveryAction.psl code:

sleep(1);
message = get("/LOGMON/".__instance__."/LOGMatchString/value");
inst = get("/LOGMON/".__instance__."/name");
event_trigger2(inst."/LOGMatchString", "STD", "41", ALARM, 4, message);
set("/LOGMON/".__instance__."/LOGErrorLvl/value", 1);

A few things to notice here:

1. You can embed the entire PSL script into "/AS/EVENTSPRING/LOGMON/__ANYINST__/LOGErrorLvl/arsCommand" pconfig variable with some '\' to escape newlines and "()" symbols, etc.

2. The sleep statement in the first line would give PATROL agent enough time to finish writing a big block of data into LOGMatchString.

3. The set statement in the last line sets parameter LOGErrorLvl back to OK state immediately after the recovery action. Recovery action is triggered by state change. When another string pattern is found again in the next polling cycle, if the state of parameter LOGErrorLvl remains in ALARM state without going back to OK in between, the recovery action won't be triggered.
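The reason for the final set statement shows up in a small simulation. The Python model below is simplified and makes three assumptions: a cycle with a match puts LOGErrorLvl in ALARM, a cycle without a match returns it to OK, and the recovery action fires only on a transition into ALARM. It is a model for reasoning, not PATROL's implementation.

```python
from typing import List


def count_recovery_actions(matches: List[bool], reset_after_action: bool) -> int:
    """Count recovery actions over polling cycles in a simplified LOGErrorLvl model."""
    state = "OK"
    actions = 0
    for matched in matches:
        new_state = "ALARM" if matched else "OK"
        if new_state == "ALARM" and state != "ALARM":
            actions += 1
            if reset_after_action:
                # Mirrors set(".../LOGErrorLvl/value", 1) at the end of the PSL.
                new_state = "OK"
        state = new_state
    return actions


cycles = [True, True, True]
print(count_recovery_actions(cycles, reset_after_action=False))  # 1
print(count_recovery_actions(cycles, reset_after_action=True))   # 3
```

Without the reset, three consecutive cycles with matches produce only one event. With it, each cycle produces its own event.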

Monday, September 23, 2013

Monitoring PATROL Agent 9.x Status: Do I need AS_AVAILABILITY KM?

If you use PATROL to monitor your IT infrastructure, you would want to monitor the status of all your PATROL agents to make sure they are up and running. Although BMC recommends using PATROL AS_AVAILABILITY KM to monitor PATROL agent status, there is actually a much simpler way by using BPPM cell heartbeat events.

So the answer here is no. You don't need AS_AVAILABILITY KM to monitor PATROL agent 9.x status. As you are about to see, BPPM cell heartbeat events are fully automatic, have built-in high availability, and need no extra PATROL agent configuration. AS_AVAILABILITY KM was developed before BMC acquired BPPM cell, and it is still a great option if you have a 'PATROL only' environment without BPPM/BEM.

To use AS_AVAILABILITY KM, you need to configure the KM by selecting one PATROL agent as the 'pinger' and adding other PATROL agents as 'pingees'. Every time you deploy a new PATROL agent or decommission an existing PATROL agent, you would need to change AS_AVAILABILITY KM configuration. On the other hand, to use BPPM cell heartbeat events, you don't need to go through extra steps to register each PATROL agent with BPPM cell. As long as you set pconfig variable "/EventSetup/Configuration/EventCells" in your PATROL agent 9.x to send PATROL events to a BPPM cell, that BPPM cell will automatically monitor the status of the PATROL agent.

If the 'pinger' in your AS_AVAILABILITY KM goes down, you won't be able to monitor the status of the other PATROL agents. To make AS_AVAILABILITY KM more robust, you would have to set up a second 'pinger' and add complicated logic to coordinate the two 'pingers' and avoid duplicate alerts. On the other hand, if your BPPM cell is set up for high availability, you don't need any extra steps to make BPPM cell heartbeat events highly available. Your PATROL agent status will always be monitored by the active H/A BPPM cell.

To get the most out of BPPM cell heartbeat events, I recommend rewording the event message, because the out-of-box message doesn't contain enough information. When a PATROL agent goes down, you will receive an event with out-of-box slots like this:
MC_CELL_HEARTBEAT_FAILURE;
cell='PatrolAgent@server1@172.118.2.12:3181';
msg='Monitored Cell is no longer responding';
...
END
You may want to reword the msg to 'PatrolAgent@server1@172.118.2.12:3181 is no longer responding'. For its reciprocal MC_CELL_HEARTBEAT_ON event, you may want to reword its message in a similar way.

Monday, September 16, 2013

Parameter (Metrics) Thresholds: Do I still need to set them in PATROL?

On BPPM server, you can view all the data sent from each PATROL agent. You can set parameter (metrics) thresholds there including absolute thresholds such as 95% for file system utilization. Now you may wonder if you still need to set parameter thresholds in each PATROL agent.

In theory, if you send all PATROL data to BPPM server, it seems like a good idea to set all parameter thresholds on BPPM server only. Imagine how much time you could save by not having to set parameter thresholds in each PATROL agent, and how much network bandwidth you could save by not having to send PATROL events to BPPM cells when those thresholds are violated.

In reality, the answer is yes. You still need to set parameter thresholds in each PATROL agent and let the PATROL agent (not BPPM server) generate the events for absolute threshold violations. You still need to send those PATROL events to BPPM cells. Let BPPM server generate intelligent events only, and don't set absolute thresholds in BPPM server. The reason is that not all PATROL data reach BPPM server.

First of all, PATROL agent does not buffer and resend data if it fails to send them to BPPM server on the first attempt. This can happen during a brief network outage, such as when a router is being rebooted. If the first attempt fails, the data are lost forever. In contrast, as you may already know, the connection between PATROL agent and BPPM cell is more robust because PATROL agent does buffer and resend events to BPPM cell with guaranteed delivery. BMC Software acquired BPPM server and BPPM cell separately from two different vendors, and they use different communication protocols with different levels of robustness.

Second, PATROL agent only sends numerical data to BPPM server, not text data such as text parameter values and annotated data point values. Often those text data are needed as additional information for the events when numerical parameter thresholds are violated. For example, when using PATROL LOG KM, you may need to include information from a text parameter in the event to show the matched string. The only way to include information from a text parameter in events is to let PATROL agent (not BPPM server) generate the events. In addition, some PATROL KMs (e.g. LOG KM with custom events option, older version of Control-M KM, etc.) call event_trigger() to generate events without using parameter thresholds.

Last but not least, PATROL agent sends data to BPPM server every 5 minutes even though it may collect data more frequently. For example, CPU utilization is collected by PATROL agent every minute, which means that only every 5th value of CPU utilization is sent to BPPM server. Relying solely on absolute thresholds in BPPM server could delay alerts by up to 5 minutes or even miss them altogether.
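To see how a short spike can slip through, here is a minimal Python sketch. It assumes one sample per minute on the agent and that the first sample of each 5-minute window is the one forwarded. Both the forwarding alignment and the CPU values are illustrative, not measured BPPM behavior.

```python
from typing import List


def threshold_breaches(samples: List[float], threshold: float) -> List[int]:
    """Positions (minutes) at which a sample exceeds the absolute threshold."""
    return [i for i, value in enumerate(samples) if value > threshold]


def forwarded(samples: List[float], every: int = 5) -> List[float]:
    """Keep every `every`-th sample, mimicking a 5-minute forwarding interval."""
    return samples[::every]


# Hypothetical 1-minute CPU utilization readings with a 3-minute spike.
cpu = [40, 42, 97, 98, 96, 45, 43, 41, 44, 42]
agent_view = threshold_breaches(cpu, 95)              # what PATROL agent sees
server_view = threshold_breaches(forwarded(cpu), 95)  # what BPPM server sees
```

Here the agent sees breaches at minutes 2, 3, and 4. The server receives only the minute-0 and minute-5 values and never sees the spike.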

Monday, September 9, 2013

BMC Impact Integration for PATROL (bii4P): Is it no longer needed?

I have received this question from several people: "I have heard that BMC has eliminated bii4P in BPPM 9.0. Is it true? If it is true, why do I still see bii4P in some BPPM 9.0 architecture diagrams? And what can I use instead to send PATROL events to a BPPM cell configured as high availability?"

The simple answer is yes - it is true that bii4P has been eliminated. But the bii4P elimination relates to PATROL agent only, regardless of the version of BPPM server, BPPM agent, and BPPM cell. Starting from PATROL agent version 9.0, bii4P is no longer required for a PATROL agent to send its events to a cell. The cell can be a BEM 7.x cell, BPPM 8.x cell, or BPPM 9.x cell. If your PATROL agent version is older than 9.0, you still need bii4P even if you are running BPPM cell 9.0. That is why you may still see bii4P in some BPPM 9.0 architecture diagrams.

To send PATROL events from a PATROL agent version 9.x to a BPPM/BEM cell configured as high availability, you need to have the following pconfig variables set: "/EventSetup/Configuration/EventCells", "/EventSetup/Configuration/Format", and "/EventSetup/Configuration/Key". For example:

"/EventSetup/Configuration/EventCells" = { REPLACE = "server1/1828,server2/1828" },
"/EventSetup/Configuration/Format" = { REPLACE = "BiiP3" },
"/EventSetup/Configuration/Key" = { REPLACE = "mc" }

*** where server1 is your primary cell server and server2 is your secondary cell server. If you have a standalone cell, you only need to specify server1/1828. ***
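If you deploy many agents, you may want to generate these entries instead of typing them by hand. The Python below is a minimal sketch of a hypothetical helper, `event_cell_pconfig`, that emits the three variables in the same layout as the example above. Its defaults (port 1828, format BiiP3, key mc) simply mirror that example. Check the generated text against your PATROL documentation before loading it with pconfig.

```python
from typing import List


def event_cell_pconfig(servers: List[str], port: int = 1828,
                       fmt: str = "BiiP3", key: str = "mc") -> str:
    """Build the EventCells/Format/Key pconfig entries for a standalone cell
    (one server) or an H/A cell pair (primary server first, then secondary)."""
    if not 1 <= len(servers) <= 2:
        raise ValueError("expected a standalone cell (1 server) or an H/A pair (2 servers)")
    cells = ",".join(f"{server}/{port}" for server in servers)
    entries = [
        ("/EventSetup/Configuration/EventCells", cells),
        ("/EventSetup/Configuration/Format", fmt),
        ("/EventSetup/Configuration/Key", key),
    ]
    # One variable per line, comma-separated, as in the example above.
    return ",\n".join(f'"{name}" = {{ REPLACE = "{value}" }}' for name, value in entries)
```

`event_cell_pconfig(["server1", "server2"])` reproduces the H/A example above, and `event_cell_pconfig(["server1"])` produces the standalone variant.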

However, bii4P is still supported in PATROL agent 9.x. If you would like to send PATROL events to multiple cells (e.g. a production cell and a testing cell for troubleshooting purposes), you still need bii4P. In addition, bii4P and the PATROLAgent-to-cell direct connection can co-exist for the same PATROL agent.

In the new PATROLAgent-to-cell direct connection, PATROL agent initiates the connection with a cell and pushes events to the cell. PATROL agent does not have the capability to push events to two different cells at the same time.

bii4P is a standalone adapter. There are two versions of bii4P: bii4P3 and bii4P7. bii4P3 connects to PATROL agents directly, while bii4P7 connects to PATROL agents through PATROL console server. bii4P3 is more commonly used nowadays because its connection with PATROL agents is more stable. bii4P initiates the connection with PATROL agents at one end to receive events, and pushes the received events to a cell at the other end.

To send PATROL events to two cells, you can configure two instances of bii4P, or you can configure PATROLAgent-to-cell direct connection for production cell and configure bii4P for testing cell.

Monday, September 2, 2013

Lessons Learned from Migrating BEM 7.4 to BPPM 9.0 - Part 10: Summary

This is the last post for "Lessons Learned from Migrating BEM 7.4 to BPPM 9.0" series. As a summary, here is the architecture diagram of BPPM 9.0 high availability implementation.

This architecture varies slightly from BMC's standard recommendation as we keep BPPM cells and BPPM Agents totally separated on different servers. In a real enterprise IT environment where data flow is steady but event flow is unpredictable, our architecture offers better resource utilization, more flexibility, and more robust high availability.

<This architecture diagram has been deleted>

Monday, August 26, 2013

Lessons Learned from Migrating BEM 7.4 to BPPM 9.0 - Part 9: Cell extension and architecture

In BEM 7.4, we made extensive customizations in MRL rules to standardize event processing regardless of event source. Because our events come from 7 different event sources (BMC PATROL, BMC Portal, and 5 other vendors' monitoring tools), we didn't want to write 7 different sets of rules to process events. We wanted all events to share the same processing rules as much as possible.

In our standardized event processing rules, each event goes through the following stages: mapping, conversion, filtering, host/device look-up, action look-up, blackout look-up, aggregation/correlation, update, email notification, ticketing, action, and forwarding. Only at mapping and conversion stages, events from different event sources have their own processing rules. All events share the same processing rules starting from filtering stage. This has allowed us to quickly integrate events from any event source into BEM/BPPM cells in a matter of days or even hours.
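The staged design above can be illustrated outside MRL. Below is a minimal Python sketch, not our actual rules. It assumes events are dicts tagged with a 'source' field and that any stage can drop an event by returning None. Only a few stages are shown, and the stage bodies, slot names, and host-owner table are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Event = Dict[str, str]
Stage = Callable[[Event], Optional[Event]]  # a stage returns None to drop the event


def run_pipeline(event: Event, source_stages: Dict[str, List[Stage]],
                 shared_stages: List[Stage]) -> Optional[Event]:
    """Run the source-specific stages (mapping, conversion), then the shared ones."""
    for stage in source_stages.get(event["source"], []) + shared_stages:
        result = stage(event)
        if result is None:  # e.g. dropped by the filtering stage
            return None
        event = result
    return event


# Hypothetical stage implementations, for illustration only.
def map_patrol(e: Event) -> Event:
    return {**e, "host": e["origin"]}        # mapping: source slot -> common slot

def map_vendor_x(e: Event) -> Event:
    return {**e, "host": e["node"]}

def convert_severity(e: Event) -> Event:
    return {**e, "severity": e["severity"].upper()}  # conversion to common values

def filter_info(e: Event) -> Optional[Event]:
    return None if e["severity"] == "INFO" else e    # first shared stage

def lookup_host(e: Event) -> Event:
    return {**e, "owner": {"server1": "unix-team"}.get(e["host"], "unknown")}


SOURCE_STAGES = {"patrol": [map_patrol, convert_severity],
                 "vendor_x": [map_vendor_x, convert_severity]}
SHARED_STAGES = [filter_info, lookup_host]
```

Integrating a new source then means writing only its mapping and conversion stages. That is why onboarding can take days or hours rather than a full rule set.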

The advantage of using cell policies is that you don't have to know MRL programming. But policies slow down cell processing significantly. Most importantly, there is no policy equivalent of the execute rule. Since we had already made extensive customizations in MRL rules, there was no advantage for us in using policies. We disabled all out-of-box policies. We also enforced a strict naming convention to make our rules easy to support and upgrade. We have about 30 custom rule files supporting over 20 advanced features.

Our cells were architected in three levels. The first level is for look-up. Each event source has its own first level cell so that if one event source is having an event storm it won't affect the events from other event sources. The second level is for update and notification. All event sources share the same second level cell so that events can be correlated easily. The third level is for service impact.
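Why does a separate first-level cell per source contain an event storm? A toy Python model helps. It assumes each first-level cell forwards a fixed number of events per processing cycle, independently of the others. The processing rate and the 'service' flag used to pick events for the third level are hypothetical simplifications.

```python
from collections import deque
from typing import Deque, Dict, List


class CellTiers:
    """Toy model of the three cell levels: per-source look-up cells,
    one shared update/notification cell, and a service impact cell."""

    def __init__(self, sources: List[str]):
        # Level 1: one queue per event source, so a storm only backs up its own cell.
        self.level1: Dict[str, Deque[dict]] = {s: deque() for s in sources}
        self.level2: List[dict] = []  # shared update/notification cell
        self.level3: List[dict] = []  # service impact cell

    def receive(self, source: str, event: dict) -> None:
        self.level1[source].append(event)

    def tick(self, rate_per_cell: int) -> None:
        """Each level-1 cell independently forwards up to rate_per_cell events."""
        for queue in self.level1.values():
            for _ in range(min(rate_per_cell, len(queue))):
                event = queue.popleft()
                self.level2.append(event)  # all sources meet here for correlation
                if event.get("service"):   # hypothetical flag for service-impacting events
                    self.level3.append(event)


tiers = CellTiers(["patrol", "vendor_x"])
for i in range(10_000):                  # event storm from one source
    tiers.receive("vendor_x", {"id": f"storm-{i}"})
for i in range(3):                       # normal traffic from another source
    tiers.receive("patrol", {"id": f"patrol-{i}", "service": "EHR"})
tiers.tick(rate_per_cell=100)
```

After one cycle, all 3 PATROL events have reached the shared second level, while the storm's backlog (9,900 events) stays queued in its own first-level cell.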

During our migration from BEM 7.4 to BPPM 9.0, we migrated our first-level and second-level cells to BPPM 9.0 remote cells located on their own servers, as I discussed in Part 3. All our custom MRL rules were migrated into the new cells with little change since our customization was kept in separate files. The embedded cell on BPPM server will replace our old third-level service impact cell. Since we did little work on service impact in BEM 7.4, we plan to do a new service impact implementation once our Atrium upgrade is completed.

The only major change we had to make is the custom GUI display. In BEM 7.4, we made several display templates for administrators, developers, and service desk operators in its Java GUI (BMC Impact Explorer). Since there is no direct migration path from Java GUI to web GUI, we had to re-create all templates in BPPM 9.0 web GUI.

Monday, August 19, 2013

Lessons Learned from Migrating BEM 7.4 to BPPM 9.0 - Part 8: Dual GUI

Back in Part 2, I mentioned that one of the major limitations in migrating BEM 7.4 to BPPM 9.0 is the lack of GUI access for up to 10 minutes during BPPM server failover. Because we are a hospital environment, we have enterprise service desk operators monitoring the BEM/BPPM GUI 24x7 to escalate trouble ticket acknowledgement and processing.

In BPPM 9.0, a web GUI is used as the operations console. Because the web server is located on BPPM server and it takes up to 10 minutes for the secondary BPPM server to resume operation during BPPM server failover, our service desk would experience a total enterprise blackout for up to 10 minutes. This limitation does not meet our business requirements in a hospital environment, and it had been holding us back from migrating to BPPM sooner. To overcome it, we had to think outside the box again.

In BEM 7.4, a Java GUI (BMC Impact Explorer) is used as operations console. All cells and login servers are set up in their native application-level failover with no downtime. During the failover, our service desk operators would see the yellow highlight for several seconds before all operations are resumed. We decided to see if we can mix BPPM 9.0 cells with BEM 7.4 login servers and BMC Impact Explorer.

We made no change to BPPM 9.0 configuration on BPPM server, BPPM agents, and BPPM cells. We kept a pair of BEM 7.4 login servers (also called admin servers) on two separate Windows servers. We simply registered all BPPM 9.0 cells with these two BEM 7.4 login servers. Now our service desk operators can continue using the Java GUI (BMC Impact Explorer) to access BPPM 9.0 cells.

During BPPM server failover, the only cell that our service desk operators cannot see for up to 10 minutes is BPPM main cell - which displays intelligent events generated by BPPM server and service impact only. All alerts raised by monitoring tools, all email notifications, and all automated Remedy ticket generation are displayed and processed by remote cells with application-level failover. Our service desk operators can continue seeing all of them during BPPM server failover. Absolutely no downtime and no enterprise blackout! We were so thrilled to see how great the hybrid configuration worked.

For ESM administrators and operations support, we can pick and choose between BPPM 9.0 web GUI and BEM 7.4 Java GUI. BPPM 9.0 web GUI allows us to associate data with events, while BEM 7.4 Java GUI gives us fast access to events and dynamic tables. By keeping both, not only did we avoid a total enterprise blackout, we were also able to convince everyone to finally migrate BEM 7.4 to BPPM 9.0.

Monday, August 12, 2013

Lessons Learned from Migrating BEM 7.4 to BPPM 9.0 - Part 7: PATROL

We run PATROL on both AIX and Windows. In addition to monitoring operating systems, log files, and VMware, we have developed custom PATROL knowledge modules to monitor CACHE database, media manager, Veritas Cluster Server, and clinic applications. All our PATROL agents were upgraded or deployed in version 9.0 so we could use the automatic workflow to push PATROL data to integration service and PATROL events to BPPM cell.

As I mentioned in my previous posts, all our BPPM components are configured for high availability to meet the highest business requirements in a hospital environment. For PATROL data, high availability of all integration services and BPPM agents is configured through Microsoft Windows Cluster. We put the clustered server name in pconfig variable "/AgentSetup/integration/integrationServices". For PATROL events, high availability of all BPPM cells is configured through their native application cluster. We put both primary and secondary server names in pconfig variable "/EventSetup/Configuration/EventCells".

After we replaced bii4P3 (the PATROL event adapter) with a direct PATROL agent-to-cell connection configured through pconfig variables, bii4P3 was no longer needed for PATROL agent 9.0 to send PATROL events to BPPM cell. However, we kept bii4P3 running on all our test systems after the migration. Because pconfig variable "/EventSetup/Configuration/EventCells" can only send PATROL events to one cell, keeping bii4P3 lets us receive PATROL events on both the production BPPM cell and the test BPPM cell at the same time for live troubleshooting when needed.

We had to change MRL rules in BPPM cell to detect PATROL agent down and PATROL agent connection loss events for the direct PATROL agent-to-cell connection, because these events are very different from the ones generated through bii4P3. We also had to develop a few rules to capture PATROL agent up and PATROL agent connection up events and match them to the corresponding PATROL agent down and connection loss events. In addition, we developed similar rules for the PATROL agent connection with integration service. These infrastructure connection events, along with all other events reported by event sources, are fully integrated with email notification and the Remedy ticketing system at the back end.
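The down/up matching is done by MRL rules in the cell. The Python below only sketches the matching logic. It assumes each event has been reduced to a dict with hypothetical 'agent', 'kind' ("agent" or "connection"), and 'state' ("down" or "up") fields. Real event classes and slots differ.

```python
from typing import Dict, List, Tuple

Event = Dict[str, str]


def correlate_agent_status(events: List[Event]) -> Tuple[List[Tuple[Event, Event]], List[Event]]:
    """Pair each down event with the next matching up event.

    Returns (closed pairs, down events still open)."""
    open_down: Dict[Tuple[str, str], Event] = {}
    closed: List[Tuple[Event, Event]] = []
    for ev in events:
        key = (ev["agent"], ev["kind"])
        if ev["state"] == "down":
            open_down.setdefault(key, ev)  # keep the first down; ignore repeats
        elif ev["state"] == "up" and key in open_down:
            closed.append((open_down.pop(key), ev))
        # an up event with no open down event is ignored
    return closed, list(open_down.values())


events = [
    {"agent": "server1", "kind": "agent", "state": "down"},
    {"agent": "server2", "kind": "connection", "state": "down"},
    {"agent": "server1", "kind": "agent", "state": "up"},
    {"agent": "server3", "kind": "agent", "state": "up"},
]
closed, still_open = correlate_agent_status(events)
```

Here server1's down event is closed by its up event. server2's connection loss stays open, and would still drive notification and ticketing.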

We had PATROL Central Console 7.5 (both Windows edition and web edition) as well as PATROL Classic Console 3.5 running before the migration. They still worked well with PATROL agent 9.0 after the migration, so we didn't find any need to upgrade them.

Monday, August 5, 2013

Lessons Learned from Migrating BEM 7.4 to BPPM 9.0 - Part 6: BMC event adapter mcxa

BMC Event Adapter (mcxa) is an adapter BMC provides to integrate SNMP traps into BPPM/BEM cells. It was developed in Perl and can be installed anywhere. Although most vendors nowadays can send SNMP traps when alerts are raised, we prefer to use OS scripts to integrate events from non-BMC monitoring tools into BPPM cells.

While OS scripts can be logged, buffered, and retried with seamless failover, SNMP traps usually cannot, meaning that the slightest network instability could result in trap loss. Because SNMP trap based event integration is less reliable and more difficult to troubleshoot, we only use it when the monitoring tool does not provide a way to execute OS scripts when alerts are raised. In addition, SNMP trap based event integration requires an adapter, while OS script based event integration connects directly to a BPPM cell.
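The post does not show the scripts themselves. The Python below is a minimal sketch of the log/buffer/retry/failover pattern. The `send` callable is a hypothetical stand-in for whatever actually delivers an event to a cell, and it is assumed to raise ConnectionError when the cell is unreachable. The spool file name and the retry count are also assumptions.

```python
import json
import logging
import os
from typing import Callable, Dict, List, Optional

log = logging.getLogger("event_forwarder")
Sender = Callable[[str, Dict[str, str]], None]  # hypothetical; raises ConnectionError on failure


def send_with_failover(event: Dict[str, str], cells: List[str], send: Sender,
                       spool_path: str, attempts: int = 3) -> Optional[str]:
    """Try the primary cell, then the secondary; if every attempt fails,
    append the event to a spool file so it can be resent later."""
    for attempt in range(1, attempts + 1):
        for cell in cells:  # primary first, then secondary
            try:
                send(cell, event)
                log.info("sent %s to %s (attempt %d)", event.get("msg"), cell, attempt)
                return cell
            except ConnectionError as exc:
                log.warning("cell %s unreachable: %s", cell, exc)
    with open(spool_path, "a", encoding="utf-8") as spool:
        spool.write(json.dumps(event) + "\n")
    log.error("all cells unreachable; buffered event in %s", spool_path)
    return None


def flush_spool(cells: List[str], send: Sender, spool_path: str) -> int:
    """Resend buffered events; events that still fail are re-spooled."""
    if not os.path.exists(spool_path):
        return 0
    with open(spool_path, encoding="utf-8") as spool:
        pending = [json.loads(line) for line in spool if line.strip()]
    os.remove(spool_path)
    return sum(1 for event in pending
               if send_with_failover(event, cells, send, spool_path, attempts=1))
```

A trap, in contrast, is fire-and-forget UDP with none of these hooks, which is why we keep SNMP for tools that cannot run a script.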

Out of 5 non-BMC monitoring tools we have, only one is integrated into a BPPM cell using BMC Event Adapter (mcxa) because it cannot execute an OS script when an alert is raised.

Very little changed for BMC Event Adapter (mcxa) from BEM 7.4 to BPPM 9.0. We first converted its MIB file to a map file. Then we configured BMC Event Adapter (mcxa). We had to change the default parameter settings for PollInterval, ReadsPerEngine, and SnmpRcvbuf to maximize the capacity of mcxa in order to accommodate the large volume of incoming SNMP traps. We also had to double the default value of the SnmpTrapLength parameter in order to accommodate the large size of the incoming SNMP traps.

To increase the reliability, we installed two instances of BMC Event Adapter (mcxa) with one instance on each cell server. From the non-BMC monitoring tool, we configured the SNMP traps to be sent to those two mcxa instances simultaneously. This dual-configuration helps to minimize the SNMP trap loss in case of network connection failure. It also helps to address the lack of out-of-box high-availability feature in BMC Event Adapter (mcxa).

For the cell knowledge base, we made a minor change in the auto-generated mcsnmptrapdmib.baroc file so that we could write one rule instead of 50+ rules for all 50+ OIDs. We also added a de-duplication rule to remove the duplicated SNMP traps from the 2nd mcxa instance.
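Our de-duplication is an MRL rule in the cell. The Python below sketches the same idea: a trap that arrives twice (once via each mcxa instance) within a short window is dropped the second time. The 60-second window, the choice of key fields, and the example OID are all assumptions. One trade-off: a genuinely repeated identical trap inside the window would also be dropped.

```python
import time
from typing import Dict, Optional, Tuple


class TrapDeduplicator:
    """Drop a trap if an identical one was seen within the time window."""

    def __init__(self, window_seconds: float = 60.0):
        self.window = window_seconds
        self.seen: Dict[Tuple, float] = {}  # trap key -> time first seen

    def is_duplicate(self, trap: dict, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Forget traps older than the window.
        self.seen = {k: t for k, t in self.seen.items() if now - t < self.window}
        key = (trap["oid"], trap["agent_addr"], tuple(sorted(trap["varbinds"].items())))
        if key in self.seen:
            return True
        self.seen[key] = now
        return False


dedup = TrapDeduplicator(window_seconds=60)
# Hypothetical trap; the OID is made up for illustration.
trap = {"oid": "1.3.6.1.4.1.99999.0.1", "agent_addr": "10.0.0.5", "varbinds": {"msg": "disk full"}}
```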

For the rest of cell knowledge base, we followed our standard procedures to map, convert, filter, correlate, update, define actions, execute actions, send email, and create tickets. In a later post, I will go into more details of our standard procedures in our cell knowledge base that universally apply to events from all event sources.

Monday, July 29, 2013

Lessons Learned from Migrating BEM 7.4 to BPPM 9.0 - Part 5: Remedy integration IBRSD

In a previous post (Part 2), I mentioned that all BEM-level emails, tickets, and actions need to take place on remote cells in order to meet the no-downtime requirement for critical events in a hospital environment. This means that IBRSD for BEM-level events should not be located on BPPM server, because BPPM server can be down for up to 10 minutes during failover. We also wanted to offload as many components as possible from BPPM server to improve performance. In our BEM 7.4 implementation, we had 2 instances of IBRSD installed on 2 of our cell servers to achieve active/active high availability and load balancing. They handled all our ticket creation and updates well, so we decided to keep the same architecture in BPPM 9.0.

However, in BPPM 9.0 IBRSD is available only as part of the BPPM server installation package, while we needed it as part of the BPPM agent installation package. We contacted BMC support but were told that they were not able to help at all. They did enter an enhancement request, so hopefully we will see it packaged with BPPM agent in future releases. Meanwhile, we had to come up with a different way to install IBRSD on our cell servers. We copied the entire IBRSD installation directory from BPPM server, added a few environment variables, and configured a new IBRSD instance in the copied directory. Fortunately, the instances on both cell servers worked well.

By now I have described how we architected BPPM server, BPPM agents, BPPM cells, and IBRSD in our environment for high availability, scalability, and performance. We used Microsoft Windows Clusters for BPPM server and BPPM agents. We used native application clusters for BPPM cells. We installed BPPM agent and integration service on the integration service node. We installed BPPM cells, BMC Event Adapter, BMC Event Log Adapter, and IBRSD on the cell server. By keeping BPPM cells completely separated from BPPM agents, not only did we eliminate downtime for BPPM cell failover, we also minimized downtime for BPPM agent failover. In addition, it offers better BPPM cell data protection through duplicated event repositories. As an added bonus, it costs less since we need fewer Microsoft Windows Cluster licenses.

Our implementation is somewhat different from what BMC recommends. In various documents and best practice webinars, BMC recommends co-locating BPPM agents and BPPM cells on the same server and using disk-level OS clusters to achieve high availability. Had we gone with that recommendation, we would have experienced not only longer downtime during failover, but also an increased risk that another cell may fail on the secondary node.

Here are the lessons learned so far: To realize the highest ROI on BMC Software investment, business requirements should drive technical design. It is important to evaluate all options through due diligence. Performing due diligence requires support from the organization's management and a systematic approach to testing and verifying the proposed model. Sometimes we need to think outside the box, as shown in the IBRSD example.

Monday, July 22, 2013

Lessons Learned from Migrating BEM 7.4 to BPPM 9.0 - Part 4: BPPM agent and integration service

Based on 250K attributes per BPPM agent sizing guideline from BMC, we installed 2 BPPM agents on 2 Windows 2008 servers - one for Portal data and the other for PATROL data. As we continue adding more PATROL agents, we will add another BPPM agent on the 3rd server once the number of PATROL agents and attributes exceeds the current capacity.
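The sizing rule above is simple ceiling arithmetic. The Python below is a minimal sketch that takes the 250K-attributes figure from BMC's guideline as quoted above. The per-PATROL-agent attribute counts in the example are hypothetical, since the real number depends on which KMs and instances each agent runs. Portal and PATROL attributes are counted separately because each has its own BPPM agent.

```python
import math


def bppm_agents_needed(total_attributes: int, per_agent: int = 250_000) -> int:
    """Smallest number of BPPM agents whose combined capacity covers the attributes."""
    if total_attributes < 0:
        raise ValueError("attribute count cannot be negative")
    return max(1, math.ceil(total_attributes / per_agent))


# Hypothetical: about 500 attributes per PATROL agent.
today = bppm_agents_needed(400 * 500)   # 200,000 attributes -> 1 BPPM agent
later = bppm_agents_needed(600 * 500)   # 300,000 attributes -> 2 BPPM agents
```

In this example, growing from 400 to 600 PATROL agents pushes the count past 250K, which is the point at which a BPPM agent on a 3rd server would be needed.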

For high availability, we decided on a disk-level OS cluster for the BPPM Portal agent because we could not find another option. But for BPPM PATROL integration service and its BPPM agent, there is another option: active/active failover. We were excited when we first learned that integration service supports active/active failover, because it looked similar to the architecture of BMC Portal with active/active RSM failover. However, upon close examination we realized the major difference. BPPM integration service and its BPPM agent have no concept of a cluster, while BMC Portal App server treats the active/active RSM pair as one cluster. Portal data from the same metrics and same instance will be stored in the same database table regardless of which RSM was used as the "middleman". However, when PATROL data are sent to the 2nd BPPM integration service because the 1st integration service is unreachable, the data will not be stored in the same table in BPPM database as the data sent through the 1st integration service, and are thus not displayed in the same graph.

We used the same scoring system as described in the last post (Part 3) to compare disk-level OS cluster and active/active failover for BPPM Integration Service. Our comparison result showed that disk-level OS cluster scored 27 points (partial yes to #1, and yes to #2 and #3) while active/active failover scored 22 points (yes to #1, #3 and #6). Therefore, we decided to use Microsoft Windows Cluster for all integration service nodes where BPPM agents are installed.

Because all BPPM cells are installed on separate servers, as I mentioned in my last post (Part 3), only BPPM agent and integration service are running on integration service nodes. We disabled all event-related components such as cells, event adapter, and event log adapter on integration service nodes because they are already running on separate servers (we refer to them as cell servers). In Microsoft Windows Cluster, all services within the cluster must be included in the failover group. When one service fails, the entire group must be moved to the secondary node. The more services are included in the failover group, the longer it takes to move the entire group and the higher the risk that another service may fail on the secondary node. Disk-level OS clusters such as Microsoft Windows Cluster always involve some downtime during failover. However, we managed to limit the downtime to under 5 minutes by minimizing the failover group to include only BPPM agent and Integration Service.

In the next couple of posts, I will go through the configuration details on those event-related components.

Monday, July 15, 2013

Lessons Learned from Migrating BEM 7.4 to BPPM 9.0 - Part 3: BPPM cell

It is a common misconception that Integration Service processes both data and events. Actually Integration Service processes PATROL data only and BPPM cell processes events only. In most of BMC's architecture diagrams, Integration Service and BPPM cell are co-located on the same server (called Integration Service node). In reality, Integration Service and BPPM cell are not directly related and are not required to be installed on the same server. In this post, I will focus on BPPM Cell. BPPM Agent and Integration Service will be discussed in the next post.

In our BEM 7.4 environment, we have 9 pairs of cells running in high availability as application clusters, with 7 cells on Windows 2003 servers and 2 cells on Linux RHEL 5.5 64-bit servers. We were happy with the configuration as we never experienced downtime, even during the BEM upgrade from 7.2 to 7.4. In BPPM 9.0, since BMC suggested a disk-level OS cluster for BPPM cells, we decided to do a side-by-side comparison between application cluster and OS cluster. We used a 10-point scoring system for the following 4 criteria: 1) Can the (cell) pair fail over with no downtime; 2) Is the (cell) pair a cluster (treated as one by their consumers); 3) Can the sender automatically switch to the 2nd destination when the 1st destination is unreachable; 4) Can the sender buffer the content and resend if the destination is unreachable. Each yes is 10 points and each no is 0 points. A partial yes gets a score between 1 and 9. We also added 3 bonus points for 5) automatically backing up data storage; and 2 bonus points for 6) lower hardware and OS cost.

Our comparison result showed that cell application cluster option scored 45 points (yes to all 6) while OS cluster option scored 37 points (partial yes to #1 and yes to #2, #3, and #4). So we kept the same high availability configuration as in BEM 7.4. We installed our Windows cells on Windows 2008 servers with 'cell only' option, not sharing the servers with BPPM Agent and Integration Service. We decided to delay Linux cell migration to the next phase to minimize the involvement of another organization. Our test has shown that BEM 7.4 cells can integrate well with BPPM 9.0 cells since very little has been changed in BEM cell features and architecture.
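The scoring arithmetic can be reproduced in a few lines of Python. The weights come straight from the criteria listed above. The post only says a partial yes scores between 1 and 9, so crediting the partial yes on criterion 1 with 7 points is an inference: it is the value that reproduces the published totals here and in Part 4 (27 vs. 22 points).

```python
# Criteria 1-4 are worth 10 points each; bonus criterion 5 is worth 3, bonus 6 is worth 2.
WEIGHTS = {1: 10, 2: 10, 3: 10, 4: 10, 5: 3, 6: 2}
PARTIAL = 0.7  # inferred: 7 of 10 points reproduces the published totals


def score(answers: dict) -> int:
    """answers maps criterion number -> credit: 1.0 for yes, 0.0 for no,
    or a fraction in between for a partial yes."""
    for criterion, credit in answers.items():
        if criterion not in WEIGHTS or not 0.0 <= credit <= 1.0:
            raise ValueError(f"bad answer for criterion {criterion}: {credit}")
    return round(sum(WEIGHTS[c] * credit for c, credit in answers.items()))


app_cluster = score({c: 1.0 for c in range(1, 7)})          # yes to all 6
os_cluster = score({1: PARTIAL, 2: 1.0, 3: 1.0, 4: 1.0})     # partial #1, yes #2-#4
```

The same function gives the Part 4 integration service comparison: `score({1: PARTIAL, 2: 1.0, 3: 1.0})` for the disk-level OS cluster and `score({1: 1.0, 3: 1.0, 6: 1.0})` for active/active failover.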

To take advantage of BPPM Server's analytic features, we added another pair of external BPPM cells for BPPM internal events since we wanted all events to be processed in external cells first. By default, all BPPM internal events are sent to the embedded cell on BPPM server. After BPPM 9.0.20, we were able to make a configuration change in pronet.conf on BPPM server so that all internal events are sent to an external cell.

Monday, July 8, 2013

Lessons Learned from Migrating BEM 7.4 to BPPM 9.0 - Part 2: BPPM server

We installed BPPM Server 9.0 on a physical Windows 2008 server with 32GB memory. From what we have learned in BPPM 8.1 and 8.5, we knew that BPPM server is resource intensive. So we doubled the amount of memory from what BMC recommended, and we tried to offload as many components as possible to other servers. We configured BPPM database on an Oracle RAC instance running on AIX servers.

For high availability, we decided to use Microsoft Windows Cluster as recommended by BMC. We were fully aware that it could take up to 10 minutes for the secondary server to resume operation after the primary server failed - meaning no data collection, no service impact updating, and no GUI access for up to 10 minutes. But we could not find a better option for BPPM server high availability as BMC still does not support application-level failover for BPPM Server.

Data in BPPM server are sampled every 5 minutes, even though some data could be collected more frequently at the data source (e.g. PATROL agent). Unlike events, when the destination is unreachable, data are simply thrown away, not buffered. We would miss 2-3 data points during BPPM server failover. We realized that this is something we can compromise on because trended data are not as critical as availability events in a hospital environment.
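Why is it "2-3" points? The answer depends on how the outage lines up with the 5-minute sample times. The Python below is a minimal sketch that assumes samples fall at fixed 5-minute ticks starting at minute 0 and treats the outage as the half-open interval [start, start + length). Actual BPPM collection timing may differ.

```python
import math


def missed_samples(outage_start: float, outage_minutes: float = 10, interval: float = 5) -> int:
    """Count the 5-minute sample ticks falling inside [start, start + outage)."""
    t = math.ceil(outage_start / interval) * interval  # first tick at or after the start
    count = 0
    while t < outage_start + outage_minutes:
        count += 1
        t += interval
    return count
```

With an exactly 10-minute outage, 2 samples are lost wherever it starts. If failover runs slightly past 10 minutes, a third tick can fall inside the window (for example, `missed_samples(0, 11)` counts the ticks at minutes 0, 5, and 10).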

Similarly we can compromise no service impact updating for 10 minutes as long as the lower-level critical events can still generate emails and incident tickets. This means that we will only use the embedded SIM cell for emails and incident tickets related to service models. All raw events will be sent to remote cells first for BEM level emails, tickets, and actions. They will be forwarded to this embedded SIM cell for service impact analysis and probable cause analysis. In a later post, I will go through the details on how we configured high availability for remote cells.

Unfortunately the lack of GUI access for 10 minutes is not something we can compromise. 10 minutes could mean the difference between life and death in a hospital environment. A 10-minute enterprise visibility blackout is simply not an option. In a later post, I will talk about how we have overcome this limitation.

Since our Atrium version was older than 7.6.04, we could not install Atrium CMDB Extension before installing BPPM Server 9.0. But we have learned that we can enable this option in BPPM Server later after Atrium upgrade without re-installing BPPM Server.

Monday, July 1, 2013

Lessons Learned from Migrating BEM 7.4 to BPPM 9.0 - Part 1: Background

One of my recent clients is the largest municipal healthcare organization in the country consisting of hospitals, nursing facilities, treatment centers, and community clinics. I have helped them implement BMC Event Manager (BEM), BMC PATROL, BMC Portal, and integrate monitoring tools from other vendors with BEM.

We run 9 pairs of BEM cells, 2 instances of IBRSD, 2 instances of bii4P3, 2 instances of IIWS, and 2 instances of BMC Event Adapters to process 12,000 events from 7 different monitoring tools and generate 800 automated Remedy tickets per day. I refer to our architecture as a 'cell cloud' because this robust and flexible event processing service is hosted by servers located in different data centers, on different operating systems, and for a while even on different BEM releases. Every component in the cloud is configured for seamless high availability at the application level, and all events sent to the cloud are buffered, with no downtime and no transaction loss, to meet the highest business requirements of hospitals. Our event processing is based on 'cell extension' technology, in which I extensively customized the out-of-box cell knowledge base. By eliminating policies and standardizing event processing with dynamic data tables, our BEM implementation is powerful, flexible, and easy to maintain.

Being a large BMC customer in healthcare industry, we have been encouraged by BMC to migrate to BPPM. And we were constantly invited by BMC to attend BPPM briefings, roadmaps, demos, webinars, and Q&A sessions. Prior to BPPM 9.0, we participated in extensive evaluations on both BPPM 8.1 and BPPM 8.5. We have given BMC extensive feedback on the limitations in BPPM that had been holding us back from migrating to BPPM.

When we finally decided to migrate our BEM 7.4 to BPPM 9.0, our primary objective was to preserve all the scalability, performance, flexibility, and high availability of BEM 7.4. We are so proud that our 'cell cloud' technology survived emergency data center failover during Hurricane Sandy with no downtime. We didn't want to compromise any of these capabilities when upgrading to BPPM 9.0.

In the next few posts, I will share my experience and the lessons learned from migrating BEM 7.4 to BPPM 9.0. Your comments are greatly appreciated.
