Monday, September 29, 2014

PatrolCli - Part 6: Re-sync parameter status with PATROL agents upon exiting blackout in BPPM cell

An event blackout rule or event blackout policy in a BPPM cell is something we all rely on to suppress alerts during regularly scheduled maintenance windows.  Upon exiting the blackout period, if a PATROL parameter alert (e.g. a process-down alert) is still present, what should you do?

If you choose to ignore it and the process is still down, no one will be notified.  A PATROL agent generates an alarm event only once, at the moment the process goes down.  If the process went down during the blackout period and the BPPM cell sent no notification, the PATROL agent will not generate another alarm event for as long as the process stays down after the blackout period ends.

If you choose to send a notification for every suppressed alert in the BPPM cell upon exiting blackout, you may send out lots of false alarms.  During the maintenance window, many PATROL agents may be restarted as a result of a server reboot or a PATROL configuration change.  A process that was previously down may be brought back up by the PATROL agent or server restart.  However, a newly started PATROL agent will not generate an OK event, since it sees no state change on the PATROL parameter.

Either way, we have a problem.  The best solution is for the BPPM cell to re-check the PATROL parameter status for each outstanding alert upon exiting blackout.  Among the PATROL users I have talked to, this is one of the most-wanted features for event blackout.  Although this feature doesn't come out of the box, you can write your own code using PatrolCli.

For example, you can use the following PatrolCli command to check the status of the 'mcell' process:

PCli% execpsl get("/NT_PROCESS/mcell/PROCStatus/status");
OK
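The re-check described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the cell extension mentioned in this post: the names SuppressedAlert, status_command, decide and resync are made up for the example, and run_pcli stands for whatever mechanism you use to send a command to PatrolCli and read back its reply. The sketch also assumes that WARN should be treated like ALARM and that any other reply needs a manual look.

```python
from typing import Callable, Dict, Iterable, NamedTuple


class SuppressedAlert(NamedTuple):
    """One alert that was suppressed during blackout (hypothetical structure)."""
    app_class: str   # PATROL application class, e.g. "NT_PROCESS"
    instance: str    # instance name, e.g. "mcell"
    parameter: str   # parameter name, e.g. "PROCStatus"


def status_command(alert: SuppressedAlert) -> str:
    """Build the PatrolCli command that reads the parameter's current status."""
    path = f"/{alert.app_class}/{alert.instance}/{alert.parameter}/status"
    return f'execpsl get("{path}");'


def decide(status: str) -> str:
    """Map the status text returned by the agent to a follow-up action."""
    status = status.strip().upper()
    if status == "OK":
        return "close"    # problem cleared during blackout: close the old event
    if status in ("ALARM", "WARN"):
        return "notify"   # still alerting: trigger notification now
    return "review"       # empty or unexpected reply: leave for a human (assumption)


def resync(alerts: Iterable[SuppressedAlert],
           run_pcli: Callable[[str], str]) -> Dict[SuppressedAlert, str]:
    """Re-check every suppressed alert; run_pcli is a caller-supplied hook."""
    return {alert: decide(run_pcli(status_command(alert))) for alert in alerts}
```

For the 'mcell' example, status_command produces exactly the execpsl line shown above, and a reply of OK maps to "close". How "close" and "notify" are carried out depends on how your cell KB is written.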

It does require some advanced MRL programming skill to tie everything together.  If you need more help, please feel free to contact us for consulting services.  We have developed a proprietary extension for the BPPM cell that addresses many out-of-the-box limitations, including event blackout.

6 comments:

  1. Hi Willa,

    If the above command returns ALARM, does it mean the parameter still remains in ALARM state even after the blackout? If so, we will have to write PSL scripts to trigger an alert, right?

    Thanks,
    Jeevan Anne

  2. Jeevan,

    Thank you for your message. The specific action to take if the parameter is still in ALARM state after blackout depends on how your BPPM cell KB was programmed. If your cell KB requires a new event to trigger notification and ticketing actions, you need to generate a new event using MRL or PSL (I haven't tried PSL myself). If your cell KB can trigger notification and ticketing actions based on a custom action-flag slot value without receiving a new event, you can simply reset that action-flag slot. Since every cell KB is custom developed, you have several options depending on your customization.

    Thanks!
    Willa

  3. Hi Willa,

    Thanks for the prompt response. I totally understand the customization part.
    For the PCli% execpsl get("/NT_PROCESS/mcell/PROCStatus/status"); command, if it returns ALARM, does it mean the parameter still remains in the same (alerting) state even after the blackout window?

    Thanks,
    Jeevan Anne

  4. Hi Jeevan,

    Yes, you are right. It would mean that the parameter is still in ALARM state after the blackout ended.

    Thanks!
    Willa

  5. Hi Willa,

    Thank you for confirming, and also for sharing some insights related to PATROL, BPPM, and BEM. Great source of information.

    Thanks,
    Jeevan Anne

  6. Jeevan,

    Happy to know that my blog is helpful to you. Thank you so much for your encouragement and support!

    Willa
