Monday, September 1, 2014

PatrolCli - Part 2: Monitor PATROL agent health outside of PATROL/BPPM framework

Sometimes you need to monitor PATROL agent health outside of the PATROL/BPPM framework, either manually or automatically with a different tool.  For example, during an upgrade of the BPPM server or the PATROL console, you may not be able to reach PATROL agents through either console.

Using PatrolCli, you can check the health of a PATROL agent running on a remote server without the BPPM console or PATROL console.  From the output you receive, you can tell whether that agent is experiencing a problem.

If the PATROL agent is down, you will receive an error message similar to the following:

Myserver> PatrolCli
PCli% open RemoteServer 3181
Username: patrol
Password:
Can't connect to RemoteServer (TCP/3181) as patrol : connecting to agent RemoteServer ...

If the PATROL agent is running but has stopped collecting data, you can use PatrolCli to check the latest timestamp of a common parameter, e.g., CPUprcrProcessorTimePercent on Windows or CPUCpuUtil on UNIX.

Here is an example that retrieves the latest timestamp of CPUprcrProcessorTimePercent. The timestamp is displayed as epoch time (seconds since January 1, 1970 UTC).

PCli% execpsl get("/NT_CPU/CPU__Total/CPUprcrProcessorTimePercent/time");
1409639472
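
The raw epoch value is easier to read once it is converted to a calendar time. Here is a small sketch of that conversion done outside PatrolCli. Python is my own choice for illustration, not something PATROL requires, and the function name epoch_to_utc is made up for this example.

```python
from datetime import datetime, timezone


def epoch_to_utc(epoch_seconds: int) -> str:
    """Return an ISO 8601 UTC timestamp for a value given in epoch seconds."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


# The value returned by the get() call above.
print(epoch_to_utc(1409639472))  # 2014-09-02T06:31:12+00:00
```

The result is in UTC, so adjust for your local time zone when comparing it with the clock on your desk.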

Compare the latest timestamp for CPU data collection with the current time:

PCli% execpsl time();
1409639562

If the difference between the current time and the latest data collection timestamp is too large (more than 10 minutes), you can reasonably conclude that the PATROL agent has stopped collecting data.  In our example, 1409639562 - 1409639472 = 90 seconds, so data collection looks good.
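
The comparison above is simple enough to write down as code. The following is only a sketch of the arithmetic, assuming you have already copied the two epoch values out of a PatrolCli session. The names collection_age, is_stale, and STALE_AFTER_SECONDS are hypothetical helpers for this example, not part of PatrolCli or PSL.

```python
# Threshold from this post: more than 10 minutes without new data is suspicious.
STALE_AFTER_SECONDS = 10 * 60


def collection_age(current_time: int, last_collected: int) -> int:
    """Seconds elapsed between the last data collection and the current time."""
    return current_time - last_collected


def is_stale(current_time: int, last_collected: int,
             threshold: int = STALE_AFTER_SECONDS) -> bool:
    """True when the last collection is older than the threshold."""
    return collection_age(current_time, last_collected) > threshold


# Values taken from the PatrolCli session shown earlier.
now = 1409639562
last = 1409639472
print(collection_age(now, last))  # 90
print(is_stale(now, last))        # False
```

Taking both values from the same PatrolCli session, as in the example, keeps the comparison on one clock. The 10-minute threshold is a rule of thumb, so adjust it if your parameter polls less often.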

In the next post, we will discuss how to run PatrolCli from a script so that you can use a scheduling tool, such as UNIX cron or a Windows scheduler, to check PATROL agent health automatically at regular intervals.
