TechOpsGuys.com Diggin' technology every day

5 Jan 2010

Acknowledge Nagios Alerts Via Email Replies

Monitoring should be annoying by design - when something is broken, we need to fix it and we need to be reminded it needs fixing until it gets fixed. That's why we monitor in the first place. In that vein, I've configured our Nagios server to notify every hour for most alerts. However, there are times when a certain alert can be ignored for a while and I might not have a computer nearby to acknowledge it.

The solution: acknowledge Nagios alerts via email. A quick reply on my smartphone and I'm done.

Setting it up is fairly simple and involves a few components: an MTA (Postfix in my case), procmail (you might need to install it), a Perl script, and the nagios.cmd file. I used the info in this post to get me started. I followed the instructions below on two different CentOS 5.4 installs running Nagios 3.0.6 and Nagios 3.2.0.

Procmail
Make a /home/nagios/.procmailrc file (either su to the nagios user or chown to nagios:nagios afterwards) and paste in the following:

LOGFILE=$HOME/.procmailrc.log
MAILDIR=$HOME/Mail
VERBOSE=yes
PATH=/usr/bin
# Each [ ] pair in the condition below must contain exactly one space and one tab.
:0
* ^Subject:[ 	]*\/[^ 	].*
| /usr/lib/nagios/eventhandlers/processmail "${MATCH}"
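
To see what the recipe hands to the script, here is a rough Python sketch of the same match. It assumes each bracket in the condition holds a space and a tab (the usual procmail idiom) and that matching is case-insensitive, as it is in procmail by default. The helper name subject_match and the sample headers are made up for illustration; this is not how procmail itself is implemented.

```python
import re

# Python approximation of the procmail condition  ^Subject:[ \t]*\/[^ \t].*
# In procmail, the text matched after "\/" ends up in $MATCH; here that
# part is the capture group.
SUBJECT_RE = re.compile(r"^Subject:[ \t]*([^ \t].*)", re.MULTILINE | re.IGNORECASE)


def subject_match(raw_headers):
    """Return the text procmail would place in $MATCH, or None if no match."""
    found = SUBJECT_RE.search(raw_headers)
    return found.group(1) if found else None


# Hypothetical headers of a reply to a Nagios notification.
headers = (
    "From: admin@example.com\n"
    "Subject: whatever ack RE: ** PROBLEM alert - server1/CPU Load is WARNING **\n"
)
print(subject_match(headers))
```

The captured text is what arrives in the Perl script as its first argument, so leading whitespace after "Subject:" never reaches the script.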

Postfix
Tell Postfix to use procmail by adding the following line to /etc/postfix/main.cf (restart Postfix when finished):
mailbox_command = /usr/bin/procmail
You might want to search your main.cf file for mailbox_command to make sure procmail isn't already configured/turned on. You also might want to do whereis procmail to make sure it's in the /usr/bin folder. If your Nagios server hasn't previously been configured to receive email, you've got some configuration to do - that's outside of the scope of this article, but I would suggest getting that up and running first.
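
The two checks above can be sketched in Python as follows. This is only an illustration: mailbox_commands is a hypothetical helper, the sample main.cf text and hostname are invented, and the parser ignores Postfix continuation lines, so treat it as a quick look rather than a full main.cf reader.

```python
import shutil


def mailbox_commands(main_cf_text):
    """Return the values of all active (uncommented) mailbox_command settings."""
    values = []
    for line in main_cf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#") or "=" not in stripped:
            continue
        name, value = stripped.split("=", 1)
        if name.strip() == "mailbox_command":
            values.append(value.strip())
    return values


# Invented sample; on a real server you would read /etc/postfix/main.cf instead.
sample = """\
# mailbox_command = /some/where/procmail -a "$EXTENSION"
myhostname = nagios1.example.com
mailbox_command = /usr/bin/procmail
"""
print(mailbox_commands(sample))    # active settings found in the sample text
print(shutil.which("procmail"))    # full path if procmail is on PATH, else None
```

If the first call returns more than one value, or a path that differs from what the second call reports, that is worth sorting out before moving on.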

Perl Script
Next up is the perl script that procmail references. Create a /usr/lib/nagios/eventhandlers/processmail file and chmod 755 it - paste in the code below:

#!/usr/bin/perl

$correctpassword = 'whatever';   # more of a sanity check than a password and can be anything
$subject = "$ARGV[0]";
$now = time;   # seconds since the epoch, same value /bin/date +%s would give
$commandfile = '/usr/local/nagios/var/rw/nagios.cmd';

if ($subject =~ /Host/ ){      # this parses the subject of your email
        ($password, $what, $junk, $junk, $junk, $junk, $junk, $host) = split(/ /, $subject);
        $host =~ s/![^!]*$//;   # strip a trailing "!" and anything after it; hosts without "!" are left untouched
} else {
        ($foo, $bar) = split(/\//, $subject, 2);   # split on the first "/" only, so service names may contain "/"
        ($password, $what, $junk, $junk, $junk, $junk, $junk, $host) = split(/\ /, $foo);
        ($service) = $bar =~ /^(.*) is.*$/;
}

$password =~ s/^\s+//;
$password =~ s/\s+$//;

print "$password\t$what\t$host\t$service\n";

unless ($password =~ /$correctpassword/i) {
        print "exiting...wrong password\n";
        exit 1;
}

# ack - this is where the acknowledgement happens
# you could get creative with this and pass all kinds of things via email
# a list of external commands here: http://www.nagios.org/development/apis/externalcommands/
if ($subject =~ /Host/ ) {
        # fields: host;sticky;notify;persistent;author;comment
        $ack = "ACKNOWLEDGE_HOST_PROBLEM;$host;1;1;1;email;acknowledged through email";
} else {
        $ack = "ACKNOWLEDGE_SVC_PROBLEM;$host;$service;1;1;1;email;acknowledged through email";
}

if ($what =~ /ack/i) {
        sub_print("$ack");
} else {
        print "no valid commands...exiting\n";
        exit 1;
}


sub sub_print {
        $narf=shift;
        open(F, '>>', $commandfile) or die "can't open $commandfile: $!";
        print F "[$now] $narf\n";
        close F;
}
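
For readers who want to check the command lines in isolation, here is a minimal Python sketch of the two acknowledgement strings. The field order follows the Nagios external command documentation; the function name ack_command, its defaults, and the fixed timestamp in the demo are my own choices for illustration, not part of the Perl script.

```python
import time


def ack_command(host, service=None, author="email",
                comment="acknowledged through email", now=None):
    """Build one line in the format nagios.cmd accepts for an acknowledgement."""
    timestamp = int(time.time()) if now is None else int(now)
    if service is None:
        # ACKNOWLEDGE_HOST_PROBLEM;<host>;<sticky>;<notify>;<persistent>;<author>;<comment>
        fields = ["ACKNOWLEDGE_HOST_PROBLEM", host, "1", "1", "1", author, comment]
    else:
        # ACKNOWLEDGE_SVC_PROBLEM;<host>;<service>;<sticky>;<notify>;<persistent>;<author>;<comment>
        fields = ["ACKNOWLEDGE_SVC_PROBLEM", host, service, "1", "1", "1", author, comment]
    return "[%d] %s\n" % (timestamp, ";".join(fields))


# 1262700000 is an arbitrary epoch value used so the demo output is stable.
print(ack_command("server1", "CPU Load", now=1262700000), end="")
print(ack_command("server1", now=1262700000), end="")
```

Note that the host form has one field fewer than the service form: there is no service description, and everything else lines up.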

The script above assumes certain things about how your email subject line is formatted, and you might have to tweak it if you've customized the notification commands in the default commands.cfg file. One thing you will need to change is the host macro in the subject. The default puts $HOSTALIAS$ there; replace it with $HOSTNAME$, because that is what the nagios.cmd file expects. If you don't change it, the Perl script above will pass the $HOSTALIAS$ value to the nagios.cmd file and Nagios won't know what to do with it. Below is a sample of my notify-service-by-email command:

define command{
        command_name    notify-service-by-email
        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nComment: $SERVICEACKCOMMENT$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n\nMonitoring Page: http://nagios1/nagios\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTNAME$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
        }

Example
So, when I get an alert that has a subject something like this:
** PROBLEM alert - server1/CPU Load is WARNING **
I can just reply and add "whatever ack" to the beginning of the subject line:
whatever ack RE: ** PROBLEM alert - server1/CPU Load is WARNING **
and the alert will be acknowledged.
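
The positional parsing behind this can be sketched in Python as below. It is a rough mirror of the Perl logic, not a replacement for it: parse_reply_subject is a hypothetical name, and the host-alert subject in the demo assumes a "Host Alert: <host> is <state>" layout, which may differ from your notification command.

```python
def parse_reply_subject(subject):
    """Split a reply subject into (password, command, host, service).

    Mirrors the positional parsing in the Perl script: the password and the
    command are the first two space-separated words and the host is the
    eighth. service is None for host alerts. Returns None if the subject
    has too few words.
    """
    service = None
    head = subject
    if "Host" not in subject:
        # Service alerts look like "... host/Service Name is STATE **".
        head, _, tail = subject.partition("/")
        before, sep, _ = tail.rpartition(" is")
        service = before if sep else None
    words = head.split(" ")
    if len(words) < 8:
        return None
    return words[0].strip(), words[1], words[7], service


print(parse_reply_subject(
    "whatever ack RE: ** PROBLEM alert - server1/CPU Load is WARNING **"))
print(parse_reply_subject(
    "whatever ack RE: ** PROBLEM Host Alert: server1 is DOWN **"))
```

Because the host is picked by position, any change to the number of words before it in the subject (for example a mail client that adds a different reply prefix) shifts the fields, which is the tweak the article warns about.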

Troubleshooting
As I said earlier, you will want to make sure Postfix is configured correctly for receiving email for the Nagios user - this might be an area where you'll have issues if it's not set up correctly. The other thing that fouled me up a few times was the Notification command section I mentioned above. By passing commands directly to the nagios.cmd file and by watching the log files, you should be able to spot any misconfigs.
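
Passing a command by hand can look roughly like the Python sketch below. submit_command is a hypothetical helper and the demo writes to a throwaway file so it runs anywhere; on a real server the target would be the nagios.cmd path from the Perl script, which is a named pipe, so the open call may block if Nagios is not running to read from it.

```python
import os
import tempfile
import time


def submit_command(command, command_file):
    """Write one external command line, prefixed with [epoch], to command_file."""
    line = "[%d] %s\n" % (int(time.time()), command)
    # "w" is fine for a named pipe; on the regular demo file it simply
    # starts the file fresh.
    with open(command_file, "w") as handle:
        handle.write(line)
    return line


# Demo target: a temporary regular file standing in for nagios.cmd.
demo_path = os.path.join(tempfile.mkdtemp(), "nagios.cmd")
sent = submit_command(
    "ACKNOWLEDGE_SVC_PROBLEM;server1;CPU Load;1;1;1;email;manual test", demo_path
)
with open(demo_path) as handle:
    print(handle.read(), end="")
```

If a hand-written command like this is accepted (check the Nagios log) but the emailed one is not, the problem is most likely in the subject parsing rather than in Nagios.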

TechOps Guy:

Filed under: Monitoring
Comments (7)
  1. I get:

    procmail: No match on “^Subject:[ ]*\/[^ ].*”

    I’m running Sendmail, but mail is hitting procmail. Do you have any idea what I could be missing?

  2. I could not get this to work for Host alerts, but it was working fine for Service alerts. I commented out “($host) = ($host) =~ /(.*)\!/;” and everything works fine now. I just thought this might help someone in the future. Besides that, this has been great. Thanks.

  3. This didn’t work at all for me when used with CentOS 6.2, postfix, procmail, and NagiosXI. The regex didn’t match (it was some issue with the binding operator =~), and once I fixed that, the acknowledge commands submitted to nagios.cmd were being silently ignored.

    I have rewritten the script to parse things correctly. Once I get nagios to stop silently ignoring the command, I can share the work I have done.

    One question I have, doesn’t nagios require the email address of the person who is acknowledging the alarm?

  4. @Nick
    Were you successful? I ran into the same problems with CentOS 6.3, sendmail/procmail, and Nagios 3.4.1.
    I can’t find the solution.
    We are discussing it here in the German forum:
    http://www.monitoring-portal.org/wbb/index.php?page=Thread&postID=177912&s=895782bc01c19a77fa4c13dbc15822729da797ac#post177912

  5. Awesome blog you have here but I was curious about if you knew of any user discussion forums
    that cover the same topics discussed here?
    I’d really love to be a part of community where I can get suggestions from other experienced individuals that share the same interest. If you have any suggestions, please let me know. Many thanks!

  6. Annoyingly the regex fails because the code is wrong.

    .procmailrc should read

    * ^Subject: [ ]*\/[^ ].*

    There should be a space between the colon and the first bracket.

  7. hi,

    I copied the script you provided into my .procmailrc file, but no log file is created: ls -la shows nothing in my /home/nagios or /home directories. I also chowned the .procmailrc file to nagios:nagios. Can you please let me know the remedy?

