Suppressing duplicate email reports from smartd

One of the precautions I take to ensure that my home server keeps steadily ticking along is to monitor the health of the hard drives with smartmontools. This uses the SMART health monitoring interfaces built into almost every modern hard drive to predict if the drive is starting to exhibit problems that might lead to data loss, or even complete drive failure. To further improve on this, I run the monitoring system as a daemon, and have it run some simple tests each night, and an extensive test (lasting several hours) each week.

And this is great. The system will email me if it spots any problems, giving me the chance to either fix them, or (worst case) order a new hard drive before the old one finally dies. Because generally, when smartd spots a problem, it’s a sign of the beginning of the end for that drive.

But not always. My current hard drive has been reporting the same error to me for over 9 months now, patiently emailing me the same email every night:

This message was generated by the smartd daemon running on:
host name: house
DNS domain: xxxxxxx.com
The following warning/error was logged by the smartd daemon:
Device: /dev/sda [SAT], 3 Offline uncorrectable sectors
Device info:
WDC WD20EFRX-68AX9N0, S/N:WD-WMC30043xxxx, WWN:5-0014ee-0ae19da81, FW:80.00A80, 2.00 TB
For details see host’s SYSLOG.
You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Sun Jun 29 08:07:30 2014 BST
Another message will be sent in 24 hours if the problem persists.

No matter what I try, I cannot get the drive to resolve the problem, but it’s not getting any worse, and the overall health of the drive is reported as “OK”. So actually, unless the system spots a new error, I just want it to stop emailing me, because otherwise I run the risk of ignoring the server that cried wolf …
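For anyone wanting to keep an eye on that counter by hand, here is a minimal sketch of how it might be done with smartctl. The helper name smart_raw_value is made up for illustration, and it assumes the usual "smartctl -A" attribute table, where the first column is the attribute ID and the tenth is the raw value; on this drive the offline uncorrectable count is attribute 198, but other drives may report it differently.

```shell
#!/bin/bash
# Hypothetical helper (not part of smartmontools): print the raw value of one
# SMART attribute, reading "smartctl -A" output on stdin.
# Assumes the usual attribute table layout, where column 1 is the attribute ID
# and column 10 is RAW_VALUE.
smart_raw_value() {
    awk -v id="$1" '$1 == id { print $10 }'
}

# Typical use (needs root and a real drive, so it is left commented out):
#   sudo smartctl -H /dev/sda                        # overall health verdict
#   sudo smartctl -A /dev/sda | smart_raw_value 198  # Offline_Uncorrectable count
```

If the number printed stays the same from week to week, the drive is at least not getting worse, which is the situation described above.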

So here is how to stop the smartd daemon, as installed under Ubuntu Server 14.04 LTS, from reporting the same SMART error over and over again:

  1. cd /usr/share/smartmontools
  2. sudo cp smartd-runner smartd-runner.backup

Now open smartd-runner in a text editor such as vi or gedit (sudo vi smartd-runner) and make it look like this:


#!/bin/bash -e

laststate="/var/run/smartd.saved.error.state"
# Create a temporary file for the new error information, and
# make sure it is deleted however the script exits
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
# Copy the new error information (from stdin) into the file
cat >"$tmp"

# Test if the new error information is different to the saved
# error information from our last run (a missing state file
# also counts as different).
if ! cmp -s "$tmp" "$laststate"
then
    # Save the "new" latest error information for next time
    cp "$tmp" "$laststate"
    # Call the email routine
    run-parts --report --lsbsysinit --arg="$tmp" --arg="$1" \
        --arg="$2" --arg="$3" -- /etc/smartmontools/run.d
fi

Save the file. The system will take one more run of the smartd daemon to “prime” the state into the system, but thereafter the system will not send you the same error twice in a row. Of course, this does mean that you now need to pay attention when the system does email you … or you could modify my code here, so it will send a duplicate “reminder” email again (say) every week, or month, or whatever works for you.
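As a rough sketch of that "reminder" variation, the comparison could be wrapped in a small function that also looks at how old the saved state file is. The function name should_notify and the one-week default are my own choices for illustration, and the age check relies on GNU stat (stat -c %Y), which is what Ubuntu ships. It works because the state file is only rewritten when an email is actually sent, so its modification time is the time of the last email.

```shell
#!/bin/bash
# Sketch only: decide whether smartd-runner should send an email.
# Returns 0 (send) when the new report differs from the saved one, or when
# the saved report is at least max_age seconds old; returns 1 (suppress)
# otherwise. The name should_notify is hypothetical.
should_notify() {
    local new_report=$1 saved_report=$2 max_age=${3:-604800}  # default: 7 days

    # No saved report yet, or the content has changed: always send.
    cmp -s "$new_report" "$saved_report" || return 0

    # Same content: send a reminder only if the saved copy is old enough.
    local now saved_at
    now=$(date +%s)
    saved_at=$(stat -c %Y "$saved_report")   # mtime in seconds (GNU stat)
    (( now - saved_at >= max_age ))
}
```

To use it, define the function near the top of smartd-runner and change the test from if ! cmp -s "$tmp" "$laststate" to if should_notify "$tmp" "$laststate" 604800, adjusting the number of seconds to taste.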

3 thoughts on “Suppressing duplicate email reports from smartd”

  1. Thanks for the idea. Fooling around with it, I’ve settled on this:

    In ‘smartd-runner’:

    #!/bin/zsh

    tmp=$(tempfile)
    cat > $tmp

    # Show the time and date of the test, no newline:
    echo -n "$( date +%a_%F_%T ):" >>! /var/lib/smartmontools/smartd-log

    # Retrieve previous error message:
    SMARTD_PREVIOUS=$( cat /var/lib/smartmontools/smartd-previous )
    # SMARTD_ERROR set in 'smartd_warning.sh':
    if [ "$SMARTD_PREVIOUS" = "$SMARTD_ERROR" ]; then
        echo "IDENTICAL" >>! /var/lib/smartmontools/smartd-log
        # Clean up the temporary file before leaving early
        rm -f $tmp
        return
    fi
    # Message is not identical so echo it to the log:
    echo "$SMARTD_ERROR" >>! /var/lib/smartmontools/smartd-log
    # And save it for the next comparison:
    echo "$SMARTD_ERROR" >! /var/lib/smartmontools/smartd-previous

    # runs '/usr/bin/smart-notifier -> /usr/share/smart-notifier/smart-notifier' via '/etc/smartmontools/run.d/60smart-notifier'
    run-parts --report --lsbsysinit --arg=$tmp --arg="$1" \
        --arg="$2" --arg="$3" -- /etc/smartmontools/run.d

    rm -f $tmp
    ------------------------

    In ‘smartd_warning.sh’:

    # Export message with trailing newline
    export SMARTD_FULLMESSAGE="$fullmessage
    "

    (add one line):
    export SMARTD_ERROR="${SMARTD_MESSAGE-[SMARTD_MESSAGE]}"
