Suppressing duplicate email reports from smartd

One of the precautions I take to ensure that my home server keeps steadily ticking along is to monitor the health of the hard drives with smartmontools. This uses the SMART health monitoring interfaces built into almost every modern hard drive to predict if the drive is starting to exhibit problems that might lead to data loss, or even complete drive failure. To further improve on this, I run the monitoring system as a daemon, and have it run some simple tests each night, and an extensive test (lasting several hours) each week.
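For readers who have not set this up yet, a nightly short test and a weekly long test are normally scheduled with the -s directive in /etc/smartd.conf. The fragment below is only a sketch: the times are illustrative assumptions, not my actual settings, and it assumes a single DEVICESCAN line handing reports to the smartd-runner script that Debian and Ubuntu ship.

```
# /etc/smartd.conf (illustrative sketch; the schedule values are assumptions)
#   -a        monitor all SMART attributes
#   -s ...    short self-test every day at 02:00, long self-test on Saturdays at 03:00
#   -m root   address that warnings are mailed to
#   -M exec   hand each report to smartd-runner instead of calling mail directly
DEVICESCAN -a -s (S/../.././02|L/../../6/03) -m root -M exec /usr/share/smartmontools/smartd-runner
```

The smartd daemon has to be restarted after editing this file for the new schedule to take effect.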

And this is great. The system will email me if it spots any problems, giving me the chance to either fix them, or (worst case) order a new hard drive before the old one finally dies. Because generally, when smartd spots a problem, it's a sign of the beginning of the end for that drive.

But not always. My current hard drive has been reporting the same error to me for over 9 months now, patiently emailing me the same email every night:

This message was generated by the smartd daemon running on:
host name: house
DNS domain: xxxxxxx.com
The following warning/error was logged by the smartd daemon:
Device: /dev/sda [SAT], 3 Offline uncorrectable sectors
Device info:
WDC WD20EFRX-68AX9N0, S/N:WD-WMC30043xxxx, WWN:5-0014ee-0ae19da81, FW:80.00A80, 2.00 TB
For details see host’s SYSLOG.
You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Sun Jun 29 08:07:30 2014 BST
Another message will be sent in 24 hours if the problem persists.

No matter what I try, I cannot get the drive to resolve the problem, but it’s not getting any worse, and the overall health of the drive is reported as “OK”. So actually, unless the system spots a new error, I just want it to stop emailing me, because otherwise I run the risk of ignoring the server that cried wolf …

So here is how to stop the smartd daemon, as installed under Ubuntu Server 14.04 LTS, from reporting the same SMART error over and over again:

cd /usr/share/smartmontools
sudo cp smartd-runner smartd-runner.backup

Now open smartd-runner in a text editor such as vi or gedit (sudo vi smartd-runner) and make it look like this:

#!/bin/bash -e


laststate="/var/run/smartd.saved.error.state"

# Generate a temporary filename for new error information

tmp=$(tempfile)

# Copy the new error information into the file

cat >$tmp

# Test if the new error information is different to the saved
# error information from our last run.

if ! cmp -s "$tmp" "$laststate"
then
    # Save the "new" latest error information for next time
    cp $tmp $laststate

    # Call the email routine
    run-parts --report --lsbsysinit --arg=$tmp --arg="$1" \
        --arg="$2" --arg="$3" -- /etc/smartmontools/run.d
fi

# Delete the temporary copy of the error information

rm -f $tmp

Save the file. The system will take one more run of the smartd daemon to “prime” the state into the system, but thereafter the system will not send you the same error twice in a row. Of course, this does mean that you now need to pay attention when the system does email you … or you could modify my code here, so it will send a duplicate “reminder” email again (say) every week, or month, or whatever works for you.
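As a rough sketch of the "reminder" idea, the comparison can be wrapped in a small helper that also checks how old the saved state file is. The function name should_send and its remind-days argument are my own invention for illustration, and the sketch assumes GNU cmp, cp and find as found on Ubuntu:

```shell
# should_send NEW_REPORT STATE_FILE [REMIND_DAYS]
# Hypothetical helper: returns 0 (send the email) when the report differs
# from the saved copy, when no saved copy exists yet, or when the saved copy
# is more than REMIND_DAYS days old (default 7). Returns 1 (suppress) otherwise.
should_send() {
    local new="$1" state="$2" remind_days="${3:-7}"

    if [ ! -e "$state" ] \
        || ! cmp -s "$new" "$state" \
        || [ -n "$(find "$state" -mmin +"$((remind_days * 1440))")" ]
    then
        # Save the report; rewriting the file also restarts the reminder clock
        cp "$new" "$state"
        return 0
    fi
    return 1
}
```

To try it, define the function near the top of smartd-runner, change the test to `if should_send "$tmp" "$laststate" 7`, and drop the separate cp line, since the helper already saves the state. An unchanged error would then be mailed roughly once a week rather than every night.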

4 thoughts on “Suppressing duplicate email reports from smartd”

Gem Newman says:

November 24, 2015 at 7:29 pm

I have been looking for a solution to this exact issue for two months. Thanks!

Emil says:

June 7, 2016 at 8:59 am

Great job
10x 🙂

ray says:

January 25, 2017 at 1:43 am

Thanks for the idea. Fooling around with it, I’ve settled on this:

In ‘smartd-runner’:

#!/bin/zsh

tmp=$(tempfile)
cat > $tmp

# Show the time and date of the test, no newline:
echo -n "$( date +%a_%F_%T ):" >>! /var/lib/smartmontools/smartd-log

# Retrieve previous error message:
SMARTD_PREVIOUS=$( cat /var/lib/smartmontools/smartd-previous )
# SMARTD_ERROR set in 'smartd_warning.sh':
if [ "$SMARTD_PREVIOUS" = "$SMARTD_ERROR" ]; then
    echo "IDENTICAL" >>! /var/lib/smartmontools/smartd-log
    # Remove the temporary file before leaving early
    rm -f $tmp
    return
fi
# Message is not identical so echo it to the log:
echo "$SMARTD_ERROR" >>! /var/lib/smartmontools/smartd-log
# And save it for the next comparison:
echo "$SMARTD_ERROR" >! /var/lib/smartmontools/smartd-previous

# runs '/usr/bin/smart-notifier -> /usr/share/smart-notifier/smart-notifier' via '/etc/smartmontools/run.d/60smart-notifier'
run-parts --report --lsbsysinit --arg=$tmp --arg="$1" \
    --arg="$2" --arg="$3" -- /etc/smartmontools/run.d

rm -f $tmp
————————

In ‘smartd_warning.sh’:

# Export message with trailing newline
export SMARTD_FULLMESSAGE="$fullmessage
"
(add one line):
export SMARTD_ERROR="${SMARTD_MESSAGE-[SMARTD_MESSAGE]}"

A non repeater says:

May 4, 2021 at 7:46 pm

Thanks!!

Richard's Blog

Random musings on life, death and technology
