I recently got an email from my home server, warning me that it had detected an error on one of its hard drives. The email was automatically generated by smartd, part of the smartmontools package, which monitors the health of the disk storage attached to my server by running a series of regular tests without my intervention.
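For reference, this behaviour comes from an entry in /etc/smartd.conf along these lines (a sketch rather than my exact configuration; the schedule and the email address are placeholders to adjust to taste):
# Monitor all attributes of /dev/sda, enable automatic offline testing and
# attribute autosave, run a short self-test daily at 02:00 and a long
# self-test every Saturday at 03:00, and email any warnings:
/dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03) -m admin@example.com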
To find out exactly what had been detected, I used the smartctl command to look at the logged results of the last few self-tests. As you can see, the daily Short Offline tests were all passing, but the long-running weekly Extended Offline tests were reporting a problem with the same LBA on each run, namely LBA 1318075984:
# smartctl -l xselftest /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.13.0-24-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%             9559  -
# 2  Short offline       Completed without error       00%             9535  -
# 3  Short offline       Completed without error       00%             9511  -
# 4  Extended offline    Completed: read failure       70%             9490  1318075984
# 5  Short offline       Completed without error       00%             9487  -
# 6  Short offline       Completed without error       00%             9463  -
# 7  Short offline       Completed without error       00%             9440  -
# 8  Short offline       Completed without error       00%             9416  -
# 9  Short offline       Completed without error       00%             9392  -
#10  Short offline       Completed without error       00%             9368  -
#11  Short offline       Completed without error       00%             9344  -
#12  Extended offline    Completed: read failure       70%             9322  1318075984
#13  Short offline       Completed without error       00%             9320  -
#14  Short offline       Completed without error       00%             9296  -
#15  Extended offline    Completed without error       00%             9204  -
#16  Short offline       Completed without error       00%             9198  -
#17  Short offline       Completed without error       00%             9176  -
#18  Short offline       Completed without error       00%             9152  -
The fact that this is a “read failure” probably means it is a medium error, which can usually be resolved by writing fresh data to the block. The write will either simply succeed (in the case of a transient problem), or cause the drive to reallocate a spare block to replace the now-failed one. The problem, of course, is that the failing block might be part of some important piece of data. Fortunately I have backups, but I’d prefer to restore only the damaged file rather than the whole disk. The rest of this post describes how to achieve that.
First we need to look at the disk layout to determine which partition the affected block falls within:
# gdisk -l /dev/sda
GPT fdisk (gdisk) version 0.8.8
Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present
Found valid GPT with protective MBR; using GPT.
Disk /dev/sda: 3907029168 sectors, 1.8 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): 453C41A1-848D-45CA-AC5C-FC3FE68E8280
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 3907029134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2157 sectors (1.1 MiB)
Number  Start (sector)    End (sector)  Size        Code  Name
   1            2048         1050623    512.0 MiB   EF00
   2         1050624         4956159    1.9 GiB     8300
   3         4956160        44017663    18.6 GiB    8300
   4        44017664        52219903    3.9 GiB     8200
   5        52219904        61984767    4.7 GiB     8300
   6        61984768       940890111    419.1 GiB   8300
   7       940890112      3907028991    1.4 TiB     8300
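The partition containing the failing LBA is the one whose Start and End sectors bracket 1318075984; you can eyeball that from the table above, or let the shell do the comparison, something like:
# lba=1318075984
# [ $lba -ge 940890112 ] && [ $lba -le 3907028991 ] && echo "LBA $lba is inside /dev/sda7"
LBA 1318075984 is inside /dev/sda7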
We can see that the disk uses 512-byte logical sectors, and that the failing sector falls within partition /dev/sda7. We also want to know (for later) the physical block size of this disk, which can be found with:
# cat /sys/block/sda/queue/physical_block_size
4096
Since this is larger than the logical sector size of 512 bytes, it means that it is actually the physical block containing LBA 1318075984 that is failing, and therefore all the other LBAs in that physical block will fail too. In this case, that means 8 LBAs. Because of the way the SMART self-tests work, it’s likely that LBA 1318075984 and the following 7 are the affected ones, but we can test that later.
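We can sanity-check that with a little shell arithmetic, confirming both the number of LBAs per physical block and that the failing LBA sits at the start of its physical block:
# echo $(( 4096 / 512 ))      # logical sectors per physical block
8
# echo $(( 1318075984 % 8 ))  # 0 means the failing LBA is the first sector of its physical block
0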
Next we need to understand what filesystem that partition has been formatted as. I happen to know that all my partitions are formatted as ext4 on this system, but you could find this information in the /etc/fstab configuration file.
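If /etc/fstab refers to filesystems by UUID or label rather than device name, blkid (or lsblk -f) will report the filesystem type of the partition directly, for example:
# blkid -s TYPE /dev/sda7
/dev/sda7: TYPE="ext4"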
The rest of this post is only directly relevant to ext4/3/2 filesystems. Feel free to follow the general process, but please look elsewhere for detailed instructions for Btrfs, XFS, etc.
The next thing to do is to determine the offset of the failing LBA into the sda7 partition: 1318075984 - 940890112, which is 377185872 sectors of 512 bytes. We now need to know how many filesystem blocks that is, so let's find out what block size that partition is using:
# tune2fs -l /dev/sda7 | grep Block
Block count: 370767360
Block size: 4096
Blocks per group: 32768
So each filesystem block is 4096 bytes. To determine which filesystem block contains the failing LBA, we divide its sector offset into the partition by 8 (4096/512), giving us filesystem block 47148234. Since the division is exact, we know that it is the first logical sector of that filesystem block that is causing the error (as we expected).
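For the record, the whole calculation can be done in one step, taking the partition start sector from gdisk and the block size from tune2fs:
# echo $(( (1318075984 - 940890112) / (4096 / 512) ))
47148234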
Next we want to know whether that filesystem block is in use, or part of the filesystem's free space:
# debugfs /dev/sda7
debugfs 1.42.9 (4-Feb-2014)
debugfs: testb 47148234
Block 47148234 marked in use
So, unfortunately, we know that filesystem block is part of a file. The question is: which one?
debugfs: icheck 47148234
Block Inode number
47148234 123993
debugfs: ncheck 123993
Inode Pathname
123993 /media/homevideo/AA-20140806.mp4
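As an aside, the same lookups can be run non-interactively with debugfs -R, which is handy if you have several suspect blocks to map back to files; for example:
# debugfs -R "icheck 47148234" /dev/sda7
# debugfs -R "ncheck 123993" /dev/sda7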
Since the filesystem block size and the physical disk block size are the same, I could just assume that that's the only block affected. But that's probably not very wise. So let's check the physical blocks (on the disk) before and after the one we know is failing, by reading the failing LBA minus 8 and plus 8:
# # The reported failing LBA:
# dd if=/dev/sda of=sector.bytes skip=1318075984 bs=512 count=1
dd: error reading ‘/dev/sda’: Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 2.85748 s, 0.0 kB/s
# # The reported failing LBA - 8:
# dd if=/dev/sda of=sector.bytes skip=1318075976 bs=512 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.0246482 s, 20.8 kB/s
# # The reported failing LBA + 8:
# dd if=/dev/sda of=sector.bytes skip=1318075992 bs=512 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.0246482 s, 20.8 kB/s
So, in this case we can see that the physical blocks before and after the failing disk block are both currently readable, meaning that we only need to deal with the single failing block.
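If you want to be really thorough, you can probe each of the 8 LBAs in the suspect physical block individually with a small loop (a sketch; dd exits with a non-zero status when it hits a read error):
for i in $(seq 0 7); do
    dd if=/dev/sda of=/dev/null skip=$(( 1318075984 + i )) bs=512 count=1 2>/dev/null \
        && echo "LBA $(( 1318075984 + i )): readable" \
        || echo "LBA $(( 1318075984 + i )): unreadable"
done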
Since I have a backup of the file containing the failing block, I'll complete the fix by overwriting the failing physical block with zeros (definitely corrupting the containing file), triggering the drive's block-level reallocation routines, and then delete the file, prior to recovering it from backup. Note that dd needs seek= (not skip=) to position the write at the right place on the output device:
# dd if=/dev/zero of=/dev/sda seek=1318075984 bs=512 count=8
8+0 records in
8+0 records out
4096 bytes (4.1 kB) copied, 0.000756974 s, 5.4 MB/s
# rm /media/homevideo/AA-20140806.mp4
# dd if=/dev/sda of=sector.bytes skip=1318075984 bs=512 count=8
8+0 records in
8+0 records out
4096 bytes (4.1 kB) copied, 0.000857198 s, 4.8 MB/s
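If you want to confirm whether the drive actually remapped the sector rather than just rewriting it in place, the SMART attributes are worth a look; Reallocated_Sector_Ct and Current_Pending_Sector are the interesting ones:
# smartctl -A /dev/sda | grep -Ei 'reallocated|pending'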
At this point I could run an immediate Extended Offline self-test, but since I can now successfully read the originally-failing block I'm confident the problem is solved, and I'll just wait for the normally scheduled self-tests to be run by smartd again.
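For completeness, kicking off an immediate test by hand, and checking on its progress and results later, is just:
# smartctl -t long /dev/sda
# smartctl -l selftest /dev/sda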
Update: I’ve since experienced a situation where overwriting the failing physical block with zeros using dd failed to trigger the drive's automatic block reallocation routines. In that case, however, I was able to resolve the situation by using hdparm instead. Use:
hdparm --read-sector 1318075984 /dev/sda
to (try to) read and display a block of data, and
hdparm --write-sector 1318075984 --yes-i-know-what-i-am-doing /dev/sda
to overwrite the block with zeros. Both of these commands work in terms of the drive's logical block size (normally 512 bytes), not the 4K physical sector size.