this post was submitted on 10 Jul 2023

Solution

Yeah, the drive is dying. As suggested by @[email protected], and @[email protected], I ran a S.M.A.R.T. test (the short option), and received the following report:

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   001   001   051    Pre-fail  Always   FAILING_NOW 1473
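
For anyone reading the attribute row above for the first time, here is a minimal Python sketch of how it breaks down. It assumes the usual smartctl column layout shown in the header line and the usual rule that an attribute counts as failed once its normalized VALUE drops to or below THRESH. The names `SmartAttribute`, `parse_attribute_row` and `is_failing` are made up for illustration and are not part of smartmontools.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    value: int       # normalized value reported by the drive
    worst: int       # lowest normalized value ever recorded
    thresh: int      # vendor threshold for this attribute
    attr_type: str   # "Pre-fail" or "Old_age"
    when_failed: str
    raw_value: str


def parse_attribute_row(row: str) -> SmartAttribute:
    """Split one whitespace-separated attribute row into named fields."""
    fields = row.split()
    return SmartAttribute(
        attr_id=int(fields[0]),
        name=fields[1],
        value=int(fields[3]),
        worst=int(fields[4]),
        thresh=int(fields[5]),
        attr_type=fields[6],
        when_failed=fields[8],
        raw_value=" ".join(fields[9:]),
    )


def is_failing(attr: SmartAttribute) -> bool:
    """Assumed rule: failed when the normalized value is at or below the threshold."""
    return attr.value <= attr.thresh


# Row copied from the report above.
ROW = "  1 Raw_Read_Error_Rate     0x002f   001   001   051    Pre-fail  Always   FAILING_NOW 1473"
attr = parse_attribute_row(ROW)
print(attr.name, attr.value, attr.thresh, is_failing(attr))
```

Here VALUE is 1 against a threshold of 51 on a Pre-fail attribute, which is why the overall assessment reports FAILED.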

Original Post

I have a hard drive that I pulled from an old Western Digital external hard drive. I connected it to my desktop to see what was on it, and running fdisk -l (which took a weirdly long time to run while also keeping one core at 100%) gave the error message:

The primary GPT table is corrupt, but the backup appears OK, so that will be used.

However, trying to mount it resulted in another error saying that the drive doesn't exist. Looking at dmesg reveals a ton of other errors like the following:

...
[  252.090206] critical target error, dev sde, sector 8 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[  252.090210] Buffer I/O error on dev sde, logical block 1, async page read
[  252.090292] sd 6:0:0:0: [sde] Attached SCSI disk
[  296.776697] sd 6:0:0:0: [sde] tag#13 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN 
[  296.776712] sd 6:0:0:0: [sde] tag#13 CDB: ATA command pass through(12)/Blank a1 08 2e 00 01 00 00 00 00 ec 00 00
[  296.796696] scsi host6: uas_eh_device_reset_handler start
[  296.920474] usb 4-6: reset SuperSpeed USB device number 3 using xhci_hcd
[  296.940278] scsi host6: uas_eh_device_reset_handler success
[  300.090562] sd 6:0:0:0: [sde] Unaligned partial completion (resid=12280, sector_sz=512)
[  300.090567] sd 6:0:0:0: [sde] tag#16 CDB: Read(10) 28 00 00 00 00 08 00 00 18 00
[  300.090570] sd 6:0:0:0: [sde] tag#16 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=2s
[  300.090572] sd 6:0:0:0: [sde] tag#16 Sense Key : Hardware Error [current] 
[  300.090573] sd 6:0:0:0: [sde] tag#16 Add. Sense: Internal target failure
[  300.090574] sd 6:0:0:0: [sde] tag#16 CDB: Read(10) 28 00 00 00 00 08 00 00 18 00
[  300.090575] critical target error, dev sde, sector 8 op 0x0:(READ) flags 0x80700 phys_seg 3 prio class 2
[  300.090640] sd 6:0:0:0: [sde] tag#14 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=2s
[  300.090642] sd 6:0:0:0: [sde] tag#14 Sense Key : Hardware Error [current] 
[  300.090643] sd 6:0:0:0: [sde] tag#14 Add. Sense: Internal target failure
[  300.090644] sd 6:0:0:0: [sde] tag#14 CDB: Read(10) 28 00 00 00 00 20 00 00 08 00
[  300.090645] critical target error, dev sde, sector 32 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
[  326.010763] usb 4-6: USB disconnect, device number 3
[  326.010898] sd 6:0:0:0: [sde] tag#18 uas_zap_pending 0 uas-tag 1 inflight: CMD 
[  326.010901] sd 6:0:0:0: [sde] tag#18 CDB: Read(10) 28 00 00 00 00 20 00 00 08 00
[  326.010903] sd 6:0:0:0: [sde] tag#17 uas_zap_pending 0 uas-tag 2 inflight: CMD 
[  326.010905] sd 6:0:0:0: [sde] tag#17 CDB: Read(10) 28 00 00 00 00 08 00 00 08 00
[  326.010919] sd 6:0:0:0: [sde] tag#18 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=25s
[  326.010921] sd 6:0:0:0: [sde] tag#18 CDB: Read(10) 28 00 00 00 00 20 00 00 08 00
[  326.010922] I/O error, dev sde, sector 32 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[  326.010925] Buffer I/O error on dev sde, logical block 4, async page read
[  326.010931] sd 6:0:0:0: [sde] tag#17 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=25s
[  326.010942] sd 6:0:0:0: [sde] tag#17 CDB: Read(10) 28 00 00 00 00 08 00 00 08 00
[  326.010943] I/O error, dev sde, sector 8 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[  326.010945] Buffer I/O error on dev sde, logical block 1, async page read
[  326.050781] sd 6:0:0:0: [sde] Synchronizing SCSI cache
[  326.270781] sd 6:0:0:0: [sde] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
...

Is this drive dead? Is something just corrupt? If there is data on it, would it be straightforward to pull it off?

top 12 comments
[–] [email protected] 9 points 2 years ago* (last edited 2 years ago) (1 children)

Honestly, if there is any data on the disk you cannot afford to lose then power it off and take it to a professional recovery service. The more you try to do to a failing drive the lower the chances of recovering data or the less data can be recovered from a drive. So, if you really care about the data just pay for someone that knows what they are doing and that has the tools to recover the data for you.

But if you don't care, or the data is not worth the cost of recovery, then the first thing I would do is run a S.M.A.R.T. report with smartctl and see what that says. This should give you an indication of whether the drive is going bad. Though even if it passes, it might still be going bad.

You can also use tools like ddrescue to create a backup of the drive copying as many blocks as it can even if some of them fail the first time (before it goes back to retry bad sectors). Though be warned, even reading from a failing drive can cause it to fail faster, seek a professional if you care about the data on it.
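
A rough sketch of the two-pass approach described above, written as a small Python helper that only builds and prints the command lines (it does not run them). The device path `/dev/sde` comes from the kernel log in the post; the file names `rescue.img` and `rescue.map` and the helper `ddrescue_passes` are hypothetical. The flags used are `-n` (skip the slow phase that digs into bad areas), `-d` (direct access to the input device) and `-r` (number of retry passes); check them against your installed ddrescue version before relying on this.

```python
import shlex


def ddrescue_passes(source: str, image: str, mapfile: str, retries: int = 3):
    """Build argv lists for a quick first pass and a later retry pass.

    Both passes share the same mapfile, so the second pass only revisits
    areas the first one could not read.
    """
    first_pass = ["ddrescue", "-n", source, image, mapfile]
    retry_pass = ["ddrescue", "-d", f"-r{retries}", source, image, mapfile]
    return [first_pass, retry_pass]


# Hypothetical output names; nothing is executed here.
passes = ddrescue_passes("/dev/sde", "rescue.img", "rescue.map")
for argv in passes:
    print(shlex.join(argv))
```

The point of the split is to grab the easy-to-read areas first, while the drive still responds, and only then spend time on the bad ones.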

As for recovering data from it - if you are lucky you might just need to recover the partition table to have access to things again. Though likely more than the partition table is bad on an old failing disk. You can use tools like photorec to recover files (more than just images, that is) on the disk. Though with tools like this you lose all folder structure and there is no guarantee it will recover everything or even anything.

Then, even if the S.M.A.R.T. report looks good, you might want to run badblocks over the drive to try and detect any bad areas of the disk. A S.M.A.R.T.-enabled drive should be doing this passively in the background, but it can be helpful to force a scan if you are having issues with the disk. You should really only do this after you have recovered any data from the disk you care about.

You can also find other tools and tips here.

Oh, and if the disk shows signs of failing I would never trust it again to store anything you care about. Best to replace it with something new if you can. And remember, always back up your data - file recovery is never a good option and should only ever be used as a last-ditch attempt.

[–] Kalcifer 3 points 2 years ago (1 children)

I ran a S.M.A.R.T short test, and, yeah, the hard drive is quickly dying:

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   001   001   051    Pre-fail  Always   FAILING_NOW 1473

[–] average650 2 points 2 years ago

Well.... That's bad news. Pretty clear though.

[–] [email protected] 9 points 2 years ago* (last edited 2 years ago) (2 children)

Best bet would be ddrescue to extract as much as possible out of it, then once that's done you can try mounting the backup image and extracting files out of it if possible.

That disk certainly isn't healthy. You have no idea how much life is in there left, so you want the most straightforward and quickest way to pull out the data. Don't attempt to mount it and copy individual files, image the whole thing with ddrescue.

If the data is really important, leave it unplugged and bring it to a specialist. The more the drive runs, the more physical damage could be happening, and the lower the chances that even a professional can recover it.

[–] Kalcifer 2 points 2 years ago (2 children)

That disk certainly isn’t healthy.

For my own future knowledge, what, exactly, in the logs, led you to that conclusion?

image the whole thing with ddrescue

Since you mention "image", I'm assuming that I would need a drive at least equal to the size of the source drive to store the image? The issue is that the source drive is 2TB in size, so I would need to source another 2TB drive (at least) to store the image.

[–] PriorProject 3 points 2 years ago* (last edited 2 years ago) (1 children)

That disk certainly isn’t healthy.

For my own future knowledge, what, exactly, in the logs, led you to that conclusion?

GPT is the partition scheme that stores the partition table. Very few pieces of software interact with that layer of your storage system. The first GPT table error tells us that, unless we've been messing with low-level tools that might break the partition table... the physical disk has probably already lost data. So we're already primed to suspect a busted disk.

Then the kernel log snippets you pasted show tons of errors in the block device layer. I know noisy application logs sometimes train us to ignore error messages, but the kernel block device layer does not log out error messages for fun. If you see any log like ERROR sdx where sdx is a block device that stores important data without a backup... you're about to be in for a rough ride.
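
To make "errors in the block device layer" concrete, here is a small Python sketch that pulls the failing device and sector numbers out of dmesg-style lines. The sample lines are copied from the log in the post; the regular expression and the helper name `failing_sectors` are illustrative assumptions that only cover the two message shapes seen in that log, not every message the kernel can emit.

```python
import re
from collections import defaultdict

# Matches "critical target error, dev sde, sector 8" and
# "I/O error, dev sde, sector 32" as they appear in the posted log.
BLOCK_ERROR = re.compile(r"(?:critical target|I/O) error, dev (\w+), sector (\d+)")


def failing_sectors(lines):
    """Map each block device to the sorted list of sectors that reported errors."""
    sectors = defaultdict(set)
    for line in lines:
        match = BLOCK_ERROR.search(line)
        if match:
            sectors[match.group(1)].add(int(match.group(2)))
    return {dev: sorted(found) for dev, found in sectors.items()}


SAMPLE = [
    "[  252.090206] critical target error, dev sde, sector 8 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2",
    "[  252.090210] Buffer I/O error on dev sde, logical block 1, async page read",
    "[  252.090292] sd 6:0:0:0: [sde] Attached SCSI disk",
    "[  300.090645] critical target error, dev sde, sector 32 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2",
    "[  326.010943] I/O error, dev sde, sector 8 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2",
]
report = failing_sectors(SAMPLE)
print(report)
```

Any non-empty result for a disk that holds data you care about is the cue to stop experimenting and start copying.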

image the whole thing with ddrescue

Since you mention "image", I'm assuming that I would need a drive at least equal to the size of the source drive to store the image? The issue is that the source drive is 2TB in size, so I would need to source another 2TB drive (at least) to store the image.

Yes, though you can pipe ddrescue into gzip or another compressor and if the drive isn't full and you're lucky enough to have some decent sized zero'd out regions they'll compress very well. In the best case, you might only need a disk big enough to hold the live data. In the worst case, yeah, you need a matched disk or bigger.
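
A quick way to see why zeroed regions matter is to compress a block of zeros next to a block of random bytes. This Python sketch uses the standard-library gzip module on two 1 MiB buffers; it says nothing about how the compressor is hooked up to the rescue tool (as far as I know, ddrescue needs a seekable output file, so the compression may have to happen on the finished image rather than in a pipe; treat that as an assumption to verify).

```python
import gzip
import os

MIB = 1024 * 1024

zeroed = bytes(MIB)            # stands in for an unused, zeroed-out region
random_like = os.urandom(MIB)  # stands in for already-compressed or encrypted data

zeroed_size = len(gzip.compress(zeroed))
random_size = len(gzip.compress(random_like))

print(f"zeroed region: {MIB} -> {zeroed_size} bytes")
print(f"random region: {MIB} -> {random_size} bytes")
```

The zeroed buffer shrinks to a tiny fraction of its size while the random one barely changes, which is the best-case versus worst-case split described above.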

Pro tip, buy drives in pairs and automate backups to one of them. If you have a disk you can't copy to another disk, you almost might as well have no disk. This kind of thing happens, not a lot... but I lose a disk maybe every 3y-5y or so. I have a few disks around... maybe 6 online at any given time. But it's not like I'm running hundreds of them. They just conk out every now and again and you've got to be ready for them.

[–] [email protected] 2 points 2 years ago (1 children)

Pro tip, buy drives in pairs and automate backups to one of them.

Honestly I don't think this is the best way. Best to buy them at different times or buy two from different manufacturers. Chances are that if you buy two identical ones together, once one starts to fail the other is not far behind. Or if there is some defect in the batch, you could have both fail within a very short window of each other.

If you must buy identical drives, it is best to have one be far less active than the other, for example an offline backup rather than a hot backup.

[–] PriorProject 1 points 2 years ago

Yeah fair. This is sound advice.

I buy matched pairs to mirror, and then offset the purchase of my pair of backup drives. So I end up having 4 copies on two different models. And when my primary disks get full I "promote" my larger backup disks to primary and buy a new/larger pair of backup disks that are big enough to store many snapshots of my primaries. I knew this was too much for OP and tried to simplify... but your approach is equally simple and better.

[–] [email protected] 2 points 2 years ago (1 children)

For my own future knowledge, what, exactly, in the logs, led you to that conclusion?

That kind of error message is never good news:

[  300.090572] sd 6:0:0:0: [sde] tag#16 Sense Key : Hardware Error [current]
[  326.010925] Buffer I/O error on dev sde, logical block 4, async page read

I mean, technically it could also be the SATA controller/interface being bad, the USB errors might indicate that. But in all cases, it's struggling to read the drive, and that's never good and you should always assume the worst. Best case the drive is healthy and you extracted the data for nothing, but that's a good problem to have.
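
The Read(10) lines in the kernel log can be decoded by hand, which helps tie the different messages together. Below is a Python sketch that assumes the standard 10-byte Read(10) layout (opcode 0x28, a 4-byte big-endian starting sector, a 2-byte sector count) and a 4096-byte buffer block size, which is what the sector and logical block numbers in the log appear to imply. The function names are made up for this example.

```python
import struct

SECTOR_SIZE = 512         # from "sector_sz=512" in the log
BUFFER_BLOCK_SIZE = 4096  # assumption, consistent with sector 8 -> block 1


def decode_read10(cdb_hex: str):
    """Return (starting sector, sector count) from a Read(10) CDB hex dump."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a Read(10) command block")
    # opcode, flags, LBA (32-bit), group number, transfer length (16-bit), control
    _opcode, _flags, lba, _group, length, _control = struct.unpack(">BBIBHB", cdb)
    return lba, length


def sector_to_buffer_block(sector: int) -> int:
    """Convert a 512-byte sector number to the buffer-layer logical block."""
    return sector * SECTOR_SIZE // BUFFER_BLOCK_SIZE


first = decode_read10("28 00 00 00 00 08 00 00 18 00")
second = decode_read10("28 00 00 00 00 20 00 00 08 00")
print(first, second)
print(sector_to_buffer_block(8), sector_to_buffer_block(32))
```

So the two failed commands are reads starting at sectors 8 and 32, and those are the same spots the "Buffer I/O error ... logical block 1" and "logical block 4" lines complain about: the drive keeps failing on the very first few kilobytes of the disk.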

Since you mention “image”, I’m assuming that I would need a drive at least equal to the size of the source drive to store the image? The issue is that the source drive is 2TB in size, so I would need to source another 2TB drive (at least) to store the image.

Yeah, that's a bit of a problem. I mean, nothing is stopping you from trying to mount it. Make sure you mount it read-only: it will protect the drive from potentially corrupting more data, and read-only filesystems are also more tolerant of errors, whereas errors on a read-write mount will cause the filesystem to bail.

It really depends on how much you care about the data. If it's only nice to have but not critical to keep, you can afford more risky recovery operations.

You can use testdisk to try to locate the partitions on it, and depending on the filesystem you might be able to only copy the file data that's still good.

This might be good help as well: https://wiki.archlinux.org/title/file_recovery

[–] Kalcifer 1 points 2 years ago

For your reference, please see the updated post. I ran a S.M.A.R.T test, and the drive is indeed borked.

Thank you very much for all of the extra information!

[–] [email protected] 1 points 2 years ago

This is the way!!!!!

[–] [email protected] 2 points 2 years ago

I don't think the drive is totally dead, since it still responds to commands, but I would not trust it.

You should be able to pull off at least some of the data, but there is no guarantee.

I would copy everything I can, then try to run a low-level format and mark the bad blocks, then run the S.M.A.R.T. test again to see if anything changes, but I would do it just out of curiosity.
