D-Link Forums
The Graveyard - Products No Longer Supported => D-Link Storage => DNS-323 => Topic started by: jamieburchell on August 17, 2010, 09:06:41 AM
-
Should I be worried about the Raw_Read_Error_Rate, Seek_Error_Rate and Hardware_ECC_Recovered values constantly increasing? If I'm reading the table correctly, no reallocated sectors. No spin up time either? I can't find anywhere that explains this table properly. The Wiki entry is sketchy at best.
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 117 099 006 Pre-fail Always - 147705572
3 Spin_Up_Time 0x0003 095 089 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 098 098 020 Old_age Always - 2628
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 072 060 030 Pre-fail Always - 16312004
9 Power_On_Hours 0x0032 083 083 000 Old_age Always - 14988
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 2
12 Power_Cycle_Count 0x0032 100 037 020 Old_age Always - 90
184 Unknown_Attribute 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 058 057 045 Old_age Always - 42 (Lifetime Min/Max 29/43)
194 Temperature_Celsius 0x0022 042 043 000 Old_age Always - 42 (0 8 0 0)
195 Hardware_ECC_Recovered 0x001a 045 035 000 Old_age Always - 147705572
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
-
Looks fine. About the right ratio for the size of drive compared to what I normally see.
-
ID#
5 Reallocated_Sector_Ct
197 Current_Pending_Sector
If either of these is anything other than 0, it indicates bad media, and you should back up and send the drive in for RMA.
If you zero the drive (which forces the drive to remap bad sectors) and test it again, you will likely see the bad sector go away, provided the drive electronics remap it. However, after power cycling, depending on your luck and the nature of the surface damage, the bad sectors may reappear later.
On terabyte-size drives, once you see bad sectors, you will most likely see a lot more of them appearing very quickly.
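For anyone who wants to automate that check, here is a minimal sketch in Python. It assumes the attribute rows look like the smartctl tables pasted in this thread; the function name and the regular expression are my own, not part of smartmontools.

```python
import re

# Attribute IDs the post above singles out as signs of bad media.
WATCHED_IDS = {5: "Reallocated_Sector_Ct", 197: "Current_Pending_Sector"}

# One row of the "smartctl -a" attribute table: ID, name, flag, value, worst,
# threshold, type, updated, when_failed, then the raw value (first number only).
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

def nonzero_watched(smartctl_text):
    """Return {attribute_name: raw_value} for watched attributes that are not 0."""
    found = {}
    for line in smartctl_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header lines and anything that is not an attribute row
        attr_id, name, raw = int(m.group(1)), m.group(2), int(m.group(3))
        if attr_id in WATCHED_IDS and raw != 0:
            found[name] = raw
    return found

# Three rows copied from the first table in this thread.
sample = """\
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 072 060 030 Pre-fail Always - 16312004
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
"""
print(nonzero_watched(sample) or "no reallocated or pending sectors")
```

An empty result only means those two counters are zero; it says nothing about the other attributes.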
-
Thanks guys. Why do the Raw_Read_Error_Rate, Seek_Error_Rate and Hardware_ECC_Recovered values keep increasing at an alarming (to me) rate?
-
That's just standard fodder for any drive - it sometimes won't pick up a sector correctly on the first read and needs a second go at it. If it takes too many goes then the sector gets marked bad. All drives have high numbers for these; it's just part of the nature of magnetic media.
I found an article about drive failures and their causes that you might be interested in: http://www.tomshardware.co.uk/hdd-reliability-storelab,review-31968.html
-
What exactly is the "alarming rate" they are increasing at over a specific period of time?
-
What exactly is the "alarming rate" they are increasing at over a specific period of time?
I ran smartctl a few times in a row and got a different value each time, maybe 100 more than the last. I think I was streaming a single MP3 from the disk at the time. The fact that it's called "Raw_Read_Error_Rate" and not "nothing_to_worry_about_rate" made me concerned :)
-
Actually, I'm being stupid - it has happened once before :)
It's a rate rather than a cumulative value - so of course it will fluctuate during disk usage. Here's some more info:
8:44
1 Raw_Read_Error_Rate 0x000f 117 099 006 Pre-fail Always - 154964518
7 Seek_Error_Rate 0x000f 072 060 030 Pre-fail Always - 16366525
9:27
1 Raw_Read_Error_Rate 0x000f 105 099 006 Pre-fail Always - 7674486
7 Seek_Error_Rate 0x000f 072 060 030 Pre-fail Always - 16456073
12:27
1 Raw_Read_Error_Rate 0x000f 108 099 006 Pre-fail Always - 15135422
7 Seek_Error_Rate 0x000f 072 060 030 Pre-fail Always - 16475086
12:30
1 Raw_Read_Error_Rate 0x000f 108 099 006 Pre-fail Always - 15310680
7 Seek_Error_Rate 0x000f 072 060 030 Pre-fail Always - 16475575
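To put numbers on how the raw values move between those four readings, here is a small sketch that just subtracts consecutive samples. The figures are copied from the readings above; the script itself is only an illustration and makes no claim about what the drive firmware actually encodes in these raw values.

```python
# Raw values copied from the four readings above: (time, raw read, seek).
readings = [
    ("8:44", 154964518, 16366525),
    ("9:27", 7674486, 16456073),
    ("12:27", 15135422, 16475086),
    ("12:30", 15310680, 16475575),
]

def deltas(values):
    """Change between each consecutive pair of readings."""
    return [b - a for a, b in zip(values, values[1:])]

raw_read_deltas = deltas([r[1] for r in readings])
seek_deltas = deltas([r[2] for r in readings])

for (t0, _, _), (t1, _, _), dr, ds in zip(
    readings, readings[1:], raw_read_deltas, seek_deltas
):
    print(f"{t0} -> {t1}: Raw_Read_Error_Rate {dr:+d}, Seek_Error_Rate {ds:+d}")
```

In these samples the Raw_Read_Error_Rate raw value drops by about 147 million between 8:44 and 9:27 and then climbs again, while the Seek_Error_Rate raw value only goes up.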
-
Where do you find smartctl for this box? I'm assuming you're running it from SSH or telnet, right?
-
It's one of the FFP packages (smartmontools-5.39.1-1.tgz).
I asked Fonz if he would update the version yesterday - and he did! I was hoping I'd be able to utilise the JMicron USB enclosure support in the new version to test externally connected drives. It didn't work for my enclosure though.
-
I loaded the earlier version, decided to try Google. :D
You do seem to get bigger numbers and more of them for your drives, not sure what that means. :)
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 100 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0007 062 062 011 Pre-fail Always - 12210
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 485
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 253 253 051 Pre-fail Always - 0
8 Seek_Time_Performance 0x0025 100 100 015 Pre-fail Offline - 0
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 5343
10 Spin_Retry_Count 0x0033 100 100 051 Pre-fail Always - 0
11 Calibration_Retry_Count 0x0012 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 21
13 Read_Soft_Error_Rate 0x000e 100 100 000 Old_age Always - 0
183 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
184 Unknown_Attribute 0x0033 100 100 000 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 064 064 000 Old_age Always - 36 (Lifetime Min/Max 21/36)
194 Temperature_Celsius 0x0022 067 067 000 Old_age Always - 33 (Lifetime Min/Max 21/40)
195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 100557941
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 253 253 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x000a 100 100 000 Old_age Always - 0
201 Soft_Read_Error_Rate 0x000a 100 100 000 Old_age Always - 0
SMART Error Log Version: 1
No Errors Logged
-
Were you accessing the disk at the time? Yours are zero! My read rate also seems to match the ECC recovered value, unlike yours.
-
Here's a side question for the hard drive gurus in here. I read that after time hard drive data "fades" due to the nature of magnetism. There's talk about "refreshing" a drive periodically which involves reading the entire drive. Would a periodic rsync each month satisfy the criteria or will I find one day that all my backup drives are empty ::)
-
I think the time required for data to fade is measured in decades. I've had old drives from MS-DOS computers in the closet from the early 90's, and I've stuck them in a USB enclosure and read them error-free. You'll be more likely to lose data from optical media, some of which has trouble remembering more than a couple of years, than a hard disk.
-
I guess it depends which articles you read...
E.g. this article suggests as early as a year
http://www.larryjordan.biz/articles/lj_restore_hard_disk_data.html
-
Here it is madly copying a couple gigs of data.
root@DNS-323:/# smartctl -a -i -d marvell /dev/sda
smartctl version 5.38 [arm-unknown-linux-uclibc] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Device Model: SAMSUNG HD154UI
Serial Number: S1Y6J1LS719234
Firmware Version: 1AG01118
User Capacity: 1,500,301,910,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 3b
Local Time is: Thu Aug 19 16:43:56 2010 DST
==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (19043) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 255) minutes.
Conveyance self-test routine
recommended polling time: ( 33) minutes.
SCT capabilities: (0x003f) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 100 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0007 062 062 011 Pre-fail Always - 12210
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 486
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 253 253 051 Pre-fail Always - 0
8 Seek_Time_Performance 0x0025 100 100 015 Pre-fail Offline - 0
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 5344
10 Spin_Retry_Count 0x0033 100 100 051 Pre-fail Always - 0
11 Calibration_Retry_Count 0x0012 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 21
13 Read_Soft_Error_Rate 0x000e 100 100 000 Old_age Always - 0
183 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
184 Unknown_Attribute 0x0033 100 100 000 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 065 064 000 Old_age Always - 35 (Lifetime Min/Max 21/36)
194 Temperature_Celsius 0x0022 065 065 000 Old_age Always - 35 (Lifetime Min/Max 21/40)
195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 102094588
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 253 253 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x000a 100 100 000 Old_age Always - 0
201 Soft_Read_Error_Rate 0x000a 100 100 000 Old_age Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
-
And another one shortly after. Looks like Hardware ECC recovered counts up madly, none of the others change much.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 100 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0007 062 062 011 Pre-fail Always - 12210
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 486
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 253 253 051 Pre-fail Always - 0
8 Seek_Time_Performance 0x0025 100 100 015 Pre-fail Offline - 0
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 5344
10 Spin_Retry_Count 0x0033 100 100 051 Pre-fail Always - 0
11 Calibration_Retry_Count 0x0012 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 21
13 Read_Soft_Error_Rate 0x000e 100 100 000 Old_age Always - 0
183 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
184 Unknown_Attribute 0x0033 100 100 000 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 065 064 000 Old_age Always - 35 (Lifetime Min/Max 21/36)
194 Temperature_Celsius 0x0022 065 065 000 Old_age Always - 35 (Lifetime Min/Max 21/40)
195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 102645802
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 253 253 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x000a 100 100 000 Old_age Always - 0
201 Soft_Read_Error_Rate 0x000a 100 100 000 Old_age Always - 0
SMART Error Log Version: 1
No Errors Logged
-
I guess it depends which articles you read...
E.g. this article suggests as early as a year
http://www.larryjordan.biz/articles/lj_restore_hard_disk_data.html
Well, I can say my experience with a number of drives is different. I've had a few of them that are years old and powered up and ran fine.
I recently dug out an old system that has Windows 98 on it because it has a 12 channel ISA video board that I was fixing a program for, booted it up and fixed the program without a hitch. It hadn't been powered on for 4-5 years at least. It was buried in the closet, I had to take a whole pile of stuff out to get to it!
-
Thanks for posting your results. Sadly I'm none the wiser. Your read error rate is zero. Your ECC value doesn't match the read error rate like mine. My spin-up time is zero and yours isn't.
In contrast, my new Samsung 2TB drive showed the error rate as 0 for the first day or so and now shows "1". It also shows a spin-up time value. The ECC value is 0.
So I really have no idea what these figures actually mean.
-
I'm just looking at my data, and I think I need to use a different control to actually run a test.
How are you invoking the test?
-
Well, I can say my experience with a number of drives is different.
I'd tend to agree. I was wondering at a basic level if running an rsync across the drive "refreshed" the data. I'm just paranoid.
-
I'm just looking at my data, and I think I need to use a different control to actually run a test.
How are you invoking the test?
To get the print out? Same as you did: smartctl -a -i -d marvell /dev/sda
-
OK, that's all I'm doing. I was thinking I needed to run the SMART test, but in looking closer, I see that all I should have to do is to print the accumulated SMART data.
Don't know what to tell you, maybe it's time to go drive shopping! :D
-
I'd tend to agree. I was wondering at a basic level if running an rsync across the drive "refreshed" the data. I'm just paranoid.
An rsync doesn't take nearly enough time to refresh all the data on a drive! :o
-
You can run the various SMART tests (short/long etc) if you like. That's all tools like Seatools actually do.
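If you want to script those tests, here is a rough sketch that only builds the smartctl command lines and does not run them. `-t short`, `-t long`, `-t conveyance` and `-l selftest` are standard smartctl options; the helper names are mine, and I haven't verified whether `-d marvell` passes self-test commands through on the DNS-323.

```python
import shlex

# Test names accepted by "smartctl -t"; the thread mentions short and long.
SELF_TESTS = ("short", "long", "conveyance")

def smartctl_selftest_cmd(device, test="short", device_type=None):
    """Build the argv for starting a SMART self-test (does not run it)."""
    if test not in SELF_TESTS:
        raise ValueError(f"unknown self-test: {test}")
    cmd = ["smartctl"]
    if device_type:  # e.g. "marvell", as used on the DNS-323 earlier in the thread
        cmd += ["-d", device_type]
    cmd += ["-t", test, device]
    return cmd

def smartctl_selftest_log_cmd(device, device_type=None):
    """Build the argv for reading the self-test log once the test has finished."""
    cmd = ["smartctl"]
    if device_type:
        cmd += ["-d", device_type]
    cmd += ["-l", "selftest", device]
    return cmd

print(shlex.join(smartctl_selftest_cmd("/dev/sda", "long", "marvell")))
print(shlex.join(smartctl_selftest_log_cmd("/dev/sda", "marvell")))
```

The test runs inside the drive itself, so the first command returns straight away; the second is what you would run later to see the result.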
-
An rsync doesn't take nearly enough time to refresh all the data on a drive! :o
Presumably, the data must be read from the drive in order to compare it with the destination? That article suggests that hard drives automatically refresh the data when reading it - unless I mis-read.
"All hard drives are programmed to refresh data as the heads skim along the drive."
-
Presumably, the data must be read from the drive in order to compare it with the destination? That article suggests that hard drives automatically refresh the data when reading it - unless I mis-read.
IMO, that's pure BS! I'm beginning to doubt what these guys are saying overall!
-
Could it mean that the platters are remagnetised by the heads or something. I have no idea :)
-
The only way to "refresh" data is to do a write operation. This is basic magnetic physics. In point of fact, the continual passes of the head, if they did anything, would tend to erode the data.
All that I read about long term storage of hard disks is you're more likely to have a problem with the drive spinning up again due to dry bearings or the like.
If the data is REALLY important, there is archive-quality DVD-R/+R media that is supposed to last a long time. Save the data on a hard disk and on the DVD, and compare them every year to verify both copies are still intact.
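The yearly comparison could be done with something like the sketch below, which hashes every file in two directory trees and reports what differs. It is only an illustration: the function names are made up, SHA-256 is my choice of checksum, and the two paths would be wherever the hard disk copy and the DVD happen to be mounted.

```python
import hashlib
from pathlib import Path

def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def tree_digests(root):
    """Map each file's path relative to root to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def compare_copies(copy_a, copy_b):
    """Return (missing_from_b, missing_from_a, differing) as sorted lists."""
    a, b = tree_digests(copy_a), tree_digests(copy_b)
    missing_from_b = sorted(a.keys() - b.keys())
    missing_from_a = sorted(b.keys() - a.keys())
    differing = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
    return missing_from_b, missing_from_a, differing
```

Calling `compare_copies` on the two mount points reads every byte of both copies, so a run with three empty lists means both copies were fully readable and identical at that moment.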
-
I'll just format the backup drives every now and then when my OCD kicks in ;D
-
Good plan. ;)
-
As for the Seagate drive, it's anyone's guess if it needs replacing!
-
Hard drives just don't last like they used to in the 5.25" full size drive days. :D
-
If it was you in my situation, with three other backup hard drives, would you replace the drive "just in case" or ignore the problem and wait for it to die? :)
-
I think I'd replace the drive and then run the full Seagate surface analysis on it and their diagnostic. I'd want an independent opinion of the health of the drive.
One handy utility for SMART analysis is SpeedFan from http://www.almico.com/speedfan.php. It has a SMART page that links to a site and presents "plain English" readouts of the SMART data. It runs on 2K/XP/Vista/Win7.
-
I think I'd replace the drive and then run the full Seagate surface analysis on it and their diagnostic. I'd want an independent opinion of the health of the drive.
One handy utility for SMART analysis is SpeedFan from http://www.almico.com/speedfan.php. It has a SMART page that links to a site and presents "plain English" readouts of the SMART data. It runs on 2K/XP/Vista/Win7.
OK. I'll look in the sofa for loose change!
AFAIK the "full Seagate surface analysis" is just an interface to running the long SMART test.
-
Something that just occurred to me: you could run SpinRite on it. It's a program from Gibson Research which is about the best for drive recovery, but it also checks a drive's health at a low level and using the SMART parameters.
-
The full surface analysis is more than the SMART test. You can do the write test, which of course erases the data, and it will reallocate any bad sectors as well.
-
New drive on order :)
In case anyone living in the UK is interested: since I don't have a PC anymore and some USB enclosures don't allow SMART to function properly on externally connected SATA drives, you might want to look at this one, which does work perfectly with smartctl (at least the Windows build I'm using). It uses a JMicron chipset, but I didn't need to tell smartctl anything.
http://www.usbnow.co.uk/Hard_Drive_Enclosures-IDE_&_SATA_Cable_Kits/c10_70/p52/USB_2.0_IDE_&_SATA_Cable_(with_Power_Supply)/product_info.html