Author Topic: "RAID 5 (Degraded)" but drive now shows status as "Normal"  (Read 32975 times)

JAylmer

  • Level 1 Member
  • *
  • Posts: 10
Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
« Reply #15 on: June 29, 2012, 10:50:34 PM »

What I don't get with this thread is why the data is unavailable with a degraded Raid 5.

Let me guess:
First mistake: You have "auto rebuild" enabled. 

I am guessing again but here goes:
A drive failed, degrading the Raid.
Unfortunately you had auto-rebuild enabled so it started a regeneration.
You were able to access all of your data from the degraded array right up to the instant where the regeneration completed. 
All four drives are now online, but only some of the data (or none) is available through the network, with apparent corruption.

Second mistake: Following the advice of S.M.A.R.T. 
It gives a hint but it may be wrong in terms of indicating the faulty drive.

From that point, if you had removed the faulty drive (the one that regenerated back into the raid), all of your data would probably have magically reappeared.
I am not sure exactly why this happens.

Problem is you think you know the faulty drive but you may not:
Number the drives 1 to 4 top to bottom. 
Remove a drive and see what happens - is the data now there?
Replace the drive, then try the next one, etc.
Power off, of course, whenever drives are moved. Put each drive back in the same slot.

From my experience with the DNS-343 in a 4 drive Raid:
1/ Never have auto-rebuild on
2/ S.M.A.R.T.  is dumb - Use its conclusions as a hint only but never ever act on its advice.
3/ Never believe the suggested faulty drive. Prove it by physically removing the drive and verify data is still available (degraded of course).
4/ The DNS-343 does a great job of dropping out a drive when it is faulty but it can mislead the user into identifying which drive is faulty.
5/ When you do correctly identify the faulty drive, avoid simply doing a format and adding it back into the array; put it in the trash can instead.

Notes: Above all, when a raid is degraded it's no big deal. It will continue to work until another drive fails.
Do not play with the DNS-343 or the array until the data has been backed up.
If you lose data with a Raid 5 then don't blame the DNS-343.
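
To show why a degraded Raid 5 keeps serving data, here is a minimal sketch of the parity idea. It assumes a single stripe of three data blocks plus one XOR parity block; the block contents and the helper name `xor_blocks` are made up for illustration, and this is not the DNS-343's actual code. A real Raid 5 also rotates the parity block across the drives.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 4-drive array: three data blocks plus parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]

# Simulate losing drive 2 (index 1): rebuild its block from the three survivors.
survivors = [block for i, block in enumerate(stripe) if i != 1]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

Any one missing block can be recomputed this way. Lose two blocks from the same stripe and there is nothing left to calculate from, which is why a second drive failure is the one that hurts.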

Please don't interpret my comments in a personal way towards the original poster.  I only know this stuff because I made the same mistakes, and a few times over, before I caught on.

The above is based on my own experiences and may be flawed.  I am hoping the process can be further improved by the team.  Always remember that a hard drive is simply a shovel full of refined dirt.

melvynadam

  • Level 2 Member
  • **
  • Posts: 53
Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
« Reply #16 on: June 30, 2012, 02:28:45 PM »

Let me guess:
First mistake: You have "auto rebuild" enabled. 

I did indeed have "auto rebuild" enabled. I went to replace the supposedly faulty drive tonight but, just before doing so, I read your post. So I've now turned off auto rebuild and will need to wait 20hrs for "syncing". Meanwhile, all four drives now say "Normal" and I have no "Degraded" message.

You stopped me at the 11th hour! Thank you.

From my experience with the DNS-343 in a 4 drive Raid:
1/ Never have auto-rebuild on
....

There should be a sticky with "Best practices for using the DNS-343" and those of you who are more knowledgeable on the topic can share with the rest of us what we might do to reduce the chances of complications. Thanks for your notes.

I'm still curious about whether (assuming the HDD is actually faulty) I needed to buy the exact same model or whether any 1.5TB drive would have worked. Currently I have four Seagate Barracuda 7200.11 1.5TB drives. Since I was under the (possibly mistaken) impression that a drive had died, I went out and bought this exact same model again even though it's now selling at a premium. Was that necessary?

JAylmer

  • Level 1 Member
  • *
  • Posts: 10
Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
« Reply #17 on: June 30, 2012, 04:41:41 PM »

I'm still curious about whether (assuming the HDD is actually faulty) I needed to buy the exact same model or whether any 1.5TB drive would have worked. Currently I have four Seagate Barracuda 7200.11 1.5TB drives. Since I was under the (possibly mistaken) impression that a drive had died, I went out and bought this exact same model again even though it's now selling at a premium. Was that necessary?

There are two issues that may impact:
1/ You know your Seagates are compatible with the DNS-343 in Raid 5.  The compatible list of drives includes only a couple that are currently available.  A couple of years ago just about any drive would work in the DNS-343, but not so now. Are you feeling lucky?
2/ Perhaps the array is happier if all the drives have the same transfer speed.  (However, I know people have mixed drives without issues.)

Also, when a desktop drive has a read error it will exhaustively try to recover the data.  It will then relocate the data and mark the damaged area as bad.  This takes time.
When a drive made for Raid use has an error it will retry but not for long as the Raid controller can efficiently calculate the missing information and log the hiccup.
Desktop drives probably should not be used in Raid arrays as their excessive retry time and methods can result in the Raid controller dropping them out.
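
A rough sketch of that timing problem, using made-up numbers. The timeout and retry durations below are assumptions chosen only to illustrate the comparison, not measured values for the DNS-343 or for any particular drive, and `controller_outcome` is a hypothetical helper.

```python
def controller_outcome(drive_recovery_s, controller_timeout_s):
    """Say what the controller does while a drive retries a bad sector.

    The drive stays in the array only if it answers (with data or with an
    error) before the controller's patience runs out.
    """
    if drive_recovery_s <= controller_timeout_s:
        return "drive stays in array"
    return "drive dropped from array"

# All three values are assumptions for illustration only.
CONTROLLER_TIMEOUT_S = 8.0    # how long the controller waits for a reply
RAID_DRIVE_RETRY_S = 7.0      # Raid drive gives up quickly and reports the error
DESKTOP_DRIVE_RETRY_S = 60.0  # desktop drive keeps retrying on its own

print(controller_outcome(RAID_DRIVE_RETRY_S, CONTROLLER_TIMEOUT_S))
print(controller_outcome(DESKTOP_DRIVE_RETRY_S, CONTROLLER_TIMEOUT_S))
```

The point is only the comparison: a drive that spends longer on its own recovery than the controller is prepared to wait looks dead to the controller, even if the drive would eventually have recovered the sector.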

Your Seagate drive I assume is a desktop drive.  Until it has a read error it will work the same as a Raid drive.  When it does have a read error it tries to recover but the Raid controller has already calculated the missing information, using data from the other drives, and moved onto the next request.  The drive does not know to stop retrying and has busied itself, the controller drops it out, reports the issue and you buy a new drive.
Could the drive be reused? Perhaps, and perhaps not.  The DNS-343 assumes it has good drives when you hit the regen button. You must guarantee this; if they are not good, the regen won't be successful and data may be lost.

The problem is you don't know the state of the rejected drive. Just because it formats OK does not mean it should be reused.  Better to put it in a USB caddy and use it as a backup drive, and get a new identical drive for the Raid array.

JavaLawyer

  • BETA Tester
  • Level 15 Member
  • *
  • Posts: 12190
  • D-Link Global Forum Moderator
    • FoundFootageCritic
Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
« Reply #18 on: July 01, 2012, 08:50:32 AM »

There should be a sticky with "Best practices for using the DNS-343" and those of you who are more knowledgeable on the topic can share with the rest of us what we might do to reduce the chances of complications. Thanks for your notes.

I have been assembling an FAQ board for that very purpose. One critical piece of information that board lacks (as this thread aptly highlights) is best practices, particularly related to RAID-5 handling, maintenance, and troubleshooting.

If anyone would care to contribute content for a detailed post to that end, I will gladly pull all of the seed material together into a comprehensive sticky post. JAylmer seems quite adept with RAID-5. . . . No hinting implied  ;)
Find answers here: D-Link ShareCenter FAQ I D-Link Network Camera FAQ
There's no such thing as too many backups FFC

melvynadam

  • Level 2 Member
  • **
  • Posts: 53
Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
« Reply #19 on: July 02, 2012, 01:33:00 AM »

I will gladly pull all of the seed material together into a comprehensive sticky post. JAylmer seems quite adept with RAID-5. . . . No hinting implied  ;)

 ;D

That's a great help to those of us with less knowledge and very much appreciated. Thank you.

This scary episode has now (hopefully) ended for me. After following JAylmer's advice, I disabled auto-rebuild and the NAS then took 20hrs to "sync", after which everything was back to normal. I had put considerable effort and money into getting a new HDD that was exactly the same model and having a friend bring it over to my country (it's not available here). I don't consider that effort wasted because I now have the peace of mind that I have a "spare" drive should the need arise one day.

JavaLawyer

  • BETA Tester
  • Level 15 Member
  • *
  • Posts: 12190
  • D-Link Global Forum Moderator
    • FoundFootageCritic
Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
« Reply #20 on: July 02, 2012, 04:53:50 AM »

;D

That's a great help to those of us with less knowledge and very much appreciated. Thank you.

I'm only too glad to help, but will need someone with enough RAID-5 experience (on the DNS-343) to cobble together draft material for me to tailor.

JAylmer

  • Level 1 Member
  • *
  • Posts: 10
Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
« Reply #21 on: July 02, 2012, 10:35:24 AM »

Happy to contribute to the FAQ, but gaps in my knowledge also. Please email.

JavaLawyer

  • BETA Tester
  • Level 15 Member
  • *
  • Posts: 12190
  • D-Link Global Forum Moderator
    • FoundFootageCritic
Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
« Reply #22 on: July 02, 2012, 10:40:35 AM »

Happy to contribute to the FAQ, but gaps in my knowledge also. Please email.

Thank you! Sent you a PM  :D

melvynadam

  • Level 2 Member
  • **
  • Posts: 53
Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
« Reply #23 on: July 08, 2012, 12:38:11 AM »

Unfortunately I can't tag this thread as "Closed" or "Solved" because my saga isn't over yet.

For the last three days I've received the same very polite but still saddening email from my DNS-343: "A SMART test was performed ... The result of the test is: Fail".

It's the same HDD that was reported as failing at the beginning of this thread. Of course, since I opened this thread I have disabled auto-rebuild which restored full access to my data but I'm worried about this faulty drive.

Regular readers will know I've already purchased a replacement drive so should I just switch out the drive and rebuild or is there some other procedure I should perform first?

JAylmer

  • Level 1 Member
  • *
  • Posts: 10
Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
« Reply #24 on: July 08, 2012, 11:20:58 AM »

S.M.A.R.T. is a collection of hardware counters that live in a disk drive.
Example: Let's say a drive is asked to seek to cylinder 23. When it arrives it does a read to verify where it stopped, and it reports cylinder 24.  The drive knows it made a mistake and increments the "seek error" hardware counter. It then retries the seek and succeeds.  When this "seek error" counter reaches a threshold (e.g. 200 seek errors), SMART will send you an email each time you have scheduled the counter values to be scanned and reported on.  Let's say the drive has no further errors in the next 12 months. The SMART test will still keep sending those "drive failed" emails, because it does not do any new testing each time. It just looks at the counter values and reports an error because one of them exceeds a threshold.

A classic: Install an incompatible drive into a DNS-343 array, the array will fail. Using the DNS-343 admin panels, the SMART counters are now extremely high.
Is the drive faulty = No.  Can the counters ever be reset = No. 
Has the drive been damaged = No. Will it now always report drive has failed = Yes.
Will the drive supplier accept this drive back = Not if they know.
The smarter S.M.A.R.T. apps snap the values each test and look for what counters have incremented lately.
In your case you had some errors - are you still having them now?  To find out, go to the Admin panels and look at the Log, which has nothing to do with SMART testing.  I am not suggesting that the SMART counters are useless, just nearly useless. You could snap these values periodically and manually check if any counters are getting significantly worse.
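
Snapping the values and comparing them can be as simple as the sketch below. It assumes you have copied the counter values out of the admin panel by hand into two dictionaries. The counter names and numbers are invented for illustration, `worsening_counters` is a hypothetical helper, and it assumes raw error counts that only go up when a new error is recorded.

```python
def worsening_counters(previous, current):
    """Return {name: increase} for counters that rose between two snapshots."""
    return {
        name: current[name] - previous[name]
        for name in current
        if name in previous and current[name] > previous[name]
    }

# Two hand-recorded snapshots (invented values, for illustration only).
last_week = {"seek_errors": 212, "reallocated_sectors": 0, "read_errors": 5}
today = {"seek_errors": 212, "reallocated_sectors": 3, "read_errors": 5}

print(worsening_counters(last_week, today))  # {'reallocated_sectors': 3}
```

Here the seek error count is over the old threshold but has not moved, so it says nothing new. The reallocated sector count is the one that changed, and that is the one worth worrying about.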

One thing to remember is that hard drives typically fail because of issues between the head and the platter surface, and then from the resulting and growing contamination.  Recent stats indicate that if a drive gets a read error where it has to relocate the data, then that drive is 47 times more likely to fail within the next few weeks.  Sometimes you can get the data off the drive in time, but sometimes not.

My understanding when you are unsure of the true status of your drives:
1/ Has the raid array degraded? - critical
2/ Are errors being logged into the log file? - critical
If all is OK to here then you can almost relax.

If you have a lot of free time:
3/ Are the SMART counter values worsening?
(I don't recall whether the counters +1 or -1 to record a new error)

The fact that you are receiving the email advising you of errors may or may not be significant:
"A SMART test was performed ... The result of the test is: Fail".
I have had the same annoying thing happen also and I know how it makes you feel.
Also you have redundancy with raid 5 and you make backups.

Another curious thing to watch for is that when an error occurs in a raid 5 array, sometimes all of the drives have their SMART counter values incremented rather than only the drive with the error.

Hope this helps, I know it's not definitive.

JAylmer

  • Level 1 Member
  • *
  • Posts: 10
Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
« Reply #25 on: July 09, 2012, 10:26:28 PM »

The other option you have is to remove the drives and test scan each of them in a USB caddy using the manufacturer's diagnostics, starting with the one that is reporting SMART quick scan failures.  Let's assume it is actually faulty and fails in the different environment also.  You replace it and rebuild the array, probably without the need to initialise the array from scratch, so no data loss.  I think I would at least scan the suspect drive, then you would know.
 
I would back up first if required.
Remove the suspect drive, check that I can still access all of my files from the now degraded array.
Make sure that mp3's load, Word docs open etc.
Insert your spare drive and rebuild the array.
In a USB caddy, exhaustively test the suspect drive with full scans, zero writes etc.