D-Link Forums

D-Link Network Storage => DNS-343 => Topic started by: melvynadam on June 23, 2012, 02:10:07 PM

Title: "RAID 5 (Degraded)" but drive now shows status as "Normal"
Post by: melvynadam on June 23, 2012, 02:10:07 PM
I have four Seagate 1.5TB ST31500341AS drives configured with RAID5. The box has been running flawlessly since July 2011. I currently have f/w rev1.03 and "Used Space:  3781079 MB", "Unused Space:  642025 MB".

Suddenly on Wednesday night while streaming a show from the NAS via a media streamer, performance became erratic and I checked the admin pages to see the fourth drive listed as "Abnormal" and the "Sync Time Remaining" defined as "Degraded". I tried to shut down but the system wouldn't shut down properly so I pulled the power.

I came back to it tonight and plugged it back in. It still shows "Sync Time Remaining: Degraded" but now the drive shows as "Normal".

I've ordered a new ST31500341AS in case the problem really is the drive but I have some questions that I thought the experts here might be able to help answer:


Thanks so much for taking the time to read this far.
Title: Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
Post by: melvynadam on June 27, 2012, 11:01:46 PM
50+ views but no replies. I know this is an enthusiasts' forum and nobody owes me anything but can anyone provide any insight at all on my situation?

My new drive arrives tonight. When I receive it, do I simply swap out the old for the new and click "Re-configure All Existing Hard Drive(s)"?
Title: Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
Post by: melvynadam on June 27, 2012, 11:46:47 PM
Some more info:

This thread (http://forums.dlink.com/index.php?topic=40429.0) implies I should remove the drives, update my firmware, replace the drives, and pray. Is that a good plan?
Title: Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
Post by: JavaLawyer on June 28, 2012, 05:35:47 AM
If you haven't already done so, you should create (and maintain) a full backup of your data.  RAID-5 (or any form of RAID for that matter) is not considered a backup, but only provides redundancy: DNS-343 - Data Backup vs. Redundancy (http://forums.dlink.com/index.php?topic=46512.0)
Title: Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
Post by: melvynadam on June 28, 2012, 05:49:51 AM
Thanks but I can't access the data. Besides, I do have a full backup of the mission critical material.
Title: Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
Post by: melvynadam on June 28, 2012, 05:54:20 AM
As far as I can tell, there are three tasks I should do:

1. Replace the supposedly faulty hard drive
2. Rebuild the RAID5 array
3. Update the firmware to 1.06

I think the order should be as listed above but maybe it should be 3,1,2. Anyone?
Title: Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
Post by: JavaLawyer on June 28, 2012, 06:00:31 AM
I'm glad that you have a backup of your critical data. Moving forward, if you are looking at the DNS-343 as a single-source for your original and backup data, you should consider using a non-RAID solution. This advice applies to any brand NAS.

Page 25 of the following updated DNS-343 manual (for FW 1.04) describes the process for auto-rebuilding RAID for a failed HDD: DNS-343 Manual WW (http://pmdap.dlink.com.tw/PMD/GetAgileFile?itemNumber=MAL1000225&fileName=DNS-343_B1_Manual_v1.60(WW).pdf&fileSize=7355557.0;).

The FW update history does not indicate any changes between v1.03 and v1.04 that impacted/improved RAID handling, nonetheless, I can't definitively state whether updating the FW will help, hinder, or have no impact on this process.
Title: Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
Post by: JavaLawyer on June 28, 2012, 06:09:59 AM
For your personal edification, here are the cumulative FW release notes: DNS-343 - Cumulative Firmware Release Notes (http://forums.dlink.com/index.php?topic=48344.0)
Title: Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
Post by: melvynadam on June 28, 2012, 06:23:51 AM
Moving forward, if you are looking at the DNS-343 as a single-source for your original and backup data, you should consider using a non-RAID solution. This advice applies to any brand NAS.

Thanks. Not sure why that's advisable. Surely, the redundancy inherent in RAID5 provides some peace of mind so that, in the event of a single drive dying (as mine apparently might have done) you are able to rebuild and carry on as if nothing has happened? BTW, the GUI actually makes a counter claim to the one you're asserting. It says "RAID 1 (Mirroring - Keeps Data Safe)".

Also, I had just reviewed the cumulative release notes and was reminded of a question I asked about 1.04 which was never answered (http://forums.dlink.com/index.php?topic=38418.msg134652#msg134652). Can you shed any light?
Title: Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
Post by: JavaLawyer on June 28, 2012, 07:03:17 AM
Thanks. Not sure why that's advisable. Surely, the redundancy inherent in RAID5 provides some peace of mind so that, in the event of a single drive dying (as mine apparently might have done) you are able to rebuild and carry on as if nothing has happened? BTW, the GUI actually makes a counter claim to the one you're asserting. It says "RAID 1 (Mirroring - Keeps Data Safe)".

Not entirely true:
Title: Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
Post by: JavaLawyer on June 28, 2012, 07:27:25 AM
I had just reviewed the cumulative release notes and was reminded of a question I asked about 1.04 which was never answered (http://forums.dlink.com/index.php?topic=38418.msg134652#msg134652). Can you shed any light?

Please provide more information on the ramifications of this feature:

  • UPnP AV Server updated and DLNA 1.5 Certified

My only real gripe with the DNS-343 is that I often have to perform manual refreshes of the UPnP server to "see" new items from my media streamers. Will this firmware resolve this issue for me?

I haven't personally used the updated UPnP AV Server, and am therefore unable to comment on the improvements other than what's already stated in the release notes. DLNA certification is simply a nod from the Digital Living Network Alliance acknowledging D-Link devices as DLNA compliant.
Title: Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
Post by: melvynadam on June 28, 2012, 08:06:49 AM
Not true.

Well firstly you're not arguing with me - it's a claim made in the GUI (and the screenshot appears in the manual you pointed me towards).
I agree that RAID is no substitute for backup. A large chunk of my NAS is backed up to the cloud with an automated process (http://support.crashplan.com/doku.php/recipe/back_up_windows_mapped_drives) and I don't use RAID with backup in mind.
All of your points are correct and valid. Nonetheless, if you have four 1.5TB drives and 3TB of data to store on them, I'd argue you're better off with a 4.5TB RAID5 array than single volumes. As long as you're backing up too, in the event that a drive dies, in theory you're going to make a quicker recovery by replacing the drive and rebuilding than by replacing the drive and restoring from your remote location. I'll let you know if the theory holds up when I get my new drive tonight :)
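
The 4.5TB figure above follows from RAID 5 setting aside one drive's worth of space for parity. A minimal sketch of that arithmetic, assuming equal-sized drives (the function name is made up for illustration and is not part of any DNS-343 tool):

```python
def raid5_usable_tb(drive_count: int, drive_size_tb: float) -> float:
    """Usable capacity of a RAID 5 array of equal-sized drives.

    One drive's worth of space is consumed by parity, so the
    usable space is (drive_count - 1) * drive_size.
    """
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drive_count - 1) * drive_size_tb


# Four 1.5TB drives, as in this thread.
print(raid5_usable_tb(4, 1.5))  # 4.5
```

The capacity the NAS reports will come out a little lower than this nominal figure because of filesystem overhead and the way drive sizes are counted.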
Title: Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
Post by: JavaLawyer on June 28, 2012, 08:16:26 AM
Well firstly you're not arguing with me - it's a claim made in the GUI (and the screenshot appears in the manual you pointed me towards).
I agree that RAID is no substitute for backup. A large chunk of my NAS is backed up to the cloud with an automated process (http://support.crashplan.com/doku.php/recipe/back_up_windows_mapped_drives) and I don't use RAID with backup in mind.
All of your points are correct and valid. Nonetheless, if you have four 1.5TB drives and 3TB of data to store on them, I'd argue you're better off with a 4.5TB RAID5 array than single volumes. As long as you're backing up too, in the event that a drive dies, in theory you're going to make a quicker recovery by replacing the drive and rebuilding than by replacing the drive and restoring from your remote location. I'll let you know if the theory holds up when I get my new drive tonight :)

As long as there is an external backup, I wholly agree with you. RAID is a means for ensuring data availability and minimizing downtime. Many users confuse RAID and backups, so I use every available opportunity to get on my soapbox.  ;)

Good luck with your recovery.
Title: Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
Post by: melvynadam on June 28, 2012, 08:25:25 AM
Good luck with your recovery.
Thanks. Do you have a recommendation between these two options?:
Title: Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
Post by: JavaLawyer on June 28, 2012, 09:43:58 AM
Thanks. Do you have a recommendation between these two options?:
  • Replace drive, Rebuild array, Upgrade firmware
  • Upgrade firmware, Replace drive, Rebuild array

As I said earlier, "The FW update history does not indicate any changes between v1.03 and v1.04 that impacted/improved RAID handling, nonetheless, I can't definitively state whether updating the FW will help, hinder, or have no impact on this process." Unfortunately, given my uncertainty I'm not in a position to recommend which course of action is best suited to your situation.
Title: Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
Post by: JAylmer on June 29, 2012, 10:50:34 PM
What I don't get with this thread is why the data is unavailable with a degraded Raid 5.

Let me guess:
First mistake: You have "auto rebuild" enabled. 

I am guessing again but here goes:
A drive failed, degrading the Raid.
Unfortunately you had auto-rebuild enabled so it started a regeneration.
You were able to access all of your data from the degraded array right up to the instant where the regeneration completed. 
All four drives now on line but now only some data or none available through the network and apparent corruption.

Second mistake: Following the advice of S.M.A.R.T. 
It gives a hint but it may be wrong in terms of indicating the faulty drive.

From that point, if you had removed the faulty drive (the one that regenerated back into the raid), all of your data would probably have magically reappeared. 
I am not sure exactly why this happens.

Problem is you think you know the faulty drive but you may not:
Number the drives 1 to 4 top to bottom. 
Remove a drive and see what happens - is the data now there?
Replace the drive, then try the next one, etc. 
Power off, of course, whenever drives are moved. Each drive goes back in the same slot.

From my experience with the DNS-343 in a 4 drive Raid:
1/ Never have auto-rebuild on
2/ S.M.A.R.T.  is dumb - Use its conclusions as a hint only but never ever act on its advice.
3/ Never believe the suggested faulty drive. Prove it by physically removing the drive and verify data is still available (degraded of course).
4/ The DNS-343 does a great job of dropping out a drive when it is faulty but it can mislead the user into identifying which drive is faulty.
5/ When you do correctly identify the faulty drive, avoid simply doing a format and adding it back into the array. Put it in the trash can instead.

Notes: Above all, when a raid is degraded it's no big deal. It will continue to work until another drive fails. 
Do not play with the DNS-343 or the array until the data has been backed up.
If you lose data with a Raid 5 then don't blame the DNS-343.

Please don't interpret my comments in a personal way towards the original poster.  I only know this stuff because I made the same mistakes, a few times over, before I caught on.

The above is based on my own experiences and may be flawed.  I am hoping the process can be further improved by the team.  Always remember that a hard drive is simply a shovel full of refined dirt.
Title: Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
Post by: melvynadam on June 30, 2012, 02:28:45 PM
Let me guess:
First mistake: You have "auto rebuild" enabled. 

I did indeed have "auto rebuild" enabled. I went to replace the supposedly faulty drive tonight but, just before doing so, I read your post. So I've now turned off auto rebuild and will need to wait 20hrs for "syncing". Meanwhile, all four drives now say "Normal" and I have no "Degraded" message.

You stopped me at the 11th hour! Thank you.

From my experience with the DNS-343 in a 4 drive Raid:
1/ Never have auto-rebuild on
....

There should be a sticky with "Best practices for using the DNS-343" and those of you who are more knowledgeable on the topic can share with the rest of us what we might do to reduce the chances of complications. Thanks for your notes.

I'm still curious about whether (assuming the HDD is actually faulty) I needed to buy the exact same model or whether any 1.5TB drive would have worked. Currently I have four Seagate Barracuda 7200.11 1.5TB drives. Since I was under the (possibly mistaken) impression that a drive had died, I went out and bought this exact same model again even though it's now selling at a premium. Was that necessary?
Title: Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
Post by: JAylmer on June 30, 2012, 04:41:41 PM
I'm still curious about whether (assuming the HDD is actually faulty) I needed to buy the exact same model or whether any 1.5TB drive would have worked. Currently I have four Seagate Barracuda 7200.11 1.5TB drives. Since I was under the (possibly mistaken) impression that a drive had died, I went out and bought this exact same model again even though it's now selling at a premium. Was that necessary?
There are two issues that may impact:
1/ You know your Seagates are compatible with the DNS-343 in Raid 5.  The compatible list of drives includes only a couple that are currently available.  A couple of years ago, just about any drive would work in the DNS-343, but not so now. Are you feeling lucky?
2/ Perhaps the array is happier if all drives have the same transfer speed.  (However, I know people have mixed drives without issues.)

Also when a desktop drive has a read error it will exhaustively try to recover the data.  It will then relocate the data and bad spot the damaged area.  This takes time.
When a drive made for Raid use has an error it will retry but not for long as the Raid controller can efficiently calculate the missing information and log the hiccup.
Desktop drives probably should not be used in Raid arrays as their excessive retry time and methods can result in the Raid controller dropping them out.

Your Seagate drive I assume is a desktop drive.  Until it has a read error it will work the same as a Raid drive.  When it does have a read error it tries to recover but the Raid controller has already calculated the missing information, using data from the other drives, and moved onto the next request.  The drive does not know to stop retrying and has busied itself, the controller drops it out, reports the issue and you buy a new drive.
Could the drive be reused, perhaps and perhaps not.  The DNS-343 assumes it has good drives when you hit the regen button, you must guarantee this, if they are not then it won't be successful and data may be lost.

The problem is you don't know the state of the rejected drive; just because it formats OK does not mean it should be reused.  Better to put it in a USB caddy and use it as a backup drive and get a new identical drive for the Raid array.
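
The point above about the Raid controller calculating the missing information from the other drives can be illustrated with XOR parity, which is the standard RAID 5 technique. This is a simplified sketch only: the block contents are made up, one stripe is shown, and nothing here is taken from the DNS-343's actual firmware.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a four-drive array: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend the drive holding data[1] returned a read error.
# XOR-ing the surviving data blocks with the parity block rebuilds it.
recovered = xor_blocks([data[0], data[2], parity])
print(recovered == data[1])  # True
```

The same calculation is what a rebuild does for every stripe on the replacement drive, which is why a rebuild can only succeed if all the remaining drives read back correctly.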
Title: Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
Post by: JavaLawyer on July 01, 2012, 08:50:32 AM
There should be a sticky with "Best practices for using the DNS-343" and those of you who are more knowledgeable on the topic can share with the rest of us what we might do to reduce the chances of complications. Thanks for your notes.

I have been assembling an FAQ board (http://forums.dlink.com/index.php?board=107.0) for that very purpose. One critical piece of information that board lacks (as this thread aptly highlights) is best practices particularly related to RAID-5 handling, maintenance, and troubleshooting.

If anyone would care to contribute content for a detailed post to that end, I will gladly pull all of the seed material together into a comprehensive sticky post. JAylmer seems quite adept with RAID-5. . . . No hinting implied  ;)
Title: Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
Post by: melvynadam on July 02, 2012, 01:33:00 AM
I will gladly pull all of the seed material together into a comprehensive sticky post. JAylmer seems quite adept with RAID-5. . . . No hinting implied  ;)

 ;D

That's a great help to those of us with less knowledge and very much appreciated. Thank you.

This scary episode has now (hopefully) ended for me. After following JAylmer (http://forums.dlink.com/index.php?action=profile;u=55984)'s advice, I disabled auto-rebuild and the NAS then took 20hrs to "sync". Following which everything is back to normal. I had put considerable effort and money into getting a new HDD that was exactly the same model and having a friend bring it over to my country (it's not available here). I don't consider that effort wasted because I now have the peace of mind that I have a "spare" drive should the need arise one day.
Title: Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
Post by: JavaLawyer on July 02, 2012, 04:53:50 AM
;D

That's a great help to those of us with less knowledge and very much appreciated. Thank you.

I'm only too glad to help, but will need someone with enough RAID-5 experience (on the DNS-343) to cobble together draft material for me to tailor.
Title: Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
Post by: JAylmer on July 02, 2012, 10:35:24 AM
Happy to contribute to the FAQ, but gaps in my knowledge also. Please email.
Title: Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
Post by: JavaLawyer on July 02, 2012, 10:40:35 AM
Happy to contribute to the FAQ, but gaps in my knowledge also. Please email.

Thank you! Sent you a PM  :D
Title: Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
Post by: melvynadam on July 08, 2012, 12:38:11 AM
Unfortunately I can't tag this thread as "Closed" or "Solved" because my saga isn't over yet.

For the last three days I've received the same very polite but still saddening email from my DNS-343: "A SMART test was performed ... The result of the test is: Fail".

It's the same HDD that was reported as failing at the beginning of this thread. Of course, since I opened this thread I have disabled auto-rebuild which restored full access to my data but I'm worried about this faulty drive.

Regular readers will know I've already purchased a replacement drive so should I just switch out the drive and rebuild or is there some other procedure I should perform first?
Title: Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
Post by: JAylmer on July 08, 2012, 11:20:58 AM
S.M.A.R.T. is a collection of hardware counters that live in a disk drive.
Example: Let's say a drive is asked to seek to cylinder 23. When it arrives it does a read to verify where it stopped, and it reports cylinder 24.  The drive knows it made a mistake and increments the "seek error" hardware counter. It then retries the seek and succeeds.  Once this "seek error" counter reaches a threshold (e.g. 200 seek errors), SMART will send you an email each time the counter values are scanned and reported on according to your schedule.  Let's say the drive has no further errors in the next 12 months. The SMART test will still keep sending those "drive failed" emails, because it does not do any testing each time; it just looks at the counter values and reports an error because one of them exceeds a threshold.

A classic: Install an incompatible drive into a DNS-343 array, the array will fail. Using the DNS-343 admin panels, the SMART counters are now extremely high.
Is the drive faulty = No.  Can the counters ever be reset = No. 
Has the drive been damaged = No. Will it now always report drive has failed = Yes.
Will the drive supplier accept this drive back = Not if they know.
The smarter S.M.A.R.T. apps snap the values each test and look for what counters have incremented lately.
In your case you had some errors - are you still having them now?  To find out, go to the Admin panels and look at the Log, which has nothing to do with SMART testing.  I am not suggesting that the SMART counters are useless, just nearly useless; you could snap these values periodically and manually check whether any counters are getting significantly worse.
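
The idea of snapping the counter values periodically and checking for movement can be sketched as a comparison of two snapshots. The counter names and numbers below are hypothetical values typed in by hand for illustration; they are not real DNS-343 output, and the function is not part of any SMART tool. Any change is reported in either direction, since counters may count up or down depending on the attribute.

```python
def changed_counters(previous: dict, current: dict) -> dict:
    """Return {name: (old, new)} for every counter whose value differs."""
    return {
        name: (previous[name], current[name])
        for name in current
        if name in previous and previous[name] != current[name]
    }


# Hypothetical values copied by hand from the admin panel on two dates.
snapshot_june = {"seek_error": 200, "reallocated_sectors": 0, "read_error": 12}
snapshot_july = {"seek_error": 200, "reallocated_sectors": 3, "read_error": 12}

print(changed_counters(snapshot_june, snapshot_july))
# {'reallocated_sectors': (0, 3)}
```

Here the seek error counter is already at the example threshold, so a threshold-only test would report a failure on both dates, while the comparison shows that only the reallocated sector count has actually moved.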

One thing to remember is that hard drives typically fail because of issues between the head and the platter surface, and then from the resulting and growing contamination.  Recent stats indicate that if a drive gets a read error where it has to relocate the data then that drive is 47 times more likely to fail within the next few weeks.  Sometimes you can get the data off the drive in time but sometimes not.

My understanding when you are unsure of the true status of your drives:
1/ Has the raid array degraded? - critical
2/ Are errors being logged into the log file? - critical
If all is OK to here then you can almost relax.

If you have a lot of free time:
3/ Are the SMART counter values worsening
(I don't recall whether the counters +1 or -1 to record a new error)

The fact that you are receiving the email advising you of errors may or may not be significant:
"A SMART test was performed ... The result of the test is: Fail".
I have had the same annoying thing happen also and I know how it makes you feel.
Also you have redundancy with raid 5 and you make backups.

Another curious thing to watch for is that when an error occurs in a raid 5 array, sometimes all of the drives have their SMART counter values incremented rather than only the drive with the error.

Hope this helps, I know it's not definitive.
Title: Re: "RAID 5 (Degraded)" but drive now shows status as "Normal"
Post by: JAylmer on July 09, 2012, 10:26:28 PM
The other option you have is to remove the drives and test scan each of them in a USB caddy using the manufacturer's diagnostics, starting with the one that is reporting SMART quick scan failures.  Let's assume it is actually faulty and fails in the different environment also.  You replace it and rebuild the array, probably without the need to initialise the array from scratch, so no data loss.  I think I would at least scan the suspect drive; then you would know.
 
I would backup first if required.
Remove the suspect drive, check that I can still access all of my files from the now degraded array.
Make sure that mp3's load, Word docs open etc.
Insert your spare drive and rebuild the array.
In a USB caddy, exhaustively test the suspect drive with full scans, zero writes etc.