• June 15, 2024, 04:06:14 PM

News:

This Forum Beta is ONLY for registered owners of D-Link products in the USA for which we have created boards at this time.

Pages: [1] 2 3 4

Author Topic: status of '321 data corruption caused by Linux kernel bug?  (Read 35168 times)

peas

  • Level 2 Member
  • Posts: 47
status of '321 data corruption caused by Linux kernel bug?
« on: March 16, 2009, 08:18:18 PM »

An individual found a data corruption bug by doing a copy-and-compare test on the DNS-321:
http://www.amazon.com/review/R3I00GAEVWL1QZ/ref=cm_cr_pr_viewpnt#R3I00GAEVWL1QZ

D-Link, what is the status of this bug?  I'd hate to think that my data is being silently corrupted.

Here is the full text of the user review:

Quote
By S. Kosto
Well, it was working fine for the features I was using. I immediately updated to their latest firmware release and put two 1TB drives in it; all the backup options (rebuild drive, etc.) seemed fine as I played around with swapping drives out. Then I tried to copy all of my current data over to this NAS box. After about a full day of copying (I have several hundred gigs of files), I went to check the status of the backup.

The backup had completed... HOWEVER, since I had turned on data validation (it rereads the destination and source files and compares them after the backup), it reported that 12 of the 1000s of files I had backed up were "not equivalent to the source files".

I took down the names of the files and then did a hex dump compare of the old and new files. To my surprise, the files that were copied onto the NAS box had *exactly* 76 bytes of zeros at very specific relative offsets in each file. In all of the "corrupted" files, the zeroed bytes were always at offsets whose last 3 hex nibbles fell in the range xfb4-xfff.

Puzzled, I did some Google searching and found a Linux kernel bug, discovered at the end of 2006, that happens to match this behavior exactly! The kernel was losing the "dirty bits" (modified memory page indicators) when it was writing to ext2 or ext3 file systems (this box uses ext2). This only happened to certain "chunks" (76 bytes in the Linux case) that fall at the end of a 4k memory page (the last 76 bytes of a 4k page are... you guessed it!! bytes xfb4-xfff).

The data I was transferring came from a Windows XP machine, and this NAS box is internally running... yep, LINUX! I believe they are likely running a kernel version on this thing that was silently corrupting my data, as all the issues seem to match my conditions exactly.

That is the WORST kind of data corruption ("silent"), because there were NO error indications AT ALL except when it did the final recompare. Good thing I had turned that on, or I would NEVER have known my data was being corrupted as it was copied to this NAS box!

I notified the D-Link tech support people about this issue, and they responded saying that they are looking into what is causing the problem (I think I gave them a good enough heads-up on this one!)

I promptly returned the box to get my money back and am now running w/ a RAID 1 configuration in my main PC instead of having an external NAS box.

Support notes - I stayed on the line with the D-Link tech support number for a good 20+ minutes, but all I got was the answering service repeating "due to a large volume of calls, ... ", so I just hung up and emailed them instead. It took them about a week to get back to me (but they did).

Other gripes about the box - the little levers to remove the drives were REALLY hard to use; my thumb got sore after swapping the drives a couple of times during the failed-drive testing.

This review is specifically about the DNS-321 as that was the only one I tested, however the DNS-323 is VERY similar to this box (just basically added a print server), so I can't say if that one is any better or does the same corruption as this one does (it's quite possible).
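The reviewer's check (compare the source and the copy, then note where the differing bytes fall relative to 4k page boundaries) can be sketched in Python. This is a minimal illustration under stated assumptions, not the backup software's validation feature: the helper names `diff_ranges` and `in_page_tail` are hypothetical, and the 4096-byte page size is assumed to match the review. The arithmetic behind the 76 bytes is 0x1000 - 0xFB4 = 4096 - 4020 = 76, so offsets xfb4-xfff are exactly the last 76 bytes of each 4k page.

```python
PAGE_SIZE = 4096                      # assumed 4k page, as in the review
TAIL_BYTES = 76                       # 0x1000 - 0xFB4
TAIL_START = PAGE_SIZE - TAIL_BYTES   # 4020 == 0xFB4


def diff_ranges(src: bytes, dst: bytes) -> list:
    """Return half-open (start, end) offset ranges where src and dst differ.

    Bytes past the end of the shorter input count as differing.
    """
    ranges = []
    start = None
    n = max(len(src), len(dst))
    for i in range(n):
        a = src[i] if i < len(src) else None
        b = dst[i] if i < len(dst) else None
        if a != b:
            if start is None:
                start = i                # a differing run begins
        elif start is not None:
            ranges.append((start, i))    # the run ended just before i
            start = None
    if start is not None:
        ranges.append((start, n))
    return ranges


def in_page_tail(start: int, end: int) -> bool:
    """True if [start, end) lies inside bytes xfb4-xfff of a single 4k page."""
    if end <= start:
        return False
    same_page = start // PAGE_SIZE == (end - 1) // PAGE_SIZE
    return same_page and start % PAGE_SIZE >= TAIL_START


if __name__ == "__main__":
    # Synthetic data: two pages of non-zero bytes, then zero the tail of page 2.
    src = bytes((i % 255) + 1 for i in range(2 * PAGE_SIZE))
    dst = bytearray(src)
    dst[PAGE_SIZE + TAIL_START:2 * PAGE_SIZE] = bytes(TAIL_BYTES)
    for s, e in diff_ranges(src, bytes(dst)):
        print(hex(s), hex(e - 1), e - s, in_page_tail(s, e))
    # prints: 0x1fb4 0x1fff 76 True
```

For modest file sizes, reading both files with `open(path, "rb").read()` and passing the two byte strings to `diff_ranges` applies the same check to real data.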

ECF

  • Administrator
  • Level 11 Member
  • Posts: 2692
Re: status of '321 data corruption caused by Linux kernel bug?
« Reply #1 on: March 17, 2009, 03:05:15 PM »

I am very sorry, but I have not heard of any verification of this issue whatsoever; however, it will be investigated to look for possible issues. Thank you for the post.
« Last Edit: March 17, 2009, 03:09:28 PM by ECF »
Never forget that only dead fish swim with the stream

djy8131

  • Level 2 Member
  • Posts: 27
Re: status of '321 data corruption caused by Linux kernel bug?
« Reply #2 on: March 20, 2009, 08:07:30 AM »

Has anyone else had this problem?  It sounds like a likely bug if D-Link uses the buggy version of the kernel.  If this is true, I think I will have to replace mine with a different model, since we have not seen a firmware update for quite some time.  Can D-Link confirm the Linux kernel version that is being used?
« Last Edit: March 20, 2009, 08:09:01 AM by djy8131 »

fordem

  • Level 10 Member
  • Posts: 2168
Re: status of '321 data corruption caused by Linux kernel bug?
« Reply #3 on: March 20, 2009, 10:23:51 AM »

Forget about which version of the kernel is in use for a few minutes - Google linux kernel bug xbf4-xfff and see what shows up.

I got exactly nine hits - all pointing back to the same source - have you tried to find any further details on this alleged bug?

Further searches on linux kernel bug and different variations of 76 bytes, zero fill, 4k page turned up nothing of any significance.

Have you tried testing your data?

Personally, I don't put a lot of faith in many of those reviews, especially the ones on Amazon, which often come from inexperienced end users - this reviewer does not mention (does he even know?) which version of the kernel has the bug or which version of the kernel is in use on the DNS-321, and suggestions such as ...

Quote
This review is specifically about the DNS-321 as that was the only one I tested, however the DNS-323 is VERY similar to this box (just basically added a print server), so I can't say if that one is any better or does the same corruption as this one does (it's quite possible).

don't exactly inspire my confidence.

For what it's worth - the DNS-323 is a precursor to the DNS-321, so it's really that the print server was removed, rather than added, and although the units are similar, they are also quite different - among other things, they use different processors.  I use the DNS-323 and I verify my backups and have never had a verification error that could not be tracked to the data on the client being changed between the backup and the verify (this happens when I forget to close my email client before backing up).
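A verification pass like the one described here can be sketched as a comparison of SHA-256 digests for every file under a source folder and its backup copy. This is a minimal sketch, assuming both trees are reachable as ordinary paths (for example through a mapped share); `verify_tree` and `file_digest` are hypothetical helper names, not part of any D-Link tool.

```python
import hashlib
import os


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large media files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_tree(src_root: str, dst_root: str) -> list:
    """Return sorted relative paths that are missing from, or differ in, dst_root."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if not os.path.isfile(dst):
                problems.append(rel)          # never copied
            elif file_digest(src) != file_digest(dst):
                problems.append(rel)          # copied, but contents differ
    return sorted(problems)
```

A call such as `verify_tree("C:/Data", "//DNS-321/Volume_1/Backup")` (hypothetical paths) lists suspect files. A file that changes on the client between the backup and the verify will also be reported, so close applications such as the mail client first.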

I've also, on rare occasion, had the need to restore from those backups, which can be considered the ultimate verification, and again, never had an issue.

Oh - the DNS-323 with firmware 1.06, runs kernel version 2.6.12.6, and I believe, so does the DNS-321.
RAID1 is for disk redundancy - NOT data backup - don't confuse the two.

garyhgaryh

  • Level 3 Member
  • Posts: 133
Re: status of '321 data corruption caused by Linux kernel bug?
« Reply #4 on: March 22, 2009, 10:10:55 PM »

You have to admit this post by the OP does not inspire confidence in the box.  I bought a dns-321 and dns-323 and I would hate to think my files are being corrupted.

On a different subject, I had a very weird thing happen to my dns-323.  I can no longer log into the [download] section of the UI.  I get the following:

Backup your files before proceeding!

To stabilize operation, please login and select TOOLS-->RAID to reformat your device with an EXT2 file system.

What the hell? I haven't done anything to this box...

Gary

« Last Edit: March 22, 2009, 11:16:12 PM by garyhgaryh »

peas

  • Level 2 Member
  • Posts: 47
Re: status of '321 data corruption caused by Linux kernel bug?
« Reply #5 on: March 23, 2009, 01:07:42 AM »

You're using the wrong search terms.  Try "Linux kernel bug dirty bits 76 bytes"; the first hit specifically mentions this bug in at least versions 2.6.5 through 2.6.19:
http://kerneltrap.org/node/7517
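Checking a firmware's kernel against the range quoted here is simple version arithmetic. A minimal sketch, assuming the 2.6.5 through 2.6.19 range exactly as stated in this post (a claim from the thread, not an authoritative list) and comparing only the first three version components; `parse_kernel_version` and `in_reported_range` are hypothetical helpers. Falling inside the range does not by itself prove a device is affected, since vendors can backport fixes. On a Linux machine, `platform.release()` returns the running kernel's release string.

```python
import re

# Range as reported in this thread ("at least versions 2.6.5 thru 2.6.19").
AFFECTED_LOW = (2, 6, 5)
AFFECTED_HIGH = (2, 6, 19)


def parse_kernel_version(release: str) -> tuple:
    """Extract (major, minor, patch) from strings like '2.6.12.6' or '2.6.19-rc2'."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in m.groups())


def in_reported_range(release: str) -> bool:
    """True if the first three components fall within AFFECTED_LOW..AFFECTED_HIGH.

    Only three components are compared, so a stable release such as 2.6.19.2
    is treated as 2.6.19 rather than sorting after the upper bound.
    """
    return AFFECTED_LOW <= parse_kernel_version(release) <= AFFECTED_HIGH
```

For example, `in_reported_range("2.6.12.6")`, the version reported in this thread for the DNS-323 firmware 1.06, returns `True`.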

I don't verify the data that I store on the DNS-321, so I can't say one way or the other.  I store mostly media files, so I'm not likely to notice a small glitch here or there.  One thing that I've learned over years of developing and testing computer HW/SW is that if one person encounters a bug, it's likely a real problem lurking in the corners.  Just because you haven't encountered it doesn't mean it doesn't exist.  Let's say the data is safe 99.9% of the time.  That 0.1% corruption could occur in something critical, and I'd rather be proactive and have D-Link investigate and fix this bug than dismiss it.
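The 99.9% figure can be turned into numbers. Under the simplifying assumption that each file is corrupted independently with probability p = 0.001, the chance that at least one of n files is hit is 1 - (1 - p)^n, which is already about 63% for n = 1000, and the expected number of bad files is p * n. A minimal sketch (the function names are hypothetical):

```python
def p_any_corrupted(p_file: float, n_files: int) -> float:
    """Probability that at least one of n_files is corrupted.

    Assumes each file is corrupted independently with probability p_file,
    which is a simplification for illustration only.
    """
    return 1.0 - (1.0 - p_file) ** n_files


def expected_corrupted(p_file: float, n_files: int) -> float:
    """Expected number of corrupted files under the same assumption."""
    return p_file * n_files


if __name__ == "__main__":
    p = 0.001  # "safe 99.9% of the time", read as a per-file probability
    for n in (100, 1000, 10000):
        print(f"{n:>6} files: expect {expected_corrupted(p, n):.1f} bad, "
              f"P(at least one) = {p_any_corrupted(p, n):.3f}")
```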

Since you present yourself as an expert here, please tell us which kernel version the DNS-321 runs.  Regardless of what you assume of the Amazon reviewer, he did a valid copy-verify test and discovered an issue.  Let's refrain from character bashing and stick to the evidence in front of us.

« Last Edit: March 23, 2009, 01:12:55 AM by peas »

fordem

  • Level 10 Member
  • Posts: 2168
Re: status of '321 data corruption caused by Linux kernel bug?
« Reply #6 on: March 23, 2009, 06:00:08 AM »

I don't know if you noticed - I did state that I was using a 323 and provided the kernel version for that - 2.6.12.6.  Although I have good reason to believe that the 321 uses the same kernel, I would not state it as fact, since I have not personally checked it (and cannot).

By the way - I don't claim to be an expert - just a sceptic.  I do not believe everything I read online, and I do not support the "whipping up of hysteria" that so often happens when bugs and other "holes" are discovered.  If you can duplicate the problem and have done so, by all means spread the word; if you can't, then further investigation is warranted.

peas

  • Level 2 Member
  • **
  • Posts: 47
Re: status of '321 data corruption caused by Linux kernel bug?
« Reply #7 on: March 23, 2009, 08:14:09 AM »

I paid money for this product and have entrusted my data to it.  It's not my job to wring out the bugs.  To the contrary, I hope D-Link seriously investigates reports of data loss on their data storage products.  By the time I encounter data corruption, it will be too late for my data.  And I'm not about to buy another '321 just for testing purposes.

I don't understand why you're so virulently opposed to investigating this problem.  Maybe you work in the sustaining dept and don't want to see new bugs, or own D-Link stock and don't want bad press?  In the latter case it's actually better for D-Link if they work on this because it shows consumers that they're responsible and support the product well.

mig

  • Level 3 Member
  • Posts: 217
Re: status of '321 data corruption caused by Linux kernel bug?
« Reply #8 on: March 23, 2009, 08:45:00 AM »

According to the D-Link GPL site ftp://ftp.dlink.com/GPL/DNS-321/
the DNS-321 runs a 2.6.12.6 kernel; however, the D-Link FTP site does
not indicate which version of the DNS-321 firmware this posted GPL
represents.

kimgkimg

  • Level 1 Member
  • Posts: 10
Re: status of '321 data corruption caused by Linux kernel bug?
« Reply #9 on: March 23, 2009, 10:49:13 AM »


How often are updates released for the product?  I just took delivery of a DNS-321 last week, but am on the fence about keeping it or returning it.


D-Link Multimedia

  • Poweruser
  • Level 7 Member
  • Posts: 1066
    • D-link Systems, Inc.
Re: status of '321 data corruption caused by Linux kernel bug?
« Reply #10 on: March 23, 2009, 11:09:04 AM »

We take issues like this VERY seriously. It is being fully investigated, and if the DNS-321 or any other NAS we develop is at risk, it will be resolved.

mig

  • Level 3 Member
  • Posts: 217
Re: status of '321 data corruption caused by Linux kernel bug?
« Reply #11 on: March 23, 2009, 11:22:54 AM »

Quote
How often are updates released for the product?  I just took delivery of a DNS-321 last week, but am on the fence about keeping it or returning it.

According to D-Link's web site for DNS-321:
   Firmware:
      v1.00 released 11/10/2008 (shipping version)
      v1.01 released 11/11/2008
   Easy Search
      v4.1.0.0 released 8/21/2008 (shipping version)
      v4.5.0.0 released 11/11/2008
 
However, this forum group was started (welcome message posted) 07/20/2008
and the first issue was posted 8/12/2008 (a few months prior to the FW v1.00 shipping version ???)

fordem

  • Level 10 Member
  • Posts: 2168
Re: status of '321 data corruption caused by Linux kernel bug?
« Reply #12 on: March 23, 2009, 04:21:40 PM »

Quote
You're using the wrong search terms.  Try "Linux kernel bug dirty bits 76 bytes"; the first hit specifically mentions this bug in at least versions 2.6.5 through 2.6.19:
http://kerneltrap.org/node/7517

Now that I've had a chance to sit down and study the link you provided, I fail to see its relevance to the DNS-321.

The link you posted relates to a kernel bug causing an IO race condition and subsequent corruption when rtorrent hashes are checked.  I fail to see how it relates to corruption when transferring files, presumably using CIFS/SMB (your reviewer doesn't say), on a device that doesn't even have rtorrent installed.

Whilst I'm about it - did you notice any mention of 76 bytes anywhere on that page?

Fatman

  • Level 9 Member
  • Posts: 1675
Re: status of '321 data corruption caused by Linux kernel bug?
« Reply #13 on: March 25, 2009, 08:37:57 AM »

Quote
Now that I've had a chance to sit down and study the link you provided, I fail to see its relevance to the DNS-321.

The link you posted relates to a kernel bug causing an IO race condition and subsequent corruption when rtorrent hashes are checked.  I fail to see how it relates to corruption when transferring files, presumably using CIFS/SMB (your reviewer doesn't say), on a device that doesn't even have rtorrent installed.

Whilst I'm about it - did you notice any mention of 76 bytes anywhere on that page?

Sorry to burst your bubble, but there is relevance: they were using rtorrent to test this issue because it was discovered with rtorrent.

If I am reading this correctly (I am no kernel maintainer)...

The bug is in the way the IO layer interacts with the EXT2/3 drivers.  That definitely is something that could apply to the DNS series if the correct (unpatched) Kernels are in place.

That said, given how difficult it was for Linus and friends to demonstrate the corruption, I have to question whether you can trigger this bug through networked IO.  This bug required that no FS activity occur between a page being dirtied and then cleaned.  With network data coming in and being buffered separately by Samba, I would expect us to escape the race (not that it is impossible; again, I am not qualified to judge this).
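The kind of lost-dirty-bit race described here can be illustrated with a toy model. The sketch below is hypothetical and heavily simplified (it is not the kernel's page-cache code, and the names are made up for illustration): writeback copies a page and clears its dirty flag, and if an application write lands between the copy and the flag update, that write's dirty mark is lost and the data never reaches disk. The 76-byte tail is only a parameter chosen to mirror the review.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

PAGE_SIZE = 4096


@dataclass
class Page:
    """A cached page: its contents plus a 'dirty' (modified) flag."""
    data: bytearray = field(default_factory=lambda: bytearray(PAGE_SIZE))
    dirty: bool = False


def app_write(page: Page, offset: int, payload: bytes) -> None:
    """An application write: modify the cached page and mark it dirty."""
    page.data[offset:offset + len(payload)] = payload
    page.dirty = True


def writeback(page: Page, disk: dict, block: int,
              racing_write: Optional[Callable[[], None]] = None,
              safe: bool = True) -> None:
    """Flush the page to 'disk' if it is dirty.

    racing_write, if given, runs after the page has been copied for I/O.
    safe=True clears the dirty flag before copying, so a racing write
    re-dirties the page; safe=False clears it afterwards, wiping the
    racing write's dirty mark.
    """
    if not page.dirty:
        return
    if safe:
        page.dirty = False
    snapshot = bytes(page.data)          # contents sent to disk
    if racing_write is not None:
        racing_write()                   # write lands while I/O is in flight
    if not safe:
        page.dirty = False               # too late: clears the new write's mark
    disk[block] = snapshot


def run_scenario(safe: bool) -> bytes:
    """Write a 4020-byte head, then race a 76-byte tail write against writeback."""
    page, disk = Page(), {}
    app_write(page, 0, b"A" * 4020)
    writeback(page, disk, 0,
              racing_write=lambda: app_write(page, 4020, b"B" * 76),
              safe=safe)
    writeback(page, disk, 0, safe=safe)  # a later flush only acts on dirty pages
    return disk[0]
```

Here `run_scenario(False)` leaves the last 76 bytes of the block as zeros on "disk", while `run_scenario(True)` stores them correctly, because clearing the flag before taking the copy lets the racing write re-dirty the page for the next flush.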

Also, the 76-byte spiel was listed in the e-mail; I found it with Ctrl+F by searching for the number 76.  I think it is essentially a red herring for this bug, however: it appears that the number came up in a particular test and is being spread around because of it.

D-Link is testing and will patch if a vulnerable kernel is in place.
non progredi est regredi

fordem

  • Level 10 Member
  • Posts: 2168
Re: status of '321 data corruption caused by Linux kernel bug?
« Reply #14 on: March 25, 2009, 10:00:22 AM »

I'm well aware of what a kernel bug is, and also of the fact that a kernel bug, because of what it is, can affect everything else that runs on top of the kernel - in short, Linux itself.

Yes, this was tested with rtorrent because it was discovered with rtorrent, but I have seen no evidence to suggest that it occurs in the absence of rtorrent - and if it did, there would probably have been a lot more about it in the search engines.

Being a kernel bug, this has the potential to affect every linux distro and embedded device running the affected kernel versions - surely someone else would have experienced it by now.

As I believe I've already said, what I'm against is the "whipping up of hysteria" that occurs in these cases - so to speak - do the due diligence rather than simply regurgitating someone else's unsubstantiated hype.

Am I right?  Am I wrong?  We may never know - D-Link may simply upgrade to a new kernel version, which, by the way (as you may have noticed in Linus Torvalds' comments), may still have a "tiny" race condition.

Perhaps we should all avoid Linux because it has race conditions that can potentially corrupt our data ;)