Unsolved

September 17th, 2011 21:00

M18x RAID 1 randomly drops the array. 4 sets of new hard drives and even 2 sets of new M18x systems, RAID 1 keeps dropping!

On a random basis, one of the hard drives drops out of the RAID 1 array. It happens at any stage, including Windows boot-up or even while I am sitting here typing this message. If the array drops during the Windows boot screen, I have to force a hard shutdown. When this happens, one of the drives is reported as missing within RST and even in the RST boot menu. The drive remains missing until I cold boot the system or put it into sleep mode and wake it again.

I checked Windows Event Viewer and see a warning from IAStorDataMgrSvc:

Disk on port 0: Removed.

 

I upgraded to the latest Intel Rapid Storage Technology, version 10.6.0.1002, as I initially thought the slightly older version on Dell's driver download page was the culprit. I also went through 4 different sets of hard drives to rule out a bad drive, as that was my initial thinking.

This drop-out just occurred a few minutes ago while the system was fairly idle and I was browsing the internet. I heard a click from the hard drive that indicates it parked its heads abruptly. The system froze momentarily and RST popped up a warning of RAID 1 degradation.

If I simply do a warm reboot, the hard drive will still be missing, but if I do a cold reboot where power gets cycled, it reappears and the array starts to rebuild again. It is as if power was inadvertently cut off from port 0 or something.

I went through 4 sets of new hard drives to rule out faulty drives: a pair of Western Digital Scorpio Blacks (750GB, 7200rpm), 2 pairs of Samsung 1TB 5400rpm drives, and a pair of Hitachi 7K750 750GB 7200rpm drives. I also had Dell replace the entire system with a new one, as I thought I may have had a bad motherboard or something, and since I was within the initial 21-day return policy I felt better getting it replaced.

All the OS installations were fresh on these drives, and the drivers were downloaded from Dell's driver page to ensure they were up to date. I do not know what to do anymore regarding this issue. RAID rebuilds take hours to complete, and system instability is not something I can deal with.

 

1 Message

June 23rd, 2012 21:00

I have the same issue with a Z68 motherboard, with Intel RST 10.6 handling a RAID 1 configuration.  Good drives WILL fall out randomly from this controller.  I've had 5 events between 1/23 and 6/23, all after reboots.  In each case, there was massive data corruption at power on.  I don't understand WHY the controller quietly drops a drive, but I do now understand why the data corruption occurs.

1. RAID controller incorrectly decides that one of the two drives is dead.

2. RAID controller logs the event described below and decommissions that drive.

The Event Log will very quietly record an item like "Volume Volume0: No longer present on system." under Windows Logs -> Application, at Information level, with source IAStorDataMgrSvc and EventId = 0.

3. The filesystem volume continues to be available for the next several (up to 7 in my case) days and chkdsk does not return any issues.  Conjecture: only ONE of two drives is being used during this time, but the admin has no clue that this condition exists.

4. An event causes the admin to restart the machine (Windows update in my case).  After the restart the RAID controller sees both drives again and marks them degraded.  Windows boots and starts out with a chkdsk attempt.  At this point you can expect MASSIVE data corruption because chkdsk may be reading parts of its data from one drive and parts from another, and the two are very different.  Further corruption occurs.  Any hope of recovering data from the drive that was "good" prior to the shutdown is gone.

My system has a UPS.  The only way to ensure that your data is not lost in the manner above is to scan for the event prior to a scheduled system shutdown.  If the event is present, dump all data from the "good" drive (whichever one the RAID controller is still using) to a backup drive before powering off.  At the next power on, the system will wake up and corrupt both drives.  Guaranteed.  So, this is not a solution ... just a workaround: if you are sitting on this ticking bomb, you can salvage your data and move to a more reliable solution _before_ you reboot your server.  What Intel is missing is that once it marks a drive bad, it should store that decision in flash somewhere and not try to use the drive again if it comes ready at the next power on!
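A minimal sketch of that pre-shutdown scan, in Python. It assumes the IAStorDataMgrSvc entries have already been exported from Event Viewer as plain text, one message per line. The two message shapes are the ones quoted in this thread; other RST versions may word them differently. The function names are hypothetical.

```python
import re

# Message shapes quoted in this thread (assumption: other RST versions
# may use different wording, so extend this list as needed).
DROP_PATTERNS = [
    re.compile(r"Disk on port \d+: Removed\."),
    re.compile(r"Volume \S+: No longer present on system\."),
]


def find_raid_alerts(messages):
    """Return the messages that match any known drop-out pattern."""
    return [m for m in messages if any(p.search(m) for p in DROP_PATTERNS)]


def safe_to_reboot(messages):
    """True only when no drop-out message was found in the exported log."""
    return not find_raid_alerts(messages)


if __name__ == "__main__":
    exported = [
        "Service started.",
        "Volume Volume0: No longer present on system.",
    ]
    if not safe_to_reboot(exported):
        print("Drop-out logged: back up from the live drive BEFORE rebooting.")
```

The check is deliberately conservative: any single match blocks the reboot, since the point is to get the data off the one drive still in use before the controller sees both drives again.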

I will be disabling the RAID controller based on everything that I have read.  I think Intel RAID controllers need Intel drives.  Oh wait, they don't make HDD drives.

My old Asus MB had a VIA RAID controller, that didn't ever have these problems (with the same drives).

8 Wizard

17K Posts

June 23rd, 2012 23:00

I can't help with your exact problem. I don't recommend RAID unless you have Enterprise class drives in a real server.

However, have you considered SSD? That will give you the speed you want without the hassle (and maybe put a larger spinning drive for storage in slot 2). Or, go with Seagate Hybrid HDD.

280 Posts

June 24th, 2012 08:00

I have a related problem with RST 9.6.0.1014 under RAID 0 (not 1) which may be of interest to you.  For the details of my problem, see:

en.community.dell.com/.../20072888.aspx

Since my SSDs never had a problem with RAID 0 but my WD array did (never any data corruption), on Tesla's suggestion I disabled RAID 0 for my WD array and monitored the drive that was periodically dropped from the array (always the 2nd one). There was never any problem with that drive while I ran it single and monitored it closely.

So after a while I put them back under RAID 0 to try again. Darn it, I should have swapped the drive positions at that time, but didn't think of it!  Anyway, this infrequent problem recurred, usually only when the PC had been started but the password was not entered for a few minutes. Simply opening RST and setting the drive to normal always corrected it. In fact, I believe that when I tried to access the drive via Total Commander while RST popped up, it took a second or two for TC to react, but then the drive showed up correctly, and when I checked RST afterwards all was OK.  As I am not sure it was RST that popped up, I need to test this specifically the next time it happens. So maybe, just like telling RST to set the drive to normal, trying to access the drive will also correct it.

So is there indeed a problem with RST, or is one of my WD drives flaky after all, despite not showing any problem while running single for over a month?  With your experience using various drives, it appears to be RST.  To repeat: in my case the data on that array is not corrupted, based on my repeated byte-for-byte comparisons of numerous files on my data backup partition (on WD) against my data partition (on SSD). That applies particularly to the files that had just been backed up automatically when I saw the RST notice, since starting my system triggers the scheduled xcopy task.
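For anyone who wants to script that kind of byte-for-byte check, here is a rough Python sketch. It assumes the backup mirrors the source directory layout, which is what a recursive xcopy would normally produce. The function name and the example paths are hypothetical.

```python
import filecmp
from pathlib import Path


def mismatched_files(source_dir, backup_dir):
    """Return relative paths that are missing from, or differ in, the backup."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    bad = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = backup_dir / rel
        # shallow=False forces a comparison of file contents,
        # not just size and modification time.
        if not dst.is_file() or not filecmp.cmp(src, dst, shallow=False):
            bad.append(str(rel))
    return bad


if __name__ == "__main__":
    # Hypothetical drive letters: data partition on SSD, backup on the WD array.
    for name in mismatched_files(r"D:\data", r"E:\backup\data"):
        print("MISMATCH:", name)
```

An empty result means every file in the source has an identical copy in the backup. It does not check for extra files that exist only on the backup side.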

I guess my next test should be to swap the positions of those two WD drives to see whether RST now drops the first drive or again the 2nd drive.  If it drops the first drive, then that drive would be flaky after all.  If it drops the 2nd drive, it would be RST or a weak physical connection at the PC. Right?

QUESTIONS:

  1. After backing up the data from both partitions to my big 5th drive, can I just turn off the PC, switch the drives, and start the PC?  Or do I need to at least delete the partitions, or do even more?
  2. Can I put that drive into my last open slot (# 6), leaving slot 4 unused, to rule out a connection problem, even though I had no problem when running that drive single?
  3. Where did you get 10.6 from, and would that version work on my Area 51 ALX? But then maybe 10.6 is even more problematic and may result in data corruption. Tesla sure has a point about RAID, and perhaps Intel doesn't do too good a job with the non-enterprise software.

But first I need to solve my current problem of connecting my new iPad3 to my router over WiFi, which I have been trying to do for over 3 weeks without success.  LOL.

2 Posts

June 24th, 2012 12:00

omg back from the dead!

My current situation?  I just purchased the newer 750GB Momentus XT hybrid drives and am running them in RAID 1.  They have not acted up yet, and it has been several weeks.  If they act up I will try to remember to let you guys know.

The solution from Alienware was to swap out the motherboard, but I did not want the laptop taken apart as it is not service-friendly.
