DNA7091
Nickel

One of drives comes as failed

I have an older but perfectly functional R410.  I was upgrading drives and replaced 2 Samsung Enterprise SSDs with 4 Intel SSDs.  All was good until I decided to test the RAID and pulled one drive (UNIT 1) from the server.  Again, all was well, so I plugged the drive back in.  The server came back with a Drive 1 failure in the RAID and started a rebuild.


The rebuild has completed, but the system still reports Drive 1 as failed.  I have an H700 in this unit, and it reports the drive as online and functioning.

Any input?


RE: One of drives comes as failed

Hello.

What version of OpenManage are you using? What is the status of the Virtual Disk in OMSA or the controller BIOS? Try re-syncing the drive and see what happens.

Robert Alakara

Dell EMC | Enterprise Services

DNA7091
Nickel

RE: One of drives comes as failed

Robert,

Thanks for chiming in.  We don't use OpenManage. We had two SSDs in RAID 1 as the CentOS 7 boot volume.  We replaced them and added two more, then started from scratch and set up RAID 10.

I am puzzled, to say the least.  It has tied up two technicians for about 12 hours each.  They are looking at each other, ready to dump the unit because of the cost it has already inflicted.


We decided to run some updates. Our BIOS is current. Under unified server configuration, the network test fails, but it correctly resolves ftp.dell.com as 143.166.147.76. It then lists all current versions and available updates. After that it fails, claiming "The updates you are trying to apply are not Dell-authorized updates."

In addition, the front panel now tells us DISK0 and DISK1 have failed.

Lastly, the VD is rebuilding according to the H700 CTRL+R configuration utility.  It is at 43% and has been at that value for about 2 hours.


BTW, if we keep the boot from going past CTRL+R, the front panel remains blue ... and all disks are blinking (syncing).  However, last time we let it run for about 3 hours (on SSDs!!!!) and it failed.

DNA7091
Nickel

RE: One of drives comes as failed

Another strange issue.  We dropped RAID 10 and set up RAID 5 with 4 drives.  The RAID came back OK.  We installed CentOS 7.0 minimal. The RAID failed with both Drive0 and Drive1.  The installation still completed!!!

Restarted; the front panel is blue, everything OK. We began updating CentOS 7.  Within 2 minutes or so, the screen turns gold and Drive0 and Drive1 come up as failed.  The update goes on, but RAID 5 can't function with 2 drives failed, right? The update completes without issues.  Restart > CTRL+R > the screen goes blue, the controller shows the VD as operational, and no other actions take place.

???

DNA7091
Nickel

RE: One of drives comes as failed

OK, I need someone to tell me this is possible.  RAID 5 and RAID 10 both work on a 4-drive setup.  All drives are identical Intel 750s. The RAIDs are consistent after being scanned by the H700 card.  The front panel still shows Drive0 and Drive1 as failed.  RAID 10 should not work in this case, as two drives in a single span are marked as failed.  Is it possible that the BIOS or something else is not working correctly with the Intel drives and flags them as faulty?

DNA7091
Nickel

RE: One of drives comes as failed

Robert,

We managed to update the server. Everything is current. After looking around, it seems many people complain that non-Dell drives work but produce "warning" blinking.  The strange part here is that it only affects Drive 0 and Drive 1.  The other two are working just fine.


RE: One of drives comes as failed

Ok, I need someone to tell me this is possible.  Raid 5 or raid 10 both work on 4 drive setup

Yes, both work on a 4-drive setup. RAID 5 requires a minimum of 3 drives, whereas RAID 10 requires a minimum of 4.

Front panel still shows Drive0 and Drive1 failed.  Raid 10 should not work in this case as two drives in a single span are marked as failed.  Is it possible that BIOS or whatever else is not working correctly with Intel drives and renders them as faulty?

In a RAID 10 configuration, the failure of 2 drives in the same span crashes the RAID array. If the Virtual Disk is optimal while Drives 0 & 1 show as failed, then it is a false alert. Drives 0 & 1 may not be Dell-supported drives, because they have different firmware that cannot properly communicate with the firmware on the Dell controller. This causes all sorts of issues, such as false alerts, drives frequently going offline, failure to rebuild, etc.
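
To make that reasoning concrete, here is a minimal Python sketch of the fault-tolerance rules. It is only a model, not anything the H700 exposes: the function names are hypothetical, and the span layout (Drive0+Drive1 mirrored, Drive2+Drive3 mirrored) is an assumption based on the description in this thread.

```python
def raid5_survives(total_drives, failed):
    """RAID 5 tolerates the loss of at most one member drive."""
    if total_drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return len(set(failed)) <= 1


def raid10_survives(spans, failed):
    """RAID 10 survives while every mirrored span keeps a healthy drive."""
    failed = set(failed)
    return all(any(drive not in failed for drive in span) for span in spans)


# Assumed layout: slots 0+1 form one span, slots 2+3 the other.
spans = [(0, 1), (2, 3)]

print(raid10_survives(spans, {0, 1}))  # both drives of one span lost
print(raid10_survives(spans, {0, 2}))  # one drive lost in each span
print(raid5_survives(4, {0, 1}))       # two members lost
```

Under these rules neither a RAID 5 nor a RAID 10 laid out this way could stay operational with Drive0 and Drive1 truly failed, which is why an optimal Virtual Disk points to a false alert.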

We recommend that you use Dell-certified drives to avoid such issues.

Robert Alakara

Dell EMC | Enterprise Services

DNA7091
Nickel

RE: One of drives comes as failed

Robert,

Thank you again for stepping in.  We have spent too much time to just drop it :(  There are a lot of inconsistencies with this setup.  If we build any RAID using Drive2 and Drive3 (the two slots on the right), the system has no problems with that.  We have not tried using cable B to power Drive0 and Drive1, which is our next move.

While I see the point of guaranteeing performance only on certified drives, producing an error on the front panel while letting the H700 run the RAID as consistent and functional is not good practice.

Also, Intel is not A-DATA, a brand name used by multiple manufacturers.  While the drive may not be certified, it nevertheless offers 100+ TB of write capacity.  Our server has had only about 8+ TB of data written in the last five years.  So, we are looking at a healthy 50+ years of usage :)
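
For anyone who wants to check that estimate, here is a rough Python sketch of the arithmetic. It assumes the write rate stays constant and that rated write capacity is the only limit on drive life, which is a simplification; the function name is hypothetical and the inputs are the round figures quoted above (100 rated, 8 written over 5 years).

```python
def years_of_endurance(rated_tb, tb_written, years_in_service):
    """Years until the rated write capacity is used up at a constant write rate."""
    tb_per_year = tb_written / years_in_service
    return rated_tb / tb_per_year


# Figures from this post: 100 TB rated, 8 TB written in 5 years.
estimate = years_of_endurance(100, 8, 5)
print(f"{estimate:.1f} years")  # 62.5 years
```

Since both inputs were given as lower bounds ("100+" and "8+"), the result is only a ballpark, but it is in line with the 50+ years mentioned.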
