9 Posts
1
1534207
Non certified drives throwing Faults
Hello,
We have a number of Dell blades running Perc h700 and Perc h710 cards.
Most of these are running Samsung 840 Pro and Samsung 830 SSD drives. These drives are running perfectly fine; they simply show up as "non certified" and the array as "Non Critical", which causes no problems.
Now we've just installed some servers with the latest Crucial 960GB SSD drives, and again it all works fine, but every time the machines boot, errors are reported in the CMC log:
- Server 12 health changed to a critical state from either a normal or warning state.
- Fault detected on drive 0 in disk drive bay 1
- Fault detected on drive 1 in disk drive bay 1.
Does anyone have any idea why this might be happening?
Here are the omreports:
---
root@zeus # omreport storage vdisk
List of Virtual Disks in the System

Controller PERC H710 Mini (Embedded)
ID : 0
Status : Ok
Name : Zeus
State : Ready
Hot Spare Policy violated : Not Assigned
Encrypted : Not Applicable
Layout : RAID-1
Size : 850.00 GB (912680550400 bytes)
Device Name : /dev/sda
Bus Protocol : SATA
Media : SSD
Read Policy : Adaptive Read Ahead
Write Policy : Write Back
Cache Policy : Not Applicable
Stripe Element Size : 64 KB
Disk Cache Policy : Enabled

root@zeus # omreport storage pdisk controller=0
List of Physical Disks on Controller PERC H710 Mini (Embedded)

Controller PERC H710 Mini (Embedded)
ID : 0:0
Status : Non-Critical
Name : Physical Disk 0:0
State : Online
Power Status : Not Applicable
Bus Protocol : SATA
Media : SSD
Device Life Remaining : Not Applicable
Failure Predicted : No
Revision : MU02
Driver Version : Not Applicable
Model Number : Not Applicable
Certified : No
Encryption Capable : Yes
Encrypted : No
Progress : Not Applicable
Mirror Set ID : Not Applicable
Capacity : 893.75 GB (959656755200 bytes)
Used RAID Disk Space : 850.00 GB (912680550400 bytes)
Available RAID Disk Space : 43.75 GB (46976204800 bytes)
Hot Spare : No
Vendor ID : ATA
Product ID : Crucial_CT960M500SSD1
Serial No. : 1311092FA641
Part Number : Not Available
Negotiated Speed : 6.00 Gbps
Capable Speed : 6.00 Gbps
Device Write Cache : Not Applicable
Manufacture Day : Not Available
Manufacture Week : Not Available
Manufacture Year : Not Available
SAS Address : 4433221104000000

ID : 0:1
Status : Non-Critical
Name : Physical Disk 0:1
State : Online
Power Status : Not Applicable
Bus Protocol : SATA
Media : SSD
Device Life Remaining : Not Applicable
Failure Predicted : No
Revision : MU02
Driver Version : Not Applicable
Model Number : Not Applicable
Certified : No
Encryption Capable : Yes
Encrypted : No
Progress : Not Applicable
Mirror Set ID : Not Applicable
Capacity : 893.75 GB (959656755200 bytes)
Used RAID Disk Space : 850.00 GB (912680550400 bytes)
Available RAID Disk Space : 43.75 GB (46976204800 bytes)
Hot Spare : No
Vendor ID : ATA
Product ID : Crucial_CT960M500SSD1
Serial No. : 1311092FA619
Part Number : Not Available
Negotiated Speed : 6.00 Gbps
Capable Speed : 6.00 Gbps
Device Write Cache : Not Applicable
Manufacture Day : Not Available
Manufacture Week : Not Available
Manufacture Year : Not Available
SAS Address : 4433221105000000

root@zeus # omreport storage controller
Controller PERC H710 Mini (Embedded)

Controllers
ID : 0
Status : Ok
Name : PERC H710 Mini
Slot ID : Embedded
State : Ready
Firmware Version : 21.2.0-0007
Latest Available Firmware Version : Not Applicable
Driver Version : 06.504.01.00-rh1
Minimum Required Driver Version : Not Applicable
Storport Driver Version : Not Applicable
Minimum Required Storport Driver Version : Not Applicable
Number of Connectors : 1
Rebuild Rate : 30%
BGI Rate : 30%
Check Consistency Rate : 30%
Reconstruct Rate : 30%
Alarm State : Not Applicable
Cluster Mode : Not Applicable
SCSI Initiator ID : Not Applicable
Cache Memory Size : 512 MB
Patrol Read Mode : Auto
Patrol Read State : Stopped
Patrol Read Rate : 30%
Patrol Read Iterations : 9
Abort Check Consistency on Error : Disabled
Allow Revertible Hot Spare and Replace Member : Enabled
Load Balance : Not Applicable
Auto Replace Member on Predictive Failure : Disabled
Redundant Path view : Not Applicable
CacheCade Capable : Yes
Persistent Hot Spare : Disabled
Encryption Capable : Yes
Encryption Key Present : No
Encryption Mode : None
Preserved Cache : No
Spin Down Unconfigured Drives : Disabled
Spin Down Hot Spares : Disabled
Spin Down Configured Drives : Disabled
Automatic Disk Power Saving (Idle C) : Disabled
Time Interval for Spin Down (in Minutes) : Not Applicable
Start Time (HH:MM) : Not Applicable
---
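For anyone who wants to pull the relevant fields out of a report like the one above, here is a minimal Python sketch. It assumes the plain "Key : Value" layout shown in the pdisk output and treats every "ID" line as the start of a new disk record. The function name parse_omreport and the shortened SAMPLE text are illustrative only and not part of any Dell tool.

```python
# Shortened excerpt of the pdisk report above; only a few fields are kept.
SAMPLE = """\
List of Physical Disks on Controller PERC H710 Mini (Embedded)
Controller PERC H710 Mini (Embedded)
ID : 0:0
Status : Non-Critical
State : Online
Failure Predicted : No
Certified : No
Product ID : Crucial_CT960M500SSD1
ID : 0:1
Status : Non-Critical
State : Online
Failure Predicted : No
Certified : No
Product ID : Crucial_CT960M500SSD1
"""


def parse_omreport(text):
    """Return one dict per record; a new record starts at each 'ID' key.

    Hypothetical helper: it only understands 'Key : Value' lines and
    silently skips headings that have no ' : ' separator.
    """
    records = []
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if not sep:
            continue  # heading or blank line
        key, value = key.strip(), value.strip()
        if key == "ID":
            records.append({})
        if records:
            records[-1][key] = value
    return records


disks = parse_omreport(SAMPLE)
uncertified = [d["ID"] for d in disks if d.get("Certified") == "No"]
print(uncertified)
```

Splitting on the first " : " (with spaces) keeps values such as "0:0" and keys such as "Start Time (HH:MM)" intact.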
DELL-Geoff P
990 Posts
0
April 25th, 2013 08:00
The error comes from the drive not being Dell certified. The drive will function normally, but the firmware mismatch will return the error.
Regards,
kathirvel1980
1 Message
0
October 18th, 2016 08:00
trm201
9 Posts
0
April 25th, 2013 10:00
I understand that the drives will show as Non-Critical as they are not supported; however, that's not my problem here. I'm saying that on boot the CMC is reporting the drives as faulty with these new Crucial drives. This does not happen with the Samsung drives, which are also not supported.
DELL-Geoff P
990 Posts
0
April 25th, 2013 13:00
The firmware mismatch is most likely the cause of the fault. Crucial's site shows these drives are for PCs and Macs, and doesn't show any enterprise support for the drive. I would contact Crucial and find out if they have tested these drives on Dell's server platforms.
Regards,
tommo666
1 Rookie
1.2K Posts
1
April 25th, 2013 15:00
This is a RAID controller issue. Dell programs the firmware with the disk headers from all supported drives, and the controller shows the correct info for each connected disk. If a disk is unsupported, that doesn't mean it won't work, only that it hasn't been tested or qualified for that controller and its header hasn't been entered in the firmware.
Your SSD is working fine; it's the controller reporting it. It doesn't know what's on the end of the cable, so it chucks up an error. I have seen a whole rack of 2950s with SSDs all showing red LED faults but really all working OK.
As to the difference between Samsung and Crucial, well, they are different manufacturers and probably use different controllers/firmware, so they behave differently.
Trickery
5 Posts
0
July 12th, 2013 16:00
Is there a way to disable alerts for those particular SSDs? I'm having the same exact issue.
theflash1932
9 Legend
16.3K Posts
0
July 12th, 2013 18:00
No, what the controller logs is not changeable.
accenttek
3 Posts
0
August 13th, 2013 12:00
Did you ever find a workaround or fix for this?
We are having a similar problem using OCZ Talos 2 R (Enterprise Grade) drives. The error notices on the drives wouldn't bother us, but our concern is that the iDRAC won't notify us if a drive actually does fail, since it reports them as failed all the time even though they are fully functional.
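One possible way to tell a real failure apart from the permanent "non certified" warning is to check the disk fields from the OS side instead of relying on the overall status. The Python sketch below is only an illustration under assumptions: it takes a dict of fields as they appear in the omreport pdisk output earlier in this thread (State, Failure Predicted, Certified, Status), and the needs_attention function and its rules are hypothetical, not a Dell-documented method. It does not change what the controller, iDRAC, or CMC logs.

```python
def needs_attention(disk):
    """Return True when a disk looks genuinely unhealthy.

    Assumed rules (hypothetical, adjust to your environment):
    - any State other than 'Online' is suspicious for an array member
      (hot spares and unconfigured disks would need separate handling)
    - 'Failure Predicted : Yes' is always suspicious
    - for uncertified disks the Status field is ignored, because it
      stays 'Non-Critical' even when the drive is fine
    """
    if disk.get("State") != "Online":
        return True
    if disk.get("Failure Predicted") == "Yes":
        return True
    if disk.get("Certified") == "No":
        return False
    return disk.get("Status") != "Ok"


# Field values taken from the pdisk report in the original post.
healthy_uncertified = {
    "ID": "0:0",
    "Status": "Non-Critical",
    "State": "Online",
    "Failure Predicted": "No",
    "Certified": "No",
}

# Illustrative values for a disk that has dropped out of the array.
dropped = dict(healthy_uncertified, ID="0:1", State="Failed")

for disk in (healthy_uncertified, dropped):
    print(disk["ID"], needs_attention(disk))
```

A script like this could be run from cron and wired to whatever alerting you already have, so that a drive which really goes offline still gets noticed.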
czeetah
13 Posts
1
August 16th, 2013 12:00
Same problem here with the 840s.
Sent them back and bought Intel 530s.
czeetah
13 Posts
0
August 16th, 2013 12:00
Contacted Samsung and their reply was "call Dell."
Trickery
5 Posts
0
October 9th, 2013 14:00
What about these: http://www.dell.com/Accessories/us/en/RC1338595/Product/Details/342-6142
At this point I just want to stop the controller from spitting out alerts on perfectly good drives for this non-critical read cache.
Trickery
5 Posts
0
October 9th, 2013 14:00
I read in a blog not too long ago that the Intel 530s did not cause the Dell controller to throw errors. Can someone confirm this?
Would any Dell SSD work to prevent these errors from being thrown? For a non-volatile read cache, there is no way I'm getting approval to pay $2,000 per 200GB SSD: http://www.dell.com/Accessories/us/en/RC1338595/Product/Details/342-5631
What about any two 256GB Dell branded SSDs? Anyone have a list of SSDs that will not cause the controller to throw an error?
roblmc
2 Posts
0
December 12th, 2013 08:00
I have five of these drives in an H700 PERC. They are the 512GB model and are in a RAID 5. Three are running firmware DXM05B0Q and two are running DXM04B0Q.
I've had the controller lose connectivity with them and take them offline several times in the 90+ days I've had them installed. A reboot resolves the problem. There seems to be a 3-week window where they are good, and then the PERC hiccups.
I'm planning to install the new firmware EXT0BB6Q that Samsung released. I also plan to update the firmware on the PERC, which is fairly old.
Interestingly some of the drives show an amber light when installed and some don't. In OpenManage Server Administrator they all show as degraded but with non-critical errors.
If you have had any success with getting them recognized by the PERC, I would like to know.
Here's an example of what I see for one drive.
Trickery
5 Posts
0
December 12th, 2013 09:00
Use Intel 530's or any Intel SSD instead of Samsung SSD drives and it'll resolve your issues. That's what I did and I just re-purposed the Samsung 840 Pros for desktop use. They are running solid, no errors at all.
Read this: http://community.spiceworks.com/topic/352646-dell-cachecade-and-r720xd-flex-bay?page=2
GreggBooth
2 Posts
1
February 21st, 2014 19:00
Dell has a server hard drive sales policy that I find seriously flawed. I get the need for certifying hardware so it works and adding some costs since you generally warranty them with the server. However, the only SSD drive I found after two hours of searching Dell.com and making separate calls to Sales and Support for my R710 server was a single SSD drive priced at $3500.
I solved the issue by creating a new rack server order on Dell.com and giving the "exact description of the SSD drive in my shopping cart" to the sales person on the phone. This way I was able to choose between the following two drives with about 20% markup over the price with a new machine.
* 200GB SSD SATA MLC 3Gbps 2.5in hot-plug drive with customer kit for $646
* 400GB SSD SATA MLC 3Gbps 2.5in hot-plug drive with customer kit for $1264.
I went through this "non-certified" drive issue with Dell a few years ago with drives on the 'certified list' that I purchased from a 3rd party. After a long battle with support I discovered that the H700 'disconnected' them because they were not sold to me by Dell. I ended up wasting a lot of time before finally returning the drives and repurchasing the same ones from Dell for double the cost. The funny thing is that I really didn't care about the purchase price at the time. I did it to save time since I can order drives from a 3rd party in 20 minutes (or less online) and avoid the 1-2 hours it always takes when ordering parts from Dell Sales.
If anyone from Dell is listening, please
* list certified SSD drives on the Dell shopping site searches and include some with hot swap caddies.
* give the sales team access to the SSD list
In a perfect world you might even publish a list of certified SSD models for servers. ;-)