12 Posts
0
February 25th, 2013 06:00
Any way to increase drive response timeout on H710p to avoid SSD dropping off array?
Recently, our company purchased a T620 server with dual H710p controllers, running Red Hat 6.3 x64.
I configured ten Plextor M5P 512GB SSDs as two RAID5 arrays with hotspares (one on each controller).
After several hours of continuous heavy random-write load, one or more of the drives drops off the controller.
A cold-boot brings the drive back as if it had never been gone.
I'd like some way of extending the timeout so the drive doesn't drop off the controller.
Is there any OpenManage / MegaCli setting that could fix this?
I have an idea that I could secure-erase all the SSDs, then hard-set them for 15% additional overprovisioning, then re-create the RAID5. Presumably that would improve max latency as well as the endurance, but it wouldn't guarantee that the drives wouldn't drop off the array.
Alternatively, we could replace the ten Plextor drives with 12 x 400GB Intel DC S3700 drives, but that would be rather expensive and could endanger my job.
Anyone have any experience with this?
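The capacity cost of the over-provisioning idea above can be estimated with a little arithmetic. The sketch below is only illustrative: it assumes each controller holds a four-drive RAID5 plus one hotspare (the post does not spell out the exact split), and the function name is made up for this example.

```python
def usable_capacity_gb(drive_gb, drives_in_array, extra_op_fraction):
    """Usable RAID5 capacity when every member drive gives up
    extra_op_fraction of its space to additional over-provisioning."""
    if drives_in_array < 3:
        raise ValueError("RAID5 needs at least three drives")
    if not 0 <= extra_op_fraction < 1:
        raise ValueError("extra_op_fraction must be in [0, 1)")
    per_drive_gb = drive_gb * (1 - extra_op_fraction)
    # RAID5 spends one drive's worth of space on parity.
    return per_drive_gb * (drives_in_array - 1)


# Assumed layout: 4 x 512GB drives in the RAID5, hotspare not counted.
full = usable_capacity_gb(512, 4, 0.0)
reduced = usable_capacity_gb(512, 4, 0.15)
print(f"no extra OP: {full:.1f} GB, 15% extra OP: {reduced:.1f} GB")
```

This only covers capacity; whether the extra spare area actually keeps the drives from dropping off the controller is a separate question that the arithmetic cannot answer.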


DELL-Jonathan S
153 Posts
0
February 25th, 2013 08:00
I'm not specifically familiar with those drives, and I don't know of an OpenManage or MegaCli option to change the drive's timeout. I do notice there is a new firmware for the M5P, 1.03, which came out this month and, among other things, claims to improve "drive stability under heavy loading". Based on the symptoms you reported, my hope is that you can apply it and the problem will go away. I found it at www.plextor-digital.com/.../index.php. Let me know if that changes anything. If not, we may want to pull a controller log and analyze it, which can be done through OpenManage or MegaCli:
Through OpenManage (creates /var/log/lsi_XXXX.log):
# omconfig storage controller action=exportlog controller=0
I'm not sure whether the second command will overwrite the previous log or append to it, so please copy the file before you run the second one:
# omconfig storage controller action=exportlog controller=1
(Your controller numbers might be different, but it would be best to get both.)
Through MegaCli:
# MegaCli -FwTermLog -Dsply -aALL > controller_log
The -aALL option should get both controllers' logs in one go.
Both methods produce the same log, so use whichever is more convenient for you (i.e. based on whichever management tool you have installed). It would be best to export the log shortly after one or more drives go offline and before rebooting, if possible. I can't promise anything in there will help us figure out what the problem is, but if anything can, it would most likely be that. If you need to pull the logs I'll send you my email address.
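Since it is unclear whether the second OpenManage export overwrites the first, one way to keep both copies is to set the file aside under a timestamped name between the two exports. The helper below is a minimal sketch, not a Dell tool: the function name and the label argument are hypothetical, and the log path has to be supplied by whoever runs it, because the actual file name under /var/log is not given in full here.

```python
import shutil
import time
from pathlib import Path


def preserve_log(log_path, label):
    """Copy an exported controller log next to the original,
    adding a label and a timestamp so a later export cannot clobber it."""
    src = Path(log_path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = src.with_name(f"{src.name}.{label}.{stamp}")
    shutil.copy2(src, dest)  # copy2 also keeps the modification time
    return dest
```

A call such as preserve_log(path_to_exported_log, "controller0") would be made after the first export and before the second.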
Let us know how it goes.
charles-crossix
12 Posts
0
February 26th, 2013 04:00
Thanks for the advice.
Interesting about the new firmware.
Do you know if it's possible to install the firmware using the MegaCli -PdFwDownload command? Or would I have to use some kind of USB connector and update the drives one by one from a Windows PC?
I have not been able to perform a MegaCli -PDInstantSecureErase on these drives. It doesn't seem to recognize that the drives support encryption, and without that I don't think it supports secure erase. There are instructions for performing a secure erase using hdparm, but I'm a little reluctant to try that without more confidence that it will work through an H710p controller. I'm also a little afraid of bricking one or more of these $450 drives.
Note - I don't have the server in front of me. It is located in a secure location seven timezones away, and getting onsite hands-on work is a time-consuming exercise in frustration. Anything I can manage myself through the OS or through iDRAC7 express is greatly preferred.
I generated the FwTermLog controller logs - how can I send them to your email?
DELL-Jonathan S
153 Posts
0
February 26th, 2013 09:00
Thanks for the reply; I emailed you at the address you registered with on the forums. We can post a conclusion here when we reach one.
charles-crossix
12 Posts
0
March 25th, 2013 03:00
We updated the firmware, but the problem appeared to re-occur and two of the drives did not seem to recover.
We have downgraded the array to a 6-disk RAID50 plus two hotspares.
The write performance we are seeing now is extremely low, even for tasks which were very quick before.
For example, H710p hardware initialization of a 3-disk RAID5 is taking 8 hours instead of 40 minutes.
We are getting desperate to get adequate performance. Please give some useful suggestions.
DELL-Jonathan S
153 Posts
0
March 25th, 2013 09:00
It's possible these drives may not be suitable for use attached to a hardware RAID controller. Dell's officially supported SSDs include the following part numbers/descriptions:
342-3350 100GB Solid State Drive SATA Value MLC 3Gbps 2.5in Hot-plug Drive-Limited Warranty,CusKit
342-3351 100GB Solid State Drive SATA Value MLC 3Gbps 2.5in Hot-plug Drive,3.5in HYB CARR-Limited Warranty,CusKit
342-5631 200GB Solid State Drive SAS Value SLC 6Gbps 2.5in Hot-plug Drive,CusKit
342-5633 200GB Solid State Drive SAS Value SLC 6Gbps 2.5in Hot-plug Drive,3.5in HYB CARR,CusKit
342-3356 200GB Solid State Drive SATA Value MLC 3Gbps 2.5in Hot-plug Drive-Limited Warranty,CusKit
342-3357 200GB Solid State Drive SATA Value MLC 3Gbps 2.5in Hot-plug Drive,3.5in HYB CARR-Limited Warranty,CusKit
342-5636 400GB Solid State Drive SAS Value SLC 6Gbps 2.5in Hot-plug Drive,CusKit
342-5638 400GB Solid State Drive SAS Value SLC 6Gbps 2.5in Hot-plug Drive,3.5in HYB CARR,CusKit
342-5817 400GB Solid State Drive SATA Value MLC 3Gbps 2.5in Hot-plug Drive-Limited Warranty,CusKit
342-5821 800GB Solid State Drive SATA Value MLC 3Gbps 2.5in Hot-plug Drive-Limited Warranty,CusKit
342-5823 800GB Solid State Drive SATA Value MLC 3Gbps 2.5in Hot-plug Drive,3.5in HYB CARR-Limited Warranty,CusKit
Other than the non-certified drives, I don't see any outstanding problems with your configuration. Non-certified drives are not necessarily problematic, but nothing in your logs appears wrong, so that is where I would look. My next suggestion would be to see if the drive manufacturer has any recommendation about the configuration. Plextor's support email is at <www.plextor-digital.com/.../support.html>. Let me know if you need more information about the controller or if you learn anything interesting from Plextor.
charles-crossix
12 Posts
0
April 7th, 2013 08:00
Hi.
Can you tell me if the 800GB SATA Value SSD is actually a re-badged Intel DC S3700 drive? I would think so, but it claims to be a 3Gbps drive, and the new Intel drives are 6Gbps. Will these drives work in a MySQL environment with hardware RAID5 without losing performance? Will they require overprovisioning to avoid degrading their write performance levels, or are they already sufficiently overprovisioned?
Dev Mgr
6 Operator
•
9.3K Posts
0
April 7th, 2013 09:00
Hardware-wise they may or may not be Intel drives, but the firmware will be specifically written to work with PERC adapters.
Non-Dell drives cannot be upgraded to this firmware, though. This is a way for Dell to ensure they can make back the money that was spent developing the firmware to work properly with their RAID controllers, so it 'forces' you to buy Dell drives.
charles-crossix
12 Posts
0
April 8th, 2013 02:00
Ok, you seem to have misunderstood my question.
I'd like to recommend to my boss that the company purchase several of the large "SATA Value MLC" SSDs to replace these misbehaving drives. Dell's pricing on them is very reasonable if they are true 'Enterprise-class, datacenter ready' drives. However, the Dell site seems to be sorely lacking in details on the features of these drives. If the drives are desktop-class, requiring TRIM to maintain write performance, then I shouldn't recommend them for a database RAID. I'd like to know if they possess the features that make the new Intel DC S3700 drives so compelling.
1) Are they long endurance? (Intel claims ten full-drive writes per day of 100% random 4K blocks for five years for the DC S3700)
2) Do they have the low maximum write latency like the Intel DC S3700 drives?
3) Do they have consistent average write latency like the Intel DC S3700 drives?
4) Do they require additional over-provisioning to keep their write performance high for long-term use? Intel doesn't.
5) Can the drives be 'instant secure-erased' through the PERC controller BIOS?
If these drives have the advanced features of the new Intel SSDs, then I would not hesitate to recommend purchasing them to my boss.
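To put the endurance question in concrete terms, the rating can be turned into a total write volume. The sketch below reads the Intel figure in question 1 as ten full-drive writes per day for five years; the 800GB capacity and the 2000 GB/day workload are illustrative assumptions, and the function names are invented for this example.

```python
def rated_writes_tb(capacity_gb, drive_writes_per_day, years):
    """Total host writes (in TB) implied by a drive-writes-per-day rating."""
    days = years * 365
    return capacity_gb * drive_writes_per_day * days / 1000


def years_until_rating_used(capacity_gb, drive_writes_per_day,
                            rated_years, daily_write_gb):
    """Years a drive would take to consume its rating at a steady
    daily write volume. Ignores RAID5 parity writes, which add to
    what each member drive actually sees."""
    total_gb = rated_writes_tb(capacity_gb, drive_writes_per_day,
                               rated_years) * 1000
    return total_gb / daily_write_gb / 365


# Hypothetical 800GB drive rated at 10 drive writes per day for 5 years.
print(rated_writes_tb(800, 10, 5), "TB of rated writes")
# Hypothetical workload of 2000 GB written per day to that drive.
print(years_until_rating_used(800, 10, 5, 2000), "years at 2000 GB/day")
```

Running the same calculation with whatever endurance figure Dell publishes for the "SATA Value MLC" drives would make the two product lines directly comparable.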
charles-crossix
12 Posts
0
April 22nd, 2013 03:00
Status update:
I tested the Plextor drives individually and found that only two were operating at full speed; all the others were extremely slow.
I had a secure-erase performed on all 9 working drives, tested them individually to prove that full speed had been restored, and then assembled them as a 12% overprovisioned RAID5.
So far they are working without a problem, but we plan to use the array for a month before we declare the issue resolved.
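For anyone wanting to repeat a per-drive speed check like the one described above, a very rough write timing can be done from the OS. The post does not say which tool was used, so the following is only a sketch under assumptions: it measures sequential writes to a file on the drive under test (not the random-write load that triggered the problem), and the function name and default sizes are arbitrary.

```python
import os
import time


def write_throughput_mb_s(path, total_mb=64, block_kb=1024):
    """Write total_mb of random data to path in block_kb chunks,
    force it to disk, and return the achieved rate in MB/s."""
    block = os.urandom(block_kb * 1024)  # incompressible data
    blocks = (total_mb * 1024) // block_kb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # include the time to reach the device
    elapsed = time.perf_counter() - start
    return (blocks * block_kb / 1024) / elapsed
```

Comparing the figure across drives, before and after a secure erase, gives a relative indication only; a dedicated benchmark tool would be needed for trustworthy random-write numbers.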