Scott R 007
8 Posts
0
September 11th, 2008 12:00
(continued from earlier post)
I posted these issues recently on a SQL Server blog thread (see link: http://sqlblog.com/blogs/joe_chang/archive/2008/09/04/io-cost-structure-preparing-for-ssd-arrays.aspx). The thread author also tested and found poor write performance with RAID-0 virtual disks on the PERC-6i but not with the PERC-5, as did another poster to this thread. These posts suggest a broader issue rather than an isolated case involving just our system.
A side concern is that RAID-10, RAID-50, and RAID-60 use striping (RAID-0) on top of their respective base protection models (RAID-1, RAID-5, and RAID-6). Is it possible that RAID-10, RAID-50, and RAID-60 protection models may also be impacted by this RAID-0 write performance issue?
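For intuition on why the nested levels share the concern: RAID-0 striping is just an address mapping from logical blocks to (member disk, offset) pairs, and RAID-10/50/60 layer that same mapping over mirrored or parity-protected members. A toy sketch of the mapping (Python; the stripe size and rotation are illustrative, not the PERC-6's actual firmware logic):

```python
def raid0_map(lba, num_disks, stripe_blocks):
    """Map a logical block address to (disk index, block offset on that disk)
    for a simple RAID-0 layout. stripe_blocks = blocks per stripe unit."""
    stripe_unit = lba // stripe_blocks   # which stripe unit overall
    within = lba % stripe_blocks         # offset inside the stripe unit
    disk = stripe_unit % num_disks       # stripe units rotate across members
    row = stripe_unit // num_disks       # full stripe rows on each disk
    return disk, row * stripe_blocks + within

print(raid0_map(0, 5, 16))   # first block lands on disk 0 -> (0, 0)
print(raid0_map(16, 5, 16))  # next stripe unit moves to disk 1 -> (1, 0)
```

If the controller's striped write path has a defect, any level that routes writes through the same layer could in principle inherit it, which is exactly the question above.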
The PERC-6i firmware version is 6.0.2-0002 and the driver version is 2.14.00.32 on my test server. No similar problems were found on the Dell support site from a number of searches.
In contrast, the published findings from the PERC-6 RAID controller analysis white paper (the paper you mentioned in your post) for RAID-0 write workloads (web server log, SQL Server log) yielded much better performance – about what would be expected for RAID-0.
I am reporting this issue to Dell support for investigation and follow-up.
Any suggestions or insights from your side?
Thanks!
Scott R.
virtualTodd
112 Posts
0
September 11th, 2008 15:00
I sent an email to the author of the PERC 6 performance study that you referenced to see if he has any guidance. I believe that there was a firmware update for the PERC 6 - can you tell me what version you are using? As I am sure this is one of the questions that he will ask me :)
Todd
Scott R 007
8 Posts
0
September 12th, 2008 09:00
The PERC-6i firmware version is 6.0.2-0002 and the driver version is 2.14.00.32 (as gathered by OMSA). I posted it in the second part of the message above - sorry if it was confusing.
My review of outstanding PERC-6 firmware versions on the Dell support site showed only one newer version - 6.0.3-0002, A05, dated 06/23/2008. The description of this update (problems and fixes) did not mention any RAID-0 write performance issues.
Thanks for your help.
Scott R.
Scott R 007
8 Posts
0
September 24th, 2008 10:00
An update: I reported the issue to Dell support. After collecting and submitting a DSET report for the test server (along with my IOMeter test results), I was told that nothing is "broke" - the server (including the PERC-6 RAID controller) reports that it is "working" (nothing is failing or raising explicit errors). The report of poor PERC-6 RAID-0 write performance will be "passed up the chain", but there is no formal feedback process for such support requests, and no timeline estimate for resolution. I was told to create a technical update subscription for the server so that I can be notified when a PERC-6 firmware update is available (which I have done). I asked if problem case records could be searched for similar issues, and was told they could not. The case is closed, and that's all I have to show for it.
Any feedback from the staff member who authored the PERC-6 performance study?
As I mentioned in the second post above (continuation of the original post), I am aware of at least two other persons who say they have recreated the same PERC-6 RAID-0 poor write performance issue I experienced. Follow the blog link given above in that post for further details.
Scott
virtualTodd
112 Posts
0
September 24th, 2008 13:00
Sorry for the delayed response. I'm working on getting some help on the PERC performance specifics; the author of the paper you referenced is currently out of the office.
I agree with you that it looks like some type of strange bug with RAID 0, but I haven't been able to get to the right people here yet to find out.
One question that I have after reading through everything - What type of storage are you attaching to? Is it an MD1000?
Thanks,
Todd
virtualTodd
112 Posts
0
September 25th, 2008 12:00
Scott - Here is the response that I got from our performance team:
We have never seen RAID 5 be better than RAID 0, especially for writes. However, when using a file system, the writes go to the buffer cache, not the RAID controller. They are then written out by the OS lazy writer. You can set up the OS to force writes (flush), but that depends on how the system was set up. Here are a few issues:
1. He mentions the file allocation size, so he is using a file system on the LUN. That is not how we run the tests, since the buffer caching in the operating system can then interfere.
2. He mentions that this is a PERC6i, but doesn’t mention how he has set up the card (write through or write back).
3. He mentions 1x1, 2x2, 1x5 which I interpret as 1 disk without RAID, 2 sets of RAID 0 with 2 disks each and a single RAID 0 over five disks.
So, I would look at two things…
1. Look at the settings on the RAID controller… I would bet it is WT (write-through) for RAID 0 and WB (write-back) for RAID 5. WT gives better throughput at a reasonable queue depth. So with a single I/O queued and WT on for RAID 0 versus WB for RAID 5… it could run slower. The general rule is that write-back is best (lowest latency) unless you have a large queue depth, in which case WT could give more throughput.
2. Look at the filesystem buffer cache effects… are the two arrays treated differently.
I think that you already answered some of the questions with your config. Can you take a look at the cache settings and try IOmeter without filesystem?
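Todd's buffer-cache point is easy to demonstrate outside IOMeter: a write that stops at the OS cache returns much faster than one forced through to the device (or controller cache). A minimal sketch (Python; the file is a throwaway temp file, and the timings are machine-dependent):

```python
import os
import tempfile
import time

def timed_writes(flush_each, count=50, block=64 * 1024):
    """Write `count` blocks; optionally fsync after each one so the write
    must reach the device instead of stopping at the OS buffer cache."""
    data = b"\x00" * block
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, data)
            if flush_each:
                os.fsync(fd)  # force the write past the OS buffer cache
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)

buffered = timed_writes(flush_each=False)
synced = timed_writes(flush_each=True)
print(f"buffered: {buffered:.4f}s  fsync-per-write: {synced:.4f}s")
```

On most systems the fsync path is dramatically slower, which is why benchmarking against a raw volume (or with unbuffered I/O) measures the controller rather than the cache.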
Thanks,
Todd
Scott R 007
8 Posts
0
September 25th, 2008 12:00
The 10 x 146 GB SAS disks were all located in the internal drive bays of the PE 2900 - 4 drives on connector #1 and 6 drives on connector #2 of the PERC-6i. No MD1000s were used in this configuration for the tests. The 5-disk RAID-0 virtual disk used physical disks across both connectors (3 drives on one connector and 2 drives on the other connector) - a good practice to balance I/O bandwidth across the connectors.
Scott R.
Scott R 007
8 Posts
0
October 1st, 2008 11:00
In an issue related to the previous post (a continuation, due to wiki post size limitations):
Under the section titled “Fixes and Enhancements”, the PERC-6 firmware upgrade documentation notes two specific changes of interest:
2. Improved performance for RAID 0 virtual disks.
13. Improved background initialization (BGI) performance.
Item #2 is the issue we have been discussing.
Item #13 is an issue I have noticed but have not yet discussed here. I have followed the documentation for requesting a fast initialize on both RAID-5 and RAID-10 virtual disks (on both PERC-5 and PERC-6 RAID controllers), only to have the process take many hours – running what appears to be a regular (non-fast) background initialize and ignoring the request for a “fast initialize”. I can understand a RAID-5 initialize taking a while, since the RAID-5 initialize process has to read all disks and compute / rewrite parity. I would expect a RAID-10 initialize to be very fast – a data copy, versus the multi-disk read / compute parity / write process for RAID-5. Yet I consistently see slower initialize times (using regular or fast initialize) for RAID-10 virtual disks than for RAID-5 virtual disks. I have even seen slower initialize times on a RAID-10 virtual disk with 4 disks total than on a RAID-5 virtual disk with 6 disks total (same size / speed / interface disks in all cases). This is completely counter to what I would expect. I am hopeful that this firmware change may improve the situation, but I will reserve judgment until I can test it.
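As a rough back-of-envelope (my own approximation, not Dell's numbers) of why a RAID-10 initialize should move far less data than a RAID-5 initialize:

```python
def init_io_gb(level, disks, disk_gb):
    """Approximate total data moved during initialization, ignoring overlap:
    RAID-10 copies each primary to its mirror; RAID-5 reads every member
    (per the description above) and rewrites one member's worth of parity."""
    if level == "raid10":
        read = disks / 2 * disk_gb   # read each primary
        write = disks / 2 * disk_gb  # write each mirror
    elif level == "raid5":
        read = disks * disk_gb       # read every member
        write = disk_gb              # rewrite one parity member's worth
    else:
        raise ValueError(level)
    return read + write

# 4-disk RAID-10 vs 6-disk RAID-5, 146 GB disks:
print(init_io_gb("raid10", 4, 146))  # 584.0 GB moved
print(init_io_gb("raid5", 6, 146))   # 1022.0 GB moved
```

By this estimate a 4-disk RAID-10 moves roughly half the data of a 6-disk RAID-5 on identical 146 GB disks, which is why the opposite ordering I observed is so surprising.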
Have the folks you know in Dell’s test labs experienced any similar issues regarding virtual disk background initialize taking longer than expected?
Thanks for your interest in these issues. I look forward to your feedback.
Scott R.
Scott R 007
8 Posts
0
October 1st, 2008 11:00
Thanks for your reply. I had planned to send you more information in response to your questions, but have since learned that the problem is likely resolved by a recent firmware upgrade for the PERC-6 RAID controllers (6.1.1-0045, A07 – my tests were done on 6.0.2-0002).
See link: http://support.dell.com/support/downloads/download.aspx?c=us&l=en&s=gen&releaseid=R196813&SystemID=PWE_2900&servicetag=CP0YRG1&os=WNET&osl=en&deviceid=13514&devlib=0&typecnt=0&vercnt=4&catid=-1&impid=-1&formatcnt=5&libid=46&fileid=272112
See the following separate blog entries for further details on the original problem and on another person’s test results of the resolution:
http://sqlblog.com/blogs/joe_chang/archive/2008/10/01/dell-perc6-raid-controller-performance.aspx
http://sqlblog.com/blogs/joe_chang/archive/2008/09/04/io-cost-structure-preparing-for-ssd-arrays.aspx
I plan to test the PERC-6 firmware upgrade when my test server is available again from its current assignment.
Have the folks you know in Dell’s test labs tested the issue – before and after fix?
Thanks for your interest in these issues. I look forward to your feedback.
Scott R.
dpapa
2 Posts
0
October 22nd, 2008 09:00
For RAID 0 (not 10/50/60) there is a performance fix in the latest firmware posted to support.dell.com for the PERC 6 family (as mentioned above):
6.1.1-0047
This fixes performance for RAID 0 arrays in WB (write-back) mode.
BGI taking an excessive amount of time has also been fixed. There is a difference in how fast init / BGI and full init work on PERC 6. With SAS drives in particular, a full init will do copy functions that speed up the overall process, but the downside is that the array is locked while this is ongoing (and if you reboot, you will fall back to a BGI).
There are also enhancements for SATA performance.
w01fgang
7 Posts
0
November 11th, 2008 20:00
Hi Scott,
Did you have a chance to confirm that RAID 0 is now performing as expected? We are just about to get a couple of 2950s, which we will be using with the MD1000, but I'm wondering whether to get the older PERC 5/E with that, to avoid the RAID 0 issues.
Also, would the new firmware also fix performance in RAID 10 mode?
We are planning to use the new Intel X25-E SSD drives, but since they are only 64GB max, we definitely need RAID 0 for testing and eventually RAID 10 when we go into production.
Wolfgang.