nandas • 4 Operator • 1.5K Posts • March 9th, 2010 10:00
A RAID group cannot have more than 16 disks; that's the limit. As for your question: RAID group and LUN layout is really a design consideration and depends heavily on your requirements and environment, so it's hard to give a straightforward answer here. Hope you understand.
I'd recommend going through the "EMC CLARiiON Best Practices for Performance and Availability - Applied Best Practices" document for all these details and more guidelines; here is the document for Release 29. You can also get in touch with your local EMC contact for more help.
http://powerlink.emc.com/km/live1/en_US/Offering_Technical/White_Paper/h5773-clariion-best-practices-performance-availability-wp.pdf
Thanks,
Sandip
Here are some comments from the above-mentioned document on RAID 6 groups; sorry for any issues with how they display here.
dynamox • 9 Legend • 20.4K Posts • March 8th, 2010 17:00
Do you know the history of why it was designed like that? It was presumably designed with performance considerations in mind, so by going to RAID 6 you may not necessarily meet your performance requirements (dual-parity overhead).
SunHealth • 6 Posts • March 8th, 2010 17:00
It was designed by admins who didn't fully understand what a SAN was, so they modeled it after the RAID sets they would place in a rack-based server with local disk. Almost all of the RAID sets are fully allocated to a single LUN. To answer your question, there is really no rhyme or reason to the way it is now. At some point someone told them that the CLARiiONs were optimized for 5- and 9-disk RAID 5 sets, so that's what was primarily used. No I/O sizing or projections have been performed, so some LUNs are way oversubscribed and some way undersubscribed.
I am getting ready to map the RAID groups onto the new SAN and will be looking at the total I/O usage of each, but I just need to know whether I should target a few large RAID sets or many smaller ones.
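Roughly, the kind of tally I have in mind looks like this; the LUN names, IOPS figures, write penalties, and per-drive IOPS below are placeholder assumptions for illustration, not measurements from our arrays:

```python
# Sketch: compare measured front-end IOPS per RAID group against a rough
# back-end estimate, to spot over- and under-subscribed groups.
# All numbers are illustrative assumptions, not real data.
from collections import defaultdict

# (raid_group, lun, read_iops, write_iops) -- hypothetical per-LUN stats
lun_stats = [
    ("RG0", "LUN_0", 400, 150),
    ("RG0", "LUN_1", 900, 300),
    ("RG7", "LUN_12", 50, 20),
]

# Assumed group capability: drives * IOPS/drive, with a RAID write penalty
raid_groups = {
    "RG0": {"drives": 5,  "write_penalty": 4},  # 4+1 RAID 5
    "RG7": {"drives": 10, "write_penalty": 6},  # 8+2 RAID 6
}
IOPS_PER_DRIVE = 180  # assumed 15k rpm FC drive, small random I/O

load = defaultdict(lambda: [0, 0])
for rg, _lun, reads, writes in lun_stats:
    load[rg][0] += reads
    load[rg][1] += writes

for rg, (reads, writes) in sorted(load.items()):
    cfg = raid_groups[rg]
    backend = reads + writes * cfg["write_penalty"]
    capability = cfg["drives"] * IOPS_PER_DRIVE
    print(f"{rg}: ~{backend} back-end IOPS of ~{capability} available "
          f"({backend / capability:.0%} busy)")
```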
RRR • 4 Operator • 5.7K Posts • March 9th, 2010 00:00
Aaaaaargh.... 1 hint:
A SAN = Storage Area Network. So a SAN is a network! Just like a LAN, in fact. It's made of switches connected to each other, and connected to these switches (fabrics) are hosts and storage arrays.
So when you talk about a SAN and disks, you're leaving out the storage part, which is equally important!
Host - HBA - SAN - Storage Array
Host = server
HBA = a sort of SCSI controller that "tunnels" SCSI commands over the Fibre Channel protocol, physically using fiber cables
SAN = Fibre Channel switches
Storage Array = one or more 19-inch racks with some intelligence and disks (CLARiiON, Symmetrix / DMX / V-Max)
SunHealth • 6 Posts • March 9th, 2010 09:00
Did I say something to indicate that I don't understand what a SAN is? While it is a network, it is a very specialized Fibre Channel network. I don't think anyone fails to understand what I mean when I refer to a SAN (a specialized storage network) containing LUNs and RAID groups. Your post was not helpful.
By the way, HBA = Host Bus Adapter, and it should be thought of more as a NIC for the SAN than as a SCSI controller; that functionality is held in the storage processors.
My original question still hasn't really been answered: are there major issues with designing a storage infrastructure with very large RAID 6 groups?
kelleg • 4 Operator • 4.5K Posts • March 9th, 2010 12:00
Another very good reference:
EMC CLARiiON Storage System Fundamentals for Performance and Availability
http://powerlink.emc.com/km/live1/en_US/Offering_Technical/White_Paper/H1049_emc_clariion_fibre_channel_storage_fundamentals_ldv.pdf
glen
RRR • 4 Operator • 5.7K Posts • March 11th, 2010 01:00
> Did I say something to indicate that I don't understand what a SAN is?
Yes, by saying your SAN has disks.
> By the way, HBA = Host Bus Adapter, and it should be thought of more as a NIC for the SAN than as a SCSI controller; that functionality is held in the storage processors.
HBA is nowadays a term used primarily for FC cards, but strictly speaking it is a controller that puts I/Os onto some sort of cable in order to reach the disks. That's what a SATA controller does, but also what a SAS, SCSI, ATA, or FC controller does. In our case, speaking of SAN-attached storage, an HBA is a SCSI controller. The host sees disks (= LUNs, just like in the old days of DAS storage, where each physical disk was also called a LUN) and addresses these disks using the SCSI protocol. Therefore an FC HBA is simply a SCSI controller.
A SAN is in fact a large SCSI network, where HBAs and SCSI targets are managed by zones to avoid too many SCSI IDs ending up seeing each other! In fact, HP-UX still "suffers" from the old days when LUNs could be addressed by only 3 bits, so LUNs 0-7 were visible and LUN 8 wasn't. We still see the same thing in Symmetrix environments, where you need to consider using the right LUN addresses when attaching them to HP-UX. It's a SCSI world out there and we're the players! Don't ever forget that.
And you are right, HBAs talk to storage ports (storage processors, if you want), and those address the actual physical disks on behalf of the host. The host never sees the actual physical disks, but that doesn't mean an HBA is not a SCSI controller.
If you're such a professional, you know not to call a storage array a SAN, because that is using the wrong words to simplify the story for people who don't understand. Don't do that. People will know what you mean when you say storage array. If we all keep calling an array a SAN, they will never learn. When I go to a car repair shop because there's a problem with the injection system in my engine, I don't tell the mechanic there's an issue with my fuel lines. It only confuses people.
DanJost • 190 Posts • March 11th, 2010 08:00
First of all, you are in the position of designing for what you have (or bought, or were sold) instead of acquiring what you need. If the partner/EMC did some sort of performance analysis, I'd start with that. Allocate the critical elements for performance, and see what you have left over. I wouldn't limit myself to SAN Copy, as you likely have some allocated-but-not-used space from the previous "design". OS disk offsets, allocation units that match the application, and so on can pay big dividends in the end if performance is paramount. This can get ugly and be a lot more work, but in the end it will suit you better. You may find out you have plenty of space left, you may not, but making informed and intelligent choices is always the right way to go, whether you are buying a car or designing your RAID groups.
As far as having very large RAID 6 groups, if your concern is your exposure to data loss during hot-spare replacement, you are barking up the right tree. Just be aware that striping (i.e. metaLUNs) a high number of LUNs across a small number of RAID 6 groups can yield unpredictable performance. Alternatively, thin provisioning may be what you really need, as it more or less automates much of the process. There are, however, some caveats to thin provisioning that are application-dependent, so do your homework. Of course there is also FAST... Like I used to tell my clients when I was a consultant: the good news is, there are a lot of right answers. The bad news is, there are a lot of right answers.
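As a small illustration of the disk-offset point above, here's a quick sanity check of a partition offset against the stripe element; the 64 KB element size and the example offsets are assumptions, not values read from any particular array:

```python
# Sketch: check whether a partition's starting offset is aligned to the
# array's stripe element, so small writes don't straddle element boundaries.
ELEMENT_SIZE = 64 * 1024  # assumed 64 KB stripe element

def is_aligned(offset_bytes: int, element: int = ELEMENT_SIZE) -> bool:
    return offset_bytes % element == 0

# Example offsets only:
for label, offset in [("legacy 63-sector MBR offset", 63 * 512),
                      ("1 MB offset", 1024 * 1024)]:
    print(f"{label}: {'aligned' if is_aligned(offset) else 'misaligned'}")
```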
Hope this helps and good luck
Dan
Allen Ward • 4 Operator • 2.1K Posts • March 11th, 2010 11:00
So, trying to ignore some of the points that have been flying back and forth since your initial question, I'm just going to address the "spirit" of the question itself, with some things you should consider carefully in designing the layout for the new array.
First things first, you should absolutely read both the "Performance and Availability Best Practices" and the "EMC CLARiiON Storage System Fundamentals for Performance and Availability — Applied Best Practices" documents. These will give you the foundation you need to start evaluating how to proceed.
To directly address the question of very large RAID 6 RGs: as was already pointed out, you can't make a RAID group larger than 16 drives. After reading through the referenced documents you will find that in most cases you are likely better off sticking to RAID 6 groups of 8 to 12 drives. In our environment we settled on making them all 10 drives (and we only use RAID 6 for lower performance requirements using large drives).
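Just to make the capacity side of that trade-off visible, a rough sketch (the 1 TB drive size is only an example):

```python
# Sketch: usable capacity and parity overhead for RAID 6 groups of various sizes.
DRIVE_TB = 1.0  # example drive size

for drives in (6, 8, 10, 12, 16):
    data = drives - 2          # RAID 6 spends two drives' worth on parity
    overhead = 2 / drives
    print(f"{drives} drives ({data}+2): usable ~{data * DRIVE_TB:.0f} TB, "
          f"parity overhead {overhead:.0%}")
```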
The other thing that concerns me about your proposed approach is that you identified a mix of drive types purchased ranging from low to exceptional performance capability. One would tend to assume that a mix like that was purchased to support multiple performance needs. It would be somewhat of a shame to treat them all the same and lose some of the advantage of the higher performance drives by not maximizing their configuration.
Again it all comes back to carefully reading the two documents I mentioned as an absolute starting point. In my opinion no one should be configuring an array without reading these docs, and they probably should be included with every array sold. But that is just my opinion.
Once you have reviewed those docs it will likely help you fine tune your plans to the point where it can be discussed and refined further here. I can practically guarantee that the people you are most likely to get help from here have all read these docs (and many of the previous iterations of them) and will refer to them as evidence for specific recommendations.
DanJost • 190 Posts • March 11th, 2010 12:00
I'm just speaking in generalities: if you have two simultaneous drive failures (unlikely) with RAID 6, you can recover; two in a RAID 5 group is another story. Assuming you have a hot spare in both cases (RAID 5 or RAID 6), RAID 6 would be superior because during the rebuild of one drive another drive could be lost (even more unlikely). As the time to rebuild to a hot spare grows, the window of vulnerability grows. I haven't tested how long it takes for a spare to kick in with a large 1 or 2 TB-based RG, but I'm sure it is not too swift.
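Back-of-the-envelope, the window grows with drive size something like this; the rebuild rate below is purely an assumed figure for illustration (the Best Practices document has the real rebuild calculation):

```python
# Sketch: how the exposure window scales with drive size during a rebuild.
# The rebuild rate is an assumption, not a FLARE specification.
REBUILD_MB_PER_SEC = 30  # assumed effective rate under production load

for size_tb in (0.3, 1.0, 2.0):
    seconds = size_tb * 1024 * 1024 / REBUILD_MB_PER_SEC
    print(f"{size_tb} TB drive: ~{seconds / 3600:.1f} hours exposed during rebuild")
```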
I don't really have a specific case in mind. This may seem rudimentary, but placing multiple LUNs on the same RAID group (whether they are presented to the same host or to different hosts, and whether they are striped or not) has the potential for "unpredictable" performance, because narrowly viewed host-based performance counters don't tell you that another server is consuming all of the I/O. Nothing fancy, and perhaps stating the obvious, but not everyone has the level of expertise that you and many others around here have.
There are too many unanswered questions, in my opinion, to give any reasonably detailed technical answer.
Dan
jps00 • 2 Intern • 392 Posts • March 11th, 2010 12:00
> In our environment we settled on making them all 10 drives (and we only use RAID 6 for lower performance requirements using large drives).
Allen, the ten-drive (8+2) is a particularly good configuration for RAID 6. It's 'square' (like the five-drive 4+1), which, assuming 16 KB cache pages, abets full-stripe writes. This can be an advantage if your workload has any significant sequential component.
I believe this is discussed in the Fundamentals paper, in the section dealing with full-stripe writes.
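To put numbers on the 'square' point, assuming a 64 KB stripe element and 16 KB cache pages (treat both as assumptions here; the paper has the authoritative values):

```python
# Sketch: full-stripe write size for 'square' RAID group geometries.
ELEMENT_KB = 64     # assumed stripe element size
CACHE_PAGE_KB = 16  # assumed cache page size

for label, data_drives in (("4+1 RAID 5", 4), ("8+2 RAID 6", 8)):
    stripe_kb = data_drives * ELEMENT_KB
    pages = stripe_kb // CACHE_PAGE_KB
    print(f"{label}: full stripe = {stripe_kb} KB = {pages} cache pages")
```

Both geometries work out to a power-of-two number of cache pages per stripe, which is the property that lets cached sequential writes coalesce into full-stripe writes.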
jps00 • 2 Intern • 392 Posts • March 11th, 2010 12:00
> As far as having very large RAID 6 groups, if your concern is your exposure to data loss during hot-spare replacement, you are barking up the right tree. Just be aware that striping (i.e. metaLUNs) a high number of LUNs across a small number of RAID 6 groups can yield unpredictable performance.
Dan, I'm not sure I understand your points here.
A RAID 6 group, with its double drive failure protection, would still have the same protection during a hot-spare replacement as an intact RAID 5 group has with its single drive failure protection.
I think saying that a high number of LUNs bound to a small number of RAID 6 groups in a metaLUN yields unpredictable performance is a highly qualified statement. Either linked contention or drive contention becomes possible whenever two or more LUNs share the same RAID groups, no matter what the RAID level or whether they are used in a metaLUN.
FLARE 29 allows 256 LUNs per RAID group. Two is the smallest number of RAID 6 groups that can implement a striped metaLUN. I could see unpredictable metaLUN performance arising from a two-component metaLUN where each of the component LUNs is bound to a RAID group (of any level) with 255 peer LUNs. However, I think you have a specific case in mind?
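For readers following along, here is a toy model of how a striped metaLUN spreads addresses across its component LUNs. This is not FLARE's actual algorithm, and the 1 MB metaLUN stripe element is an assumed value; it only illustrates why each component LUN (and thus each underlying RAID group) sees a share of the I/O:

```python
# Toy model: which component LUN a striped metaLUN address lands on.
# Not the real FLARE implementation -- just the striping idea.
META_ELEMENT_MB = 1  # assumed metaLUN stripe element size

def component_for(offset_mb: int, components: int) -> int:
    return (offset_mb // META_ELEMENT_MB) % components

# Two component LUNs, each bound to a different RAID 6 group:
for offset in (0, 1, 2, 3, 4):
    print(f"offset {offset} MB -> component LUN {component_for(offset, 2)}")
```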
jps00 • 2 Intern • 392 Posts • March 12th, 2010 08:00
Josh, not to pick on you, but:
> I haven't tested how long it takes for a spare to kick in with a large 1 or 2 TB-based RG, but I'm sure it is not too swift.
You have to consider two cases: proactive hot sparing and a 'hard failure'. In the case where the drive is detected early as failing, proactive hot sparing results in a seamless transfer of the failing drive's contents to a hot spare and a 'cut-over'. An out-of-the-blue failure is different. The Rebuild section of the EMC CLARiiON Best Practices for Performance and Availability, FLARE Revision 29, has the calculation and an example showing how to answer your question.
> There are too many unanswered questions, in my opinion, to give any reasonably detailed technical answer.
You should give both EMC CLARiiON Best Practices and EMC CLARiiON Storage System Fundamentals for Performance and Availability a read. Fundamentals, more than Best Practices, lays out in broad strokes the things you should be thinking about when provisioning.
DanJost • 190 Posts • March 12th, 2010 09:00
First of all, I don't think you are picking on me. Perhaps you didn't realize that I'm not the originator of the question!
I think we all agree that reading the documentation is the right thing to do, and I don't see where I've disagreed with you. Proactive hot sparing is yet another great CLARiiON feature, but out-of-the-blue failures do occur (ever had to implement a CLARiiON in a building that is under construction? I know it isn't a good idea, but unfortunately I've been there and done that despite my objections). As I said before (perhaps not clearly), RAID 6 provides a higher level of protection than RAID 5 should the unlikely event of simultaneous drive failures occur. I think you can say that you agree with that?
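Just to put rough numbers on that comparison: assume some probability that any one surviving drive fails (or hits an unrecoverable error) during the rebuild window; the 1% figure below is purely illustrative:

```python
# Sketch: chance of losing data during a single rebuild, RAID 5 vs RAID 6.
# p is an assumed per-drive failure probability over the rebuild window.
p = 0.01

def p_loss_raid5(surviving: int) -> float:
    # Any further failure among the survivors loses data.
    return 1 - (1 - p) ** surviving

def p_loss_raid6(surviving: int) -> float:
    # Losing data needs at least two further failures among the survivors.
    none = (1 - p) ** surviving
    exactly_one = surviving * p * (1 - p) ** (surviving - 1)
    return 1 - none - exactly_one

print(f"RAID 5 (4+1), one drive rebuilding: {p_loss_raid5(4):.2%}")
print(f"RAID 6 (8+2), one drive rebuilding: {p_loss_raid6(9):.2%}")
```

Even with more drives in the group, the RAID 6 exposure comes out roughly an order of magnitude lower under these toy assumptions.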
On second thought, perhaps you are trying to pick on me
SunHealth • 6 Posts • March 12th, 2010 12:00
Thank you for all the helpful answers. I think I've gotten everything I need. Mainly, I didn't realize there was a 16-disk limit on RAID sets on the EMC arrays. Is there a reason for this, technical or otherwise? I believe I have a very good idea of how I want to build out the new system following the documentation provided.