Has anyone else encountered the 130% oversubscription cap on an SRP on a VMAX3 yet?
Rather frustrating, given that on the VMAX1/2 you could oversubscribe past a conservative 130%.
Confirmed with EMC first level and local EMC contact(s). Still awaiting a technical explanation I might be able to digest, but it appears the only fix is to throw more di$k at it.
To my knowledge, the default oversubscription limit is NONE on all VMAX arrays. You have to set this limit using the command symconfigure -sid xx -cmd "set pool [pool name], type=thin, max_subs_percent=130;" commit
This command sets a 130% maximum subscription limit.
First thing I tried.
This is really frustrating, as you could set this in Unisphere when managing a VMAX1/2; I think you were even able to set it back in SMC (Symmetrix Management Console). Obviously a limitation of HYPERMAX OS that wasn't in Enginuity.
Capping oversubscription at 130% on a VMAX3 is absurd in large VMware environments when you don't thin-provision the disks presented to VMs and you prefer to maintain 10% free capacity on each datastore. As VMs get larger, so do the "standard" presented datastores, and the limit might be heading your way quicker than you think.
From a technical standpoint I'm still trying to understand the limitation; it is almost as if HYPERMAX is encouraging a return to the days of thick provisioning, i.e. pre-allocating devices from day 1 to take full advantage of the raw configured capacity in the SRP, with SLOs taking care of performance. The only reason I can think of is SLOs. Why? There is no more reserved capacity as in FAST, but rather usable capacity for each SLO. The only reservation (default 10%) on an SRP appears to be for TimeFinder and DSE on the VMAX3.
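To make the reservation arithmetic above concrete, here is a minimal sketch. The 200TB figure and the helper name are illustrative assumptions, not details from the array in question:

```python
# Illustrative sketch: effective usable capacity on an SRP after the default
# 10% reservation (for TimeFinder snapshots and DSE), as described above.
# The 200 TB raw usable figure is a made-up example.

def effective_usable_tb(raw_usable_tb: float, reserved_pct: float = 10.0) -> float:
    """Capacity left for host allocations after the SRP reservation."""
    return raw_usable_tb * (1 - reserved_pct / 100.0)

# e.g. a 200 TB SRP leaves 180 TB for host TDEV allocations
print(effective_usable_tb(200.0))
```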
As of the version of HYPERMAX OS released in Q1 2015, the over-provisioning value in the bin file has no limiting effect. The value is used when configuring new systems to make sure enough resources (cache, flash, disk) are configured to allow for creation of the size/count of TDEVs that will be needed. The default value is 130% (or 1.3x) but can be changed while designing the system in the configuration program. But regardless of what it is set to in the bin file, the TDEV capacity that can be created is only limited by the actual resources.
So if you are running that code or later and are unable to create any more TDEVs, you may be truly out of resources to do so. That's about as much info as I can provide in general. If you can provide more info like SR #, error message, running code, and code that the box was installed at, we may be able to provide some more specific info. You can send me a private message if you don't want to post site-specific info in the forum.
Hope that helps a little...
Thank you for the reply, Michael, and please correct me if I interpret it incorrectly.
Does this imply that it could be changed in the bin file and re-applied, or can this only be done at the initial configuration/upload of the bin file?
We are on HYPERMAX 5977.691.684. Here is a clearer view of the current state with regard to some actual resources consumed.
I agree 100% that we are rather generous in provisioning disk, and it would appear that something like right-sizing is not considered, a major challenge in many organizations I think. We have many checks and balances to offset that so it doesn't become an issue on the array, although it would not appear that way at first glance. I base this statement on what we provisioned as opposed to what is actually written: 233.8TB - 129.32TB = 104.48TB of over-allocation.
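The over-allocation arithmetic above can be sanity-checked in a couple of lines (figures copied from the post; nothing array-specific is assumed beyond them):

```python
# Quick sanity check of the numbers quoted above: capacity provisioned
# (subscribed) to hosts vs. capacity actually written.

allocated_tb = 233.8    # provisioned to hosts (from the post above)
written_tb   = 129.32   # actually written

over_allocated_tb = allocated_tb - written_tb
print(f"Over-allocation: {over_allocated_tb:.2f} TB")  # ~104.48 TB
```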
I would like to think that with our configuration of 40TB EFD and 160TB 10k SAS disks you would be able to push the SRP to 80% written (capped similarly to VNX2), thereby taking the subscription past 130%.
In other words, cap the SRP on actual written data rather than on subscription, without compromising performance. This would scale proportionally: capping written capacity rather than subscription leaves enough headroom to avoid compromising performance.
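The two capping schemes being compared can be sketched numerically. The 200TB usable figure is derived from the 40TB EFD + 160TB SAS configuration mentioned above, and both percentages are the ones under discussion; treat this as an illustration, not array behavior:

```python
# Sketch of the trade-off described above: a subscription cap vs. a written cap.
# Usable capacity assumed from the configuration mentioned in the post.

usable_tb = 40.0 + 160.0          # 40 TB EFD + 160 TB 10k SAS = 200 TB usable

# A 130% subscription cap limits total TDEV capacity regardless of writes:
max_subscribed_tb = usable_tb * 1.30   # ceiling on provisioned TDEV capacity

# An 80% *written* cap (VNX2-style, as suggested) limits actual consumption
# instead, leaving subscription effectively unbounded:
max_written_tb = usable_tb * 0.80      # ceiling on data actually written

print(max_subscribed_tb, max_written_tb)
```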
There is no need to reset the value in the bin file. You can ignore it. It is not checked / obeyed by the device creation process. Have you already had device creation blocked/failed?
What is more significant is that if 130% was specified when the system was ordered, then there may only be enough HW in the system to support 130% OP.
The actual written data is not a factor in the over provisioning. Whenever a TDEV is created, it requires metadata in cache. We can only create so many TDEVs because there needs to be enough cache for read/write IO and other metadata.
Yes, I have already had device creation blocked/fail, as a test: purely a virtual device/TDEV to push the subscription, without any kind of SLO (actually set to None), and not added to any Storage Group either.
Interesting that you mention the word 'specified'. Is this something we as the customer were asked to specify when the system was ordered, or a default specified in the code? Either way, it implies there was, or might have been, a window of opportunity to get it changed.
I agree on the number of TDEVs in relation to the amount of cache for read/write I/O, but in this case I can't help comparing our single-engine VMAX(1) 20k with 70TB and 60GB of cache, carrying literally double the number of TDEVs, overprovisioned way past 130%, without any performance degradation. Our pre-prod single-engine VMAX(2) 10k is even worse, with only 4TB of physical capacity left, also with 60GB of cache, and not a single mention of poor performance; compare that with a dual-engine VMAX(3) 100k with 1TB of cache (no SATA) and 200TB.
There is a default over subscription setting in the code, but that is no longer active (it has no effect). The provisioning on VMAX3 is all based on cache and metadata.
When the system is built, the EMC tools ensure that there is at least enough cache for your expected data activity (thresholds that have been developed over decades) and for the metadata to run the system. When we designed VMAX3, we did several things to clean up the metadata, so that we would have all of the information about a given 128k track in a single location. We set up the addressing for 512TB devices (48-bit track addressing), up to 1+M defined devices (24-bit device addressing), single tracks on disk (backend layouts for each 128k track to be at an independent thin location), etc. - all of which created a rather large metadata footprint. Our focus was on simplifying track management and driving up performance.
As a result of these changes, the old meta device constructs to build anything over 256GB are gone, snaps are very efficient (we do allocate on write), online device expansion is fast and simple, and many more benefits. The downside is that the metadata for each track is not small. And because of our design, it is all kept in memory. We have projects to be able to page the less active metadata, but those goals have not yet been achieved. And when a thin device is created, the designers wanted to ensure that we will always have the space for the metadata - so it is all created up front. The good news is that you never have a failure of "can't build metadata" when you write to some new track. The bad news is that the total TDEV capacity that you can create on the system is limited by the cache you have installed, even if you are not using large portions of that space.
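A back-of-the-envelope sketch of why creatable TDEV capacity is cache-bound, per the explanation above, might look like this. Note that META_PER_TRACK is a made-up illustrative number; the real per-track metadata size is not given in this thread:

```python
# Illustrative sketch of the cache-bound TDEV limit described above:
# per-track metadata is created up front and pinned in cache, so the
# creatable TDEV capacity scales with the cache budget for metadata.

TRACK_SIZE = 128 * 1024    # 128 KB track size, per the post above
META_PER_TRACK = 64        # bytes of metadata per track -- ASSUMED, not a real figure

def max_tdev_capacity_tb(cache_bytes_for_metadata: int) -> float:
    """TDEV capacity (TB) whose track metadata fits in the given cache budget."""
    tracks = cache_bytes_for_metadata / META_PER_TRACK
    return tracks * TRACK_SIZE / 1024**4

# e.g. if 256 GB of cache were available just for track metadata:
print(f"{max_tdev_capacity_tb(256 * 1024**3):.0f} TB")  # 512 TB
```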
Back to the system design tools, the default design is to plan for at least enough cache to over provision at 1.3:1 (that is 30% over provisioned). If you have hit that limit, the only options are to remove some thin devices and replace them with smaller ones or to add more cache to the system. Most of our customers are very conservative, so in the past there was very little over subscription on VMAX systems. The numbers are trending up, and we are working on ways to be able to reduce the metadata impact of such over provisioning.
I would be glad to work with you and your EMC sales team to understand your current configuration and your over subscription plans to be able to help you to resolve this issue.