43 Posts

July 2nd, 2010 10:00

VMax Deferred Maintenance

We have "deferred maintenance" enabled on a VMax. I am not very familiar with this option. It is my understanding that with "deferred maintenance" enabled EMC will not come out and replace failed drives until there are 0 spares available for a particular drive type.
For example:
If I had 8 x 300GB FC spare drives and 2 x 300GB drives failed, EMC would NOT come out and replace the failed drives because 6 spares are still available.
If I had 8 x 300GB FC spare drives and 8 x 300GB drives failed, EMC would come out and replace the drives because 0 spares are now available.
Am I understanding this correctly? Can this threshold be changed so that EMC, for example, replaces the drives when there are only 3 spares available?
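
To make the two examples above concrete, here is a minimal Python sketch of the rule as it is understood in this post. It is not EMC code: the function name `service_call_needed` and the `threshold` parameter are hypothetical, and the sketch assumes each failed drive consumes exactly one spare of the same drive type.

```python
def service_call_needed(total_spares: int, failed_drives: int, threshold: int = 0) -> bool:
    """Return True when the remaining spares for one drive type are at or below the threshold.

    Hypothetical model: every failed drive uses up one spare of the same type.
    threshold=0 models the behavior described in the post (wait until no spares remain).
    """
    remaining = max(total_spares - failed_drives, 0)
    return remaining <= threshold


# The two cases from the post, with the assumed threshold of 0.
print(service_call_needed(8, 2))               # 6 spares remain
print(service_call_needed(8, 8))               # 0 spares remain

# The requested variation: call out when only 3 spares remain.
print(service_call_needed(8, 5, threshold=3))  # 3 spares remain
```

With `threshold=0` the first call returns `False` and the second returns `True`, matching the two examples. Whether the array exposes any such threshold is exactly the question being asked here.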

1.3K Posts

July 2nd, 2010 11:00

The idea is to put a bunch of spares in the box and send the CE to harvest the bad ones when the spare count gets low.  I'm not sure what "low" is, but I might want them to come out when there was at least one left. 

6 Operator

 • 

2.1K Posts

July 2nd, 2010 11:00

I'm not familiar with this "deferred maintenance" option you are asking about, but I am anxious to hear the answer if anyone else knows what this is. When you say it is "enabled on the VMax", what exactly do you mean by that? Is it an option you see somewhere in one of the interfaces? If so, it doesn't make a lot of sense that it would relate to when EMC comes to perform service. Actually, the way you describe it doesn't make a lot of sense to me in general. I can't imagine what the purpose would be of not having them come in until you were out of hot spares. If that were the case, there wouldn't be a lot of point in having multiples, because you would always eventually end up in a state where there was more risk than there should be.

Anyway, hopefully someone can comment on what this is and put all our minds at ease :-)

131 Posts

July 2nd, 2010 13:00

What is the advantage to the customer?

1.3K Posts

July 2nd, 2010 14:00

Less touch points on the system.  CE is more rested.

Less cost to EMC which hopefully would result in lower costs to the customer.

6 Operator

 • 

2.8K Posts

July 4th, 2010 11:00

"Less touch points on the system.  CE is more rested."


It looks like things changed a lot ... Now EMC wants "rested" CEs ??

108 Posts

July 4th, 2010 19:00

Hi Stefano,

It is very nice to hear from you. Please refer to the similar thread https://community.emc.com/thread/102140?tstart=0 for more details on deferred maintenance. That thread was on DMX, but it is essentially the same feature on both products (except that VMAX does not have dynamic or "standard" sparing, just permanent). I know that Engineering is still planning to adjust the threshold so EMC gets a dial home before the spares pool is depleted. I will ask the EMC Corporate System Engineer responsible for deferred maintenance to respond to this thread with the latest advice.

Best Regards,

Michael.

43 Posts

July 6th, 2010 11:00

Thank you for the responses. Primus emc212019 is helpful and looks to support my statements below and examples in my first post.

- Currently EMC will not automatically go out and replace the failed drives until the spares are completely depleted for that particular drive type. There is a time between the last disk failure and CE replacing the drives when you are running with no spares.

- Currently there is no way to change this threshold except to disable "deferred maintenance".

July 7th, 2010 05:00

Did the replies to your post answer your questions? If so, please mark the helpful and correct posts and also mark the post as answered. Doing so helps others identify the most useful posts. Thank you.

43 Posts

July 7th, 2010 15:00

Thank you for the explanation. I am confused by the statement "no additional risk was added by not replacing the spares immediately but waiting for the 'any drive in the system has zero spare coverage'". Aren't any drives that spare from the depleted pool at increased risk? If another drive that spares from that pool fails, it has no spare and is at risk until the EMC CE comes out the next day.

2 Intern

 • 

448 Posts

July 8th, 2010 05:00

No, they are not at risk.  When there is no permanent spare available for any drive in the array, a case is generated for replacement of the failed disk drives.  There can still be spares in the array that can be used as a standard hot spare, just not as a permanent spare.

43 Posts

July 8th, 2010 08:00

Is this true even on the VMax which only does permanent sparing?

2 Intern

 • 

448 Posts

July 8th, 2010 10:00

Thank you for reminding me about VMAX sparing.  Sometimes I think I deal with too many products.

6 Operator

 • 

2.1K Posts

July 8th, 2010 10:00

So, things are much clearer than they were when this thread started, but I too have a difficult time accepting the assessment of "no additional risk" in allowing a zero hot spare available state.

If there is no risk in allowing a "Zero Hot Spare available" state then why would anyone waste money on putting in hot spares in the first place?

I can understand that the difference might be minimal, but it is ABSOLUTELY greater than the risk involved if Hot Spares ARE available. I could never justify spending the money on a highly redundant highly available system like the Symmetrix and then purposely allow a state where there is no immediately available redundancy if there was an option.

30 Posts

July 8th, 2010 16:00

I am glad to see this being reviewed again. Coincidentally I am in the process of reading "The Black Swan - The Impact of the Highly Improbable" by Nassim Taleb - http://en.wikipedia.org/wiki/Black_swan_theory.

After the previous discussions, our recommendation (which our clients have readily agreed to) has been to NOT enable deferred service. Allen W has a good point - `If there is no risk in allowing a "Zero Hot Spare available" state then why would anyone waste money on putting in hot spares in the first place?`

Following on from that, if `After very careful analysis it was found that the new process did not impose any additional risk` (quoting 2gnWoUR4Wl1186674120175), then why take the trouble to modify the microcode, so a `customer can chose (sic) to change the criteria from zero to one in which we call the spares out for replacement` ?

1 Rookie

 • 

49 Posts

July 24th, 2013 08:00

Hi, I would like to clarify one point here: deferred maintenance will not wait for the spare count to go down to 0 before calling the CE for replacement.

There is a concept of a covering spare. Every drive has a covering spare drive assigned to it; if that drive fails, the covering spare takes its place. This is not a 1-to-1 mapping: one spare will cover multiple disks. When Enginuity detects that any disk does not have a covering spare, it will call home and generate an SR for the CE to replace all spare drives that are pending replacement.

There is an AutoPSE script that runs on the service processor. Whenever a drive fails, the script collects the RMA information for the drive, i.e. serial number, part number, etc. The script then checks whether deferred disk service is enabled and whether all drives have covering spares.
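
The covering-spare check described above can be sketched roughly as follows. This is only an illustration of the idea, not the actual script or Enginuity logic: the drive and spare IDs, the `coverage` mapping layout, and the function names `uncovered_drives` and `should_call_home` are all made up for the example.

```python
from typing import Dict, List, Set


def uncovered_drives(coverage: Dict[str, List[str]], available_spares: Set[str]) -> List[str]:
    """Return the drives that have no available covering spare.

    coverage maps a drive ID to the spare IDs that may cover it (hypothetical layout).
    One spare may appear under several drives, since the mapping is not 1-to-1.
    """
    return sorted(
        drive
        for drive, spares in coverage.items()
        if not any(spare in available_spares for spare in spares)
    )


def should_call_home(coverage: Dict[str, List[str]], available_spares: Set[str]) -> bool:
    """Call home as soon as any drive is left without a covering spare."""
    return bool(uncovered_drives(coverage, available_spares))


coverage = {
    "disk-1": ["spare-A"],
    "disk-2": ["spare-A", "spare-B"],
    "disk-3": ["spare-B"],
}

# Both spares free: every drive is covered, so no call home.
print(should_call_home(coverage, {"spare-A", "spare-B"}))

# spare-A has been consumed: disk-1 is now uncovered even though spare-B is still free.
print(uncovered_drives(coverage, {"spare-B"}))
print(should_call_home(coverage, {"spare-B"}))
```

In the second case the sketch calls home while one spare is still unused, which is the point being made: the trigger is lost coverage for any drive, not a total spare count of zero.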
