July 2nd, 2010 10:00

VMax Deferred Maintenance

We have "deferred maintenance" enabled on a VMax. I am not very familiar with this option. It is my understanding that with "deferred maintenance" enabled EMC will not come out and replace failed drives until there are 0 spares available for a particular drive type.
For example:
If I had 8 x 300GB FC spare drives and 2 x 300GB drives failed, EMC would NOT come out and replace the failed drives because 6 spares are still available.
If I had 8 x 300GB FC spare drives and 8 x 300GB drives failed, EMC would come out and replace the drives because 0 spares are now available.
Am I understanding this correctly? Can this threshold be changed so that EMC, for example, replaces the drives when there are only 3 spares available?

1.3K Posts

July 2nd, 2010 11:00

The idea is to put a bunch of spares in the box and send the CE to harvest the bad ones when the spare count gets low.  I'm not sure what "low" is, but I might want them to come out when there was at least one left. 

2.1K Posts

July 2nd, 2010 11:00

I'm not familiar with this "deferred maintenance" option you are asking about, but I am anxious to hear the answer if anyone else knows what this is. When you say it is "enabled on the VMax", what exactly do you mean by that? Is it an option you see somewhere in one of the interfaces? If so, it doesn't make a lot of sense that it would relate to when EMC comes to perform service. Actually, the way you describe it doesn't make a lot of sense to me in general. I can't imagine what the purpose would be of not having them come in until you were out of hot spares. If that were the case, there wouldn't be a lot of point in having multiple spares, because you would always eventually end up in a state where there was more risk than there should be.

Anyway, hopefully someone can comment on what this is and put all our minds at ease :-)

131 Posts

July 2nd, 2010 13:00

What is the advantage to the customer?

1.3K Posts

July 2nd, 2010 14:00

Less touch points on the system.  CE is more rested.

Less cost to EMC which hopefully would result in lower costs to the customer.

2.8K Posts

July 4th, 2010 11:00

Less touch points on the system.  CE is more rested.


It looks like things changed a lot ... Now EMC wants "rested" CEs ??

108 Posts

July 4th, 2010 19:00

Hi Stefano,

It is very nice to hear from you. Please refer to the similar thread https://community.emc.com/thread/102140?tstart=0 for more details on deferred maintenance. That thread was about DMX, but it is essentially the same feature on both products (except that VMAX does not have dynamic or "standard" sparing, just permanent). I know that Engineering is still planning to adjust the threshold so EMC gets a dial home before the spares pool is depleted. I will ask the EMC Corporate System Engineer responsible for deferred maintenance to respond to this thread with the latest advice.

Best Regards,

Michael.

43 Posts

July 6th, 2010 11:00

Thank you for the responses. Primus emc212019 is helpful and looks to support my statements below and examples in my first post.

- Currently EMC will not automatically go out and replace the failed drives until the spares are completely depleted for that particular drive type. There is a time between the last disk failure and CE replacing the drives when you are running with no spares.

- Currently there is no way to change this threshold except to disable "deferred maintenance".

July 7th, 2010 05:00

Did the replies to your post answer your questions? If so, please mark the helpful and correct posts and also mark the post as answered. Doing so helps others identify the most useful posts. Thank you.

5 Practitioner

 • 

274.2K Posts

July 7th, 2010 09:00

Hi All,

I hope I can help with some of the confusion that is taking place related to Deferred Service as I am the EMC CS Global Service engineer who worked closely with the microcode development engineers to implement this process (It’s been available since March of 09) for DMX-3 / DMX-4 / VMAX systems.

The process works like this. When the "Enable Deferred Service" flag is set on the box and a failed drive is going to be permanently spared, the system will dial home with the drive failure. When the dial home is received, the call home infrastructure detects that the "Enable Deferred Service" flag is set on the box. A Service Request will be opened for the dial home, but the SR will then be automatically canceled. After the permanent sparing script is completed, the microcode runs a check to verify whether any drive in the system now has zero spares available to be used. If it detects that any drive in the system now has zero spares available, then a second dial home will occur. This dial home will call out for replacement not only the spare that caused a drive to reach zero spares available but also any of the other spares in the system that have been previously deferred and not replaced. Depending on the spare drive configuration in the box, the dial home may have a single spare drive needing replacement or it could include multiple spare drives requiring replacement. The CE will arrive on site Next Business Day with the spare drive or drives requiring replacement and will replace them using a single instance of the "Replace Spare Drive" script.

Once again, the key point is the "any drive in the system has zero spare coverage" statement. This does not mean that all of the spare drives in the system are bad. Permanent sparing is configuration dependent, so depending on where the failed drive lives and what size and type it is, only certain spares can be used to permanently spare against it. There still can, and most likely will, be spares available in the system to be used as spares for other drives.
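
For readers who find it easier to follow as code, here is a minimal Python sketch of the flow described above. It is only an illustration under stated assumptions, not how the microcode works: the function name `handle_drive_failure`, the idea of a "coverage group" label (standing in for the location, size and type rules that decide which spares fit which drives), and the example group names are all hypothetical. It also does not model the CE visit that restocks the spares.

```python
from collections import Counter


def handle_drive_failure(spares, deferred, group, groups_in_use):
    """Simulate one drive failure; return the drives called out for replacement.

    spares        -- Counter of good spares per coverage group (mutated)
    deferred      -- list of failed drives not yet replaced (mutated)
    group         -- coverage group of the drive that just failed
    groups_in_use -- every coverage group that contains data drives
    """
    if spares[group] == 0:
        # No usable spare: the SR is not deferred and the drive is replaced now.
        return [group]

    # Permanent sparing consumes one spare; the first SR is auto-canceled.
    spares[group] -= 1
    deferred.append(group)

    # Post-sparing check: does any drive in the system now have zero spares?
    if any(spares[g] == 0 for g in groups_in_use):
        called_out = list(deferred)  # second dial home lists every deferred drive
        deferred.clear()
        return called_out
    return []  # replacement stays deferred


# Hypothetical box: two 300GB FC spares and one 1TB SATA spare.
spares = Counter({"300GB-FC": 2, "1TB-SATA": 1})
deferred = []
groups = {"300GB-FC", "1TB-SATA"}
print(handle_drive_failure(spares, deferred, "300GB-FC", groups))  # []
print(handle_drive_failure(spares, deferred, "300GB-FC", groups))  # ['300GB-FC', '300GB-FC']
```

In the second call the 300GB FC drives reach zero coverage, so both deferred drives are called out together, while the 1TB SATA spare is still good. That is the sense in which "zero spare coverage" does not mean every spare in the box is gone.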

Please keep in mind that very careful consideration was taken during the design phases of this process to ensure no additional risk was added by not replacing the spares immediately but instead waiting for the "any drive in the system has zero spare coverage" criterion to be met. After very careful analysis it was found that the new process did not impose any additional risk. Also, in the event that another drive failed and no spare was available, an SR would be opened and a CE would be dispatched to replace the drive immediately.

As there have been several requests, we are currently working on an enhancement to the process that lets a customer choose to change the threshold at which spares are called out for replacement from zero to one. This should be available in a future release of microcode.
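
To make the zero-versus-one distinction concrete, here is a small Python sketch. It is purely illustrative: the function `groups_needing_service`, the `threshold` parameter and the group names are hypothetical and do not correspond to any real setting or interface on the array.

```python
def groups_needing_service(spares, threshold=0):
    """Return the coverage groups whose good-spare count is at or below threshold.

    spares    -- dict mapping a coverage group name to its count of good spares
    threshold -- 0 models the current behavior, 1 the planned option
    """
    return sorted(g for g, n in spares.items() if n <= threshold)


# Hypothetical box with one 300GB FC spare left and three 1TB SATA spares.
remaining = {"300GB-FC": 1, "1TB-SATA": 3}
print(groups_needing_service(remaining))               # []
print(groups_needing_service(remaining, threshold=1))  # ['300GB-FC']
```

With a threshold of zero nothing is called out until the last 300GB FC spare is used. With a threshold of one, the call-out happens while one spare is still available.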

As an FYI there are currently over 1400 DMX-3, DMX-4 and VMAX systems that have Deferred Service enabled with a very good success rate of drive replacements that have deferred.

I hope this helps with understanding the process. Feel free to ask additional questions.

43 Posts

July 7th, 2010 15:00

Thank you for the explanation. I am confused by the statement "no additional risk was added by not replacing the spares immediately but waiting for the "any drive in the system has zero spare coverage" ". Aren't any drives that spare from the depleted pool at increased risk? If another drive fails that spares from that pool it has no spare and is at risk until the EMC CE comes out the next day.

448 Posts

July 8th, 2010 05:00

No, they are not at risk. When there is no permanent spare available for any drive in the array, a case is generated for replacement of the failed disk drives. There can still be spares in the array that can be used as a standard hot spare, just not as a permanent spare.

43 Posts

July 8th, 2010 08:00

Is this true even on the VMax which only does permanent sparing?

448 Posts

July 8th, 2010 10:00

Thank you for reminding me about VMAX sparing. Sometimes I think I deal with too many products.

2.1K Posts

July 8th, 2010 10:00

So, things are much clearer than they were when this thread started, but I too have a difficult time accepting the assessment of "no additional risk" in allowing a state with zero hot spares available.

If there is no risk in allowing a "Zero Hot Spare available" state then why would anyone waste money on putting in hot spares in the first place?

I can understand that the difference might be minimal, but it is ABSOLUTELY greater than the risk involved if Hot Spares ARE available. I could never justify spending the money on a highly redundant highly available system like the Symmetrix and then purposely allow a state where there is no immediately available redundancy if there was an option.

5 Practitioner

 • 

274.2K Posts

July 8th, 2010 10:00

Dynamic sparing is no longer an option in VMAX systems. The point about a hot spare (dynamic spare) being able to invoke holds for DMX-3 and DMX-4 systems.

However, the statement related to "no additional risk" also took VMAX systems with only permanent sparing into consideration. This was looked at very closely by the same Engineering team who calculated and recommended how many spares are needed for adequate spare coverage. Please keep in mind that one of the key factors that came into play while analyzing the additional potential risk of implementing Deferred Service is the overall improvement in drive reliability that has been seen in the past several years.
