100 Posts

January 15th, 2016 15:00

I would post this in the VNX Support Forum, located at: VNX

Cheers~

53 Posts

January 17th, 2016 18:00

Hi Pablo,

It is unfortunate to hear about this scenario with your VNX.  As a field tech with EMC for 10 years, and a specialist in VNX products, I usually see site environmental factors cause this.  I'm not sure whether you have a temperature sensor in front of the VNX DAE to help monitor?  While it seems like it would take a lot of I/O, it is also possible that certain disks are running hot because they are overloaded.  Has anyone performed a performance analysis on your array to determine whether the workload is balanced?  Perhaps higher-I/O LUNs need to be moved to SSDs, etc.  You can also request that VNX engineering perform a failure analysis on components if needed.  I'm not aware of any code bugs for thermal reporting, though we do recommend upgrading to the latest target codes at least twice per year.  The current target code levels for the VNX5200 are File 8.1.8.121 and Block 05.33.008.5.119.

Also, are you set up to receive email alerts from the VNX system?  If so, you will also receive a weekly "array is alive" email notification to confirm that alerts can be sent out.  It could have been a VNX hardware issue that was unable to notify EMC or yourself, so proactive measures could not be taken.  When a system is deployed in ESRS, ESRS monitors the storage processors with a heartbeat signal (like a ping test) to ensure inbound remote connectivity.  However, the individual device alerts going out to EMC are set up in each product.  For VNX you can set up alerts to go through your SMTP server or through the ESRS server, so even if a product is deployed on ESRS, the device alerts still need to be defined separately.  I just wanted to clarify that, as it is often misunderstood.
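As a side note on the SMTP path: this is a generic sketch, not an EMC tool, and the relay hostname and addresses below are placeholders you would replace with your own. A few lines of Python can confirm that the SMTP relay the array's email alerts depend on will actually accept and deliver a test message, which helps separate "the relay is unreachable" from "the array never generated the alert".

    # Generic test of an SMTP relay used for device alerts (placeholder values).
    import smtplib
    from email.message import EmailMessage

    SMTP_RELAY = "smtp.example.com"          # relay configured in the array's alert settings (placeholder)
    FROM_ADDR = "vnx-alerts@example.com"     # sender address configured for alerts (placeholder)
    TO_ADDR = "storage-team@example.com"     # mailbox that should receive alerts (placeholder)

    msg = EmailMessage()
    msg["Subject"] = "VNX alert path test"
    msg["From"] = FROM_ADDR
    msg["To"] = TO_ADDR
    msg.set_content("Test message to confirm the alert relay is reachable and accepting mail.")

    # Connect on port 25 and hand the message to the relay; an exception here
    # points at a network/relay problem rather than the array itself.
    with smtplib.SMTP(SMTP_RELAY, 25, timeout=10) as smtp:
        smtp.send_message(msg)
    print("Relay accepted the message; check the destination mailbox.")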

Regarding the account administrative issues, I know that can be a problem for customers when it occurs.  We do have local Account Service Engineers (ASE) / Account Service Representatives (ASR) / Customer Service Engineers (CSE) assigned to your account who can assist with this process.  This is also something your EMC District Service Manager can help escalate to resolution.

Ultimately, we recommend working with your local sales and service account teams to update your records and ensure device alerting is set up. Usually this process starts with creating a service request to engage local support, though if there are issues we can also provide some assistance here.

Ryan Brancel

EMC Account Service Engineer

3 Posts

January 18th, 2016 07:00

Ryan, thanks for your response. Yes, we have a temperature sensor (a Netbotz unit) sitting 1 m in front of the rack; it sends alerts whenever we have an overtemperature event. We also have an analog thermostat wired to the EPO input of each of our UPSs, so if the ambient temperature rises above a threshold the UPSs are turned off. We also have dual online UPSs and dual precision AC cooling, all backed by dual diesel generators. The rack is an APC server rack, and the room is rarely accessed physically, as our servers have remote management cards. The high temperature was measured without load (it showed on one system disk and two data disks) while the adjacent drives were at 20°C; this was measured with an EMC representative on site, who also confirmed that the ambient temperature was fine.

What baffles me (apart from the temperature readings), and what I find most inexcusable, is that the VNX shut itself down when it could perfectly well have kept working (we had a spare disk that could be used to recover the failing drives) and only one system drive was showing problems (that drive is mirrored). Also, the other SP was fine, and instead of trespassing the LUNs to it, the array chose to turn off, taking the whole infrastructure down.

The administrative problems we had with EMC deserve a whole new chapter; there was not a single step in our relationship that was trouble free. I lost count of how many times I reported problems and was told they were fixed, only to find out they weren't. I've been contacted about equipment that doesn't belong to us, our contact data was handed to another, unrelated firm, the sites were mangled, and the icing on the cake is that the ESRS we were assigned belonged to another firm (so our VNX was never proactively monitored). Two weeks ago an EMC representative showed up to configure ESRS, but I was not told about the device alerts, nor do I receive the weekly "array is alive" emails (the test emails worked fine at the time).

Unfortunately, the local sales and service reps move at a speed that does not suit our needs; it has been nearly a month since we had this problem (and opened an SR) and two weeks since the parts were sent for failure analysis, and we don't even have an ETA on when we will get answers from EMC. Meanwhile we are forced to use alternative equipment which is not really suitable for our needs.

Pablo

3 Posts

January 26th, 2016 07:00

Hi Javier, we hope to receive a timely answer to all the concerns we voiced at the meeting, and fulfillment of all the commitments you made. As of now we are still waiting for a timetable for the failure analysis.

Yours

Pablo
