Unsolved

July 13th, 2016 14:00

(Hopefully) Basic EMC Powerpath/Clarion/VNX Question

Hello Community,

I am a lifetime computer/IT person, but I have dealt very little directly with EMC products; they have always been around, with others handling the day-to-day issues on them.  I need help with what I hope is a basic question.

I have a large set of servers at my current client that use, in part, various EMC-based storage devices.

Recently I have been having quite a few issues with storage becoming unavailable on several servers and, in tracking this down, I discovered that the PowerPath utility is reporting things that appear to be issues or errors.  When I report these to the only 'Storage' person I have to talk to (he is part-time on storage and part-time on other IT things), he says they are 'Normal'.  I need help determining whether this is true or whether there are real issues.

On one particular server I am seeing Degraded disks and Failed paths. It is a SQL Server that has had SQL crash 3 times in the past 2 weeks.  He keeps saying 'it is not a storage issue'.  All the data and logs are located on VNX or CLARiiON devices.

[Attached screenshots: DBS1_2.JPG.jpg, DBS1_3.JPG.jpg, DBS1_4.JPG.jpg, DBS1_5.JPG.jpg]

I just want to know whether there is anything here worth pursuing.  I am seeing a lot of 'errors' from Powerpath, Clarion and EmcpMpx in the event logs.  These may or may not be 'errors' at all, but that is part of what I am trying to get at.

I need to find a root cause for these failures.  This is a production system with many users.  There are no other noteworthy errors on the server itself.  SQL itself appears healthy and I have run system diags on all other components.

Any help or advice is appreciated.

Will Mc

July 14th, 2016 00:00

Hi Will,

Basically it looks like you are having some connectivity issues between your host and the array(s). Whether that is the cause of the server crashing is really hard to tell without checking the cluster/application logs, but it is definitely worth investigating.

The problem here is that any of the components along the path (host, HBA, switch, SP port, the array itself) can be failing, so all of them need to be checked to find the root cause. Given the amount of logs involved (EMCreports, switch logs, array SP collects), I am afraid the only proper way of doing that is through a Service Request. Please open one with PowerPath/Windows support and we'll take care of this.

Thank you,

Pawel

August 2nd, 2016 09:00

Hello Will,

Not sure if you still are looking for some assistance. Regarding the dead paths or degraded devices, looking at the screenshots, it appears that your configuration is as follows:

4 paths to CLARiiON

8 paths to VNX

The reason Disk 3 and Disk 5 show as degraded is that each has a single dead path. When a device is in degraded mode, it simply means that at least 1 path is dead, so the device is not optimal. PowerPath automatically uses the remaining paths for IO until the dead path comes back alive.
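As a rough illustration of that rule (not PowerPath's actual logic), here is a small Python sketch that classifies a device from the state of its paths. The function name `device_state` and the "failed" label for a device with no live path are assumptions made for this example.

```python
def device_state(path_states):
    """Classify a device from its per-path states ("alive" or "dead")."""
    if not path_states:
        raise ValueError("a device needs at least one path")
    dead = sum(1 for state in path_states if state == "dead")
    if dead == 0:
        return "optimal"
    if dead == len(path_states):
        return "failed"  # assumption: label for a device with no live path
    return "degraded"    # at least one dead path, at least one live path


# Disk 3 from this thread: three paths on port4, one of them dead.
print(device_state(["dead", "alive", "alive"]))  # prints: degraded
```

The point is only that one dead path is enough to flip a device to degraded while IO continues on the remaining paths.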

Now, focusing on each device: Disk 3 is coming from CLARiiON and looks to have 3 paths via port4 (c4t0d3, c4t1d3, and c4t2d3), but one is dead (c4t0d3). Comparing with the other CLARiiON LUNs, none of them show any path using target 0 (t0); it is either target 1 or 2.

Disk 5 is coming from VNX and looks to have 5 paths via port4 (c4t0d1, c4t2d1, c4t3d1, c4t4d1, and c4t5d1), with one path (c4t2d1) dead. Comparing with the other VNX LUNs, none of them show any paths via target 2 (t2); they are only using targets 0, 3, 4, and 5.

That tells me those are ghost/invalid paths. Please check for the presence of POWERMT.CUSTOM (xml and/or lck extension) either inside EMC\PowerPath\ (5.7.x and up) or EMC\PowerCommon\ (5.5.x and below). If found, rename/delete this file and reboot the host. Using this file is not recommended unless the host needs it for a specific configuration.
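The comparison described above can be sketched in a few lines of Python: for the LUNs of one array, flag any port/target pair that only a single device uses. This is just a way to reproduce the reasoning, not a PowerPath tool. The `c<port>t<target>d<lun>` reading of names like c4t0d3 is my interpretation, and the "other LUN" entries are made-up stand-ins that only mirror which targets the other LUNs use (their real path names are not listed in this thread).

```python
import re
from collections import defaultdict

# Assumed layout of a path name such as "c4t0d3": port 4, target 0, LUN 3.
PATH_RE = re.compile(r"^c(\d+)t(\d+)d(\d+)$")


def parse_path(name):
    """Split a path name into (port, target, lun) integers."""
    match = PATH_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised path name: {name!r}")
    port, target, lun = (int(part) for part in match.groups())
    return port, target, lun


def suspect_targets(devices):
    """Return {device: [(port, target), ...]} for targets used by one device only.

    `devices` maps a device label to its list of path names; all devices
    should belong to the same array so their targets are comparable.
    """
    users = defaultdict(set)
    for device, paths in devices.items():
        for name in paths:
            port, target, _lun = parse_path(name)
            users[(port, target)].add(device)

    suspects = defaultdict(list)
    for key, devs in users.items():
        if len(devs) == 1:
            suspects[next(iter(devs))].append(key)
    return {device: sorted(keys) for device, keys in suspects.items()}


# Disk 3 and Disk 5 paths are taken from this thread; the "other LUN"
# entries are hypothetical placeholders for the remaining LUNs.
clariion = {
    "Disk 3": ["c4t0d3", "c4t1d3", "c4t2d3"],
    "other CLARiiON LUN (hypothetical)": ["c4t1d0", "c4t2d0"],
}
vnx = {
    "Disk 5": ["c4t0d1", "c4t2d1", "c4t3d1", "c4t4d1", "c4t5d1"],
    "other VNX LUN (hypothetical)": ["c4t0d0", "c4t3d0", "c4t4d0", "c4t5d0"],
}

clariion_suspects = suspect_targets(clariion)
vnx_suspects = suspect_targets(vnx)
print(clariion_suspects)
print(vnx_suspects)
```

With this sample data the only flagged pairs are port 4 / target 0 on Disk 3 and port 4 / target 2 on Disk 5, which are the same two paths that show as dead.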

Regarding the crashes, it would be difficult to truly diagnose what might be happening, but the fact that there are a lot of dead-path events may suggest a connectivity problem.

Hopefully this helps. If so, please make sure to mark it as answered.

Thanks,

Andres
