September 16th, 2014 07:00

SAN file system mounted as read only

Hi

We have a VMAX 20K, and storage from it was presented to a Linux AS 6 server. This is a database server, and in the middle of the week the SAN volume "/db" suddenly went into read-only mode. The syslog shows the messages below. We opened a case with EMC; after going through the host grabs and the Symmetrix array, they did not find any issues. Can anybody help me decode these messages?

We need to find the root cause for this.

Error message

Sep 11 16:30:01 ybrdcmo13 kernel: rport-2:0-2: blocked FC remote port time out: removing target and saving binding

Sep 11 16:30:01 ybrdcmo13 kernel: sd 2:0:0:1: rejecting I/O to offline device

Sep 11 16:30:01 ybrdcmo13 kernel: sd 2:0:0:1: [sdg] killing request

Sep 11 16:30:01 ybrdcmo13 kernel: rport-2:0-3: blocked FC remote port time out: removing target and saving binding

Sep 11 16:30:01 ybrdcmo13 kernel: lpfc 0000:11:00.1: 1:(0):0203 Devloss timeout on WWPN 50:00:09:72:08:5c:ed:a4 NPort x1e82c0 Data: x0 x8 x0

Sep 11 16:30:01 ybrdcmo13 kernel: sd 2:0:1:0: rejecting I/O to offline device

Sep 11 16:30:01 ybrdcmo13 kernel: sd 2:0:1:0: rejecting I/O to offline device

Sep 11 16:30:01 ybrdcmo13 kernel: sd 2:0:1:0: rejecting I/O to offline device

Sep 11 16:30:01 ybrdcmo13 kernel: sd 2:0:1:0: rejecting I/O to offline device

Sep 11 16:30:01 ybrdcmo13 kernel: Error:Mpx:Path Bus 2 Tgt 1 Lun 0 to 000192605947 is dead.

Sep 11 16:30:01 ybrdcmo13 kernel: Error:Mpx:Path Bus 2 Tgt 1 Lun 1 to 000192605947 is dead.

Sep 11 16:30:01 ybrdcmo13 kernel: Error:Mpx:Bus 2 to Symmetrix 000192605947 port 8gA is dead.

Sep 11 16:30:01 ybrdcmo13 kernel: sd 2:0:1:1: [sdi] killing request

Sep 11 16:30:01 ybrdcmo13 kernel: rport-2:0-0: blocked FC remote port time out: removing rport

Sep 11 16:30:01 ybrdcmo13 kernel: lpfc 0000:11:00.1: 1:(0):0203 Devloss timeout on WWPN 50:00:09:72:08:5c:ed:9c NPort x1e0080 Data: x0 x8 x0

Thanks,

Ram
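For readers trying to decode messages like the ones in the post above, here is a minimal Python sketch. It sorts the lines by message type and collects the host or bus number each one names. The pattern names and the `summarize` helper are made up for illustration, and the regular expressions cover only the message formats quoted in this thread.

```python
import re
from collections import defaultdict

# Illustrative patterns for the message types quoted in the post (not exhaustive).
PATTERNS = {
    "rport_timeout": re.compile(r"rport-(\d+):(\d+)-(\d+): blocked FC remote port time out"),
    "devloss": re.compile(r"lpfc (\S+): .*Devloss timeout on WWPN (\S+)"),
    "offline_io": re.compile(r"sd (\d+):(\d+):(\d+):(\d+): rejecting I/O to offline device"),
    "path_dead": re.compile(r"Mpx:Path Bus (\d+) Tgt (\d+) Lun (\d+) to (\d+) is dead"),
    "port_dead": re.compile(r"Mpx:Bus (\d+) to Symmetrix (\d+) port (\S+) is dead"),
}

# A few lines copied from the post, one per message type.
SAMPLE = """\
Sep 11 16:30:01 ybrdcmo13 kernel: rport-2:0-2: blocked FC remote port time out: removing target and saving binding
Sep 11 16:30:01 ybrdcmo13 kernel: sd 2:0:0:1: rejecting I/O to offline device
Sep 11 16:30:01 ybrdcmo13 kernel: lpfc 0000:11:00.1: 1:(0):0203 Devloss timeout on WWPN 50:00:09:72:08:5c:ed:a4 NPort x1e82c0 Data: x0 x8 x0
Sep 11 16:30:01 ybrdcmo13 kernel: Error:Mpx:Path Bus 2 Tgt 1 Lun 0 to 000192605947 is dead.
Sep 11 16:30:01 ybrdcmo13 kernel: Error:Mpx:Bus 2 to Symmetrix 000192605947 port 8gA is dead.
""".splitlines()


def summarize(lines):
    """Group matching lines by message type and collect host/bus numbers."""
    events = defaultdict(list)
    hosts = set()
    for line in lines:
        for name, pattern in PATTERNS.items():
            m = pattern.search(line)
            if not m:
                continue
            events[name].append(m.groups())
            # For every type except "devloss", the first captured field is
            # the host (or PowerPath bus) number; "devloss" starts with a PCI address.
            if name != "devloss":
                hosts.add(int(m.group(1)))
            break
    return events, hosts


if __name__ == "__main__":
    events, hosts = summarize(SAMPLE)
    for name, found in events.items():
        print(name, found)
    print("host/bus numbers involved:", sorted(hosts))
```

In the excerpt posted, every rport, sd, and Mpx message names host or bus 2, so running the same grouping over the full syslog is one way to see whether the second HBA logged anything at that time.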

1 Rookie • 20.4K Posts

September 16th, 2014 07:00

Is there more than one file system from that VMAX on this host? If so, did the others stay up?

24 Posts

September 16th, 2014 07:00

That's a good question. Yes, it also mounts "/bckp", which the data server uses to dump the databases. Unfortunately, going through the log files, I believe nothing was happening on it around that time. We should also have tried to touch a file there to confirm, the way we did on the "/db" folder.
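Besides touching a file, the mount state can be read from /proc/mounts, where the fourth field lists the mount options. The sketch below is only an illustration: the helper name `read_only_mounts` is made up, and it reports any mount whose options include "ro", which is how a file system remounted read-only normally shows up there.

```python
import os


def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains 'ro'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        if "ro" in options:
            result.append(mount_point)
    return result


if __name__ == "__main__":
    # /proc/mounts exists on Linux only; skip quietly elsewhere.
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            for mount_point in read_only_mounts(f.read()):
                print("read-only:", mount_point)
```

Running this for both "/db" and "/bckp" at the time of the incident would have shown whether only one of the two file systems was affected.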

I opened a case with Red Hat, and this is the response I got:

As per the primary analysis of the error message, it seems that there is some issue with the storage array. When a SCSI layer timeout occurs, the SCSI layer must abort the command so that it cannot complete after we've given up waiting.

Can you please check with your storage team whether there are any hardware issues or error logs available?

You may refer to the following article for details.

lpfc "SCSI layer issued Device Reset" messages in RHEL

https://access.redhat.com/solutions/39590

1 Rookie • 20.4K Posts

September 18th, 2014 05:00

Are you using PowerPath or DM-MPIO? Did any other hosts connected to that VMAX experience any issues around that time?

24 Posts

September 18th, 2014 05:00

We were using PowerPath (EMCpower.LINUX-5.7.4.00.00-003.el6.x86_64). No, only this host was affected, which is why it's bizarre. We did not see any other messages. Also, we connect this host to 4 director ports on the VMAX 20K. Other machines that use the same director ports did not see this issue.

1 Rookie • 20.4K Posts

September 18th, 2014 06:00

If you look at "powermt display dev=all", are all devices multipathed, with symopt policy?

24 Posts

September 18th, 2014 06:00

Can you explain the difference between the two? In this case I believe it's symopt.

/sbin/powermt display dev=all

Pseudo name=emcpowera

Symmetrix ID=000192605947

Logical device ID=0733

state=alive; policy=SymmOpt; queued-IOs=0

==============================================================================

--------------- Host ---------------   - Stor -  -- I/O Path --   -- Stats ---

###  HW Path               I/O Paths    Interf.  Mode     State   Q-IOs Errors

==============================================================================

   2 lpfc                   sdi         FA  8gA  active   alive      0      0

   2 lpfc                   sdg         FA 10gA  active   alive      0      0

   1 lpfc                   sdc         FA  9gA  active   alive      0      0

   1 lpfc                   sde         FA  7gA  active   alive      0      0

Pseudo name=emcpowerb

Symmetrix ID=000192605947

Logical device ID=072B

state=alive; policy=SymmOpt; queued-IOs=1

==============================================================================

--------------- Host ---------------   - Stor -  -- I/O Path --   -- Stats ---

###  HW Path               I/O Paths    Interf.  Mode     State   Q-IOs Errors

==============================================================================

   2 lpfc                   sdh         FA  8gA  active   alive      0      0

   2 lpfc                   sdf         FA 10gA  active   alive      1      0

   1 lpfc                   sdd         FA  7gA  active   alive      0      0

   1 lpfc                   sdb         FA  9gA  active   alive      0      0
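The path check being asked about can also be done mechanically. The following is a rough Python sketch that parses text in the layout shown above and flags any path that is not "alive", or any pseudo device whose alive paths sit on fewer than two HBAs. The helper names, the row regex, and the two-HBA threshold are assumptions for illustration; the regex is written against this output only and may need adjusting for other PowerPath versions.

```python
import re
from collections import defaultdict

# One path row, e.g. "   2 lpfc   sdi   FA  8gA  active   alive   0   0"
PATH_ROW = re.compile(
    r"^\s*(?P<hba>\d+)\s+(?P<driver>\S+)\s+(?P<dev>sd\w+)\s+FA\s+(?P<fa>\S+)"
    r"\s+(?P<mode>\S+)\s+(?P<state>\S+)\s+(?P<qios>\d+)\s+(?P<errors>\d+)\s*$"
)

# First device from the output above (separator and header rows omitted).
SAMPLE = """\
Pseudo name=emcpowera
Symmetrix ID=000192605947
Logical device ID=0733
state=alive; policy=SymmOpt; queued-IOs=0
   2 lpfc                   sdi         FA  8gA  active   alive      0      0
   2 lpfc                   sdg         FA 10gA  active   alive      0      0
   1 lpfc                   sdc         FA  9gA  active   alive      0      0
   1 lpfc                   sde         FA  7gA  active   alive      0      0
"""


def parse_powermt(text):
    """Return {pseudo_name: [path dicts]} from 'powermt display dev=all' text."""
    devices = defaultdict(list)
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Pseudo name="):
            current = stripped.split("=", 1)[1]
            continue
        m = PATH_ROW.match(line)
        if m and current:
            devices[current].append(m.groupdict())
    return dict(devices)


def check(devices, min_hbas=2):
    """List problems: paths that are not alive, or too few HBAs with alive paths."""
    problems = []
    for name, paths in devices.items():
        for p in paths:
            if p["state"] != "alive":
                problems.append(f"{name}: {p['dev']} via FA {p['fa']} is {p['state']}")
        alive_hbas = {p["hba"] for p in paths if p["state"] == "alive"}
        if len(alive_hbas) < min_hbas:
            problems.append(f"{name}: alive paths on only {len(alive_hbas)} HBA(s)")
    return problems


if __name__ == "__main__":
    print(check(parse_powermt(SAMPLE)) or "no problems found")
```

On the output pasted above, the check finds nothing to report: each pseudo device has four alive paths, two per HBA.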

1 Rookie • 20.4K Posts

September 18th, 2014 06:00

It was a typo on my part; I should have said "SymmOpt". The output looks good right now. Going back to the logs, specifically this string:

Error:Mpx:Bus 2 to Symmetrix 000192605947 port 8gA is dead

Did you see it listed for other FAs as well?

24 Posts

October 23rd, 2014 06:00

Dynamox,

EMC did do an RCA, and they believe the cause is outside the EMC infrastructure. I believe they did a good job of looking into all the switch logs, SAN logs, and system logs, and they also worked with Red Hat. Nothing conclusive was found. In the meantime, I moved the SAN devices to a different machine, just to rule out the possibility that it was a machine issue.

Hopefully it does not happen again.

Thanks,

Ram

5 Practitioner • 274.2K Posts

October 26th, 2014 09:00

We had a similar issue in our environment where the file system went into read-only mode, and on further investigation Red Hat confirmed it was a multipath setup issue.

24 Posts

October 26th, 2014 10:00

SriNer,

Can you elaborate on that? What do you mean by a multipath issue?

--Ram
