junior5

24 Posts

6514

September 16th, 2014 07:00

SAN file system mounted as read only

Hi

We have a VMAX 20K, and storage from it was presented to a Linux AS 6 server. This is a database server, and in the middle of the week the SAN volume "/db" suddenly went into read-only mode. Syslog shows the messages below. We opened a case with EMC; after going through the host grabs and the Symmetrix array, they did not find any issues. Can anybody help me decode these messages?

We need to find the root cause for this.

Error message

Sep 11 16:30:01 ybrdcmo13 kernel: rport-2:0-2: blocked FC remote port time out: removing target and saving binding

Sep 11 16:30:01 ybrdcmo13 kernel: sd 2:0:0:1: rejecting I/O to offline device

Sep 11 16:30:01 ybrdcmo13 kernel: sd 2:0:0:1: [sdg] killing request

Sep 11 16:30:01 ybrdcmo13 kernel: rport-2:0-3: blocked FC remote port time out: removing target and saving binding

Sep 11 16:30:01 ybrdcmo13 kernel: lpfc 0000:11:00.1: 1:(0):0203 Devloss timeout on WWPN 50:00:09:72:08:5c:ed:a4 NPort x1e82c0 Data: x0 x8 x0

Sep 11 16:30:01 ybrdcmo13 kernel: sd 2:0:1:0: rejecting I/O to offline device

Sep 11 16:30:01 ybrdcmo13 kernel: Error:Mpx:Path Bus 2 Tgt 1 Lun 0 to 000192605947 is dead.

Sep 11 16:30:01 ybrdcmo13 kernel: Error:Mpx:Path Bus 2 Tgt 1 Lun 1 to 000192605947 is dead.

Sep 11 16:30:01 ybrdcmo13 kernel: Error:Mpx:Bus 2 to Symmetrix 000192605947 port 8gA is dead.

Sep 11 16:30:01 ybrdcmo13 kernel: sd 2:0:1:1: [sdi] killing request

Sep 11 16:30:01 ybrdcmo13 kernel: rport-2:0-0: blocked FC remote port time out: removing rport

Sep 11 16:30:01 ybrdcmo13 kernel: lpfc 0000:11:00.1: 1:(0):0203 Devloss timeout on WWPN 50:00:09:72:08:5c:ed:9c NPort x1e0080 Data: x0 x8 x0
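
One way to start decoding messages like these is to group them by the SCSI host (HBA port) they refer to. The sketch below is only an illustration: it uses a few of the lines above as sample input, and it assumes that the PowerPath "Bus" number is the same as the SCSI host number in the "rport-H:" and "sd H:C:T:L" prefixes. The function names and regular expressions are my own, not part of any EMC or Red Hat tool.

```python
import re
from collections import Counter

# A few lines copied from the syslog excerpt above, used as sample input.
LOG = """\
Sep 11 16:30:01 ybrdcmo13 kernel: rport-2:0-2: blocked FC remote port time out: removing target and saving binding
Sep 11 16:30:01 ybrdcmo13 kernel: sd 2:0:0:1: rejecting I/O to offline device
Sep 11 16:30:01 ybrdcmo13 kernel: lpfc 0000:11:00.1: 1:(0):0203 Devloss timeout on WWPN 50:00:09:72:08:5c:ed:a4 NPort x1e82c0 Data: x0 x8 x0
Sep 11 16:30:01 ybrdcmo13 kernel: Error:Mpx:Path Bus 2 Tgt 1 Lun 0 to 000192605947 is dead.
Sep 11 16:30:01 ybrdcmo13 kernel: Error:Mpx:Bus 2 to Symmetrix 000192605947 port 8gA is dead.
"""

# Patterns that expose a host number (assumption: PowerPath "Bus" == SCSI host).
HOST_PATTERNS = [
    re.compile(r"rport-(\d+):"),             # FC transport remote port
    re.compile(r"\bsd (\d+):\d+:\d+:\d+:"),  # SCSI disk H:C:T:L
    re.compile(r"Mpx:(?:Path )?Bus (\d+) "),  # PowerPath path/bus messages
]


def scsi_hosts(log_text):
    """Count log lines per host number they mention."""
    counts = Counter()
    for line in log_text.splitlines():
        for pattern in HOST_PATTERNS:
            match = pattern.search(line)
            if match:
                counts[int(match.group(1))] += 1
                break
    return counts


def lost_wwpns(log_text):
    """Return target WWPNs named in lpfc 'Devloss timeout' lines."""
    return re.findall(r"Devloss timeout on WWPN ([0-9a-f:]{23})", log_text)


print(scsi_hosts(LOG))
print(lost_wwpns(LOG))
```

Run against the full excerpt, every line that carries a host number points at host 2, so the messages by themselves show paths lost through one HBA port and say nothing about the other one.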

Thanks,

Ram

Responses(10)

dynamox

2 Intern

•

20.4K Posts

0

September 16th, 2014 07:00

Is there more than one file system from that VMAX on this host? If yes, did the others stay up?

junior5

24 Posts

0

September 16th, 2014 07:00

That's a good question. Yes, the host also mounts "/bckp", which the data server uses to dump the databases. Unfortunately, going through the log files, I believe nothing was happening on it around that time. In hindsight, we should have tried to touch a file there to confirm, the way we did on "/db".
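
Besides touching a file, the mount options the kernel reports in /proc/mounts show whether a file system has been flipped to read-only. Below is a minimal sketch of that check. The sample text is hypothetical: which pseudo device backs which mount point and the ext4 type are assumptions for illustration, not taken from this host.

```python
def read_only_mounts(mounts_text):
    """Return mount points whose option list contains 'ro'.

    Expects text in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        if "ro" in options:
            result.append(mount_point)
    return result


# Hypothetical sample; device names and fs type are assumptions.
SAMPLE = """\
/dev/emcpowera1 /db ext4 ro,relatime 0 0
/dev/emcpowerb1 /bckp ext4 rw,relatime 0 0
"""

print(read_only_mounts(SAMPLE))
```

On a live Linux host the same function can be fed the contents of /proc/mounts instead of the sample string.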

I opened a case with Red Hat, and this is the response I got:

As per the primary analysis of the error messages, it seems that there is some issue with the storage array. When a SCSI layer timeout occurs, the SCSI layer must abort the command so that it cannot complete after we've given up waiting.

Can you please check with your storage team whether there are any hardware issues or error logs available?

You may refer to the following article for details.

lpfc "SCSI layer issued Device Reset" messages in RHEL

https://access.redhat.com/solutions/39590
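
For context on the "blocked FC remote port time out" and "Devloss timeout" lines in the first post: the Linux FC transport prints them when a remote port has stayed blocked for longer than its dev_loss_tmo timer. The sketch below reads that timer and the port state for each remote port from sysfs. The attribute names are standard FC transport attributes under /sys/class/fc_remote_ports; the function name and the returned dictionary layout are my own choices.

```python
import os

ATTRS = ("port_name", "port_state", "dev_loss_tmo", "fast_io_fail_tmo")


def rport_timeouts(base="/sys/class/fc_remote_ports"):
    """Return {rport_name: {attribute: value or None}} for each FC remote port.

    Returns an empty dict if the directory does not exist
    (no FC HBA, or not a Linux host).
    """
    info = {}
    if not os.path.isdir(base):
        return info
    for name in sorted(os.listdir(base)):
        entry = {}
        for attr in ATTRS:
            path = os.path.join(base, name, attr)
            try:
                with open(path) as handle:
                    entry[attr] = handle.read().strip()
            except OSError:
                entry[attr] = None  # attribute missing or unreadable
        info[name] = entry
    return info


if __name__ == "__main__":
    for rport, values in rport_timeouts().items():
        print(rport, values)
```

Comparing port_name here with the WWPNs in the lpfc messages shows which array ports the timer expired for.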

dynamox

2 Intern

•

20.4K Posts

0

September 18th, 2014 05:00

Are you using PowerPath or DM-MPIO? Did any other hosts connected to that VMAX experience any issues around that time?

junior5

24 Posts

0

September 18th, 2014 05:00

We were using PowerPath (EMCpower.LINUX-5.7.4.00.00-003.el6.x86_64). No, only this host was affected, which is why it's bizarre. We did not see any other messages. Also, we connect this host to 4 director ports on the VMAX 20K. Other machines that use the same director ports did not see this issue.

dynamox

2 Intern

•

20.4K Posts

0

September 18th, 2014 06:00

If you look at "powermt display dev=all", are all devices multipathed, with the symopt policy?

junior5

24 Posts

0

September 18th, 2014 06:00

Can you explain the difference between the two? In this case I believe it's symopt.

/sbin/powermt display dev=all

Pseudo name=emcpowera
Symmetrix ID=000192605947
Logical device ID=0733
state=alive; policy=SymmOpt; queued-IOs=0
==============================================================================
--------------- Host ---------------   - Stor -   -- I/O Path --   -- Stats ---
###  HW Path               I/O Paths    Interf.   Mode     State   Q-IOs Errors
==============================================================================
   2 lpfc                   sdi         FA  8gA   active   alive       0      0
   2 lpfc                   sdg         FA 10gA   active   alive       0      0
   1 lpfc                   sdc         FA  9gA   active   alive       0      0
   1 lpfc                   sde         FA  7gA   active   alive       0      0

Pseudo name=emcpowerb
Symmetrix ID=000192605947
Logical device ID=072B
state=alive; policy=SymmOpt; queued-IOs=1
==============================================================================
--------------- Host ---------------   - Stor -   -- I/O Path --   -- Stats ---
###  HW Path               I/O Paths    Interf.   Mode     State   Q-IOs Errors
==============================================================================
   2 lpfc                   sdh         FA  8gA   active   alive       0      0
   2 lpfc                   sdf         FA 10gA   active   alive       1      0
   1 lpfc                   sdd         FA  7gA   active   alive       0      0
   1 lpfc                   sdb         FA  9gA   active   alive       0      0
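
To answer the "are all devices multipathed" question without reading the table by eye, the path rows can be parsed and summarized per pseudo device. This is a rough sketch that assumes the row layout shown in the output above (HBA number, driver, sd device, "FA" plus port, mode, state, two counters); the regular expression and function names are mine, and other PowerPath versions may print different columns.

```python
import re

# Path rows and pseudo names copied from the powermt output above.
SAMPLE = """\
Pseudo name=emcpowera
state=alive; policy=SymmOpt; queued-IOs=0
2 lpfc sdi FA 8gA active alive 0 0
2 lpfc sdg FA 10gA active alive 0 0
1 lpfc sdc FA 9gA active alive 0 0
1 lpfc sde FA 7gA active alive 0 0
Pseudo name=emcpowerb
state=alive; policy=SymmOpt; queued-IOs=1
2 lpfc sdh FA 8gA active alive 0 0
2 lpfc sdf FA 10gA active alive 1 0
1 lpfc sdd FA 7gA active alive 0 0
1 lpfc sdb FA 9gA active alive 0 0
"""

# Assumed row layout: HBA, driver, sd device, "FA" + port, mode, state, counters.
PATH_ROW = re.compile(
    r"^\s*(?P<hba>\d+)\s+\S+\s+(?P<dev>sd\w+)\s+FA\s+(?P<fa>\S+)\s+"
    r"(?P<mode>\S+)\s+(?P<state>\S+)\s+\d+\s+\d+\s*$"
)


def parse_paths(text):
    """Map each pseudo device name to a list of its path rows (as dicts)."""
    devices = {}
    current = None
    for line in text.splitlines():
        if line.startswith("Pseudo name="):
            current = line.split("=", 1)[1].strip()
            devices[current] = []
        elif current is not None:
            match = PATH_ROW.match(line)
            if match:
                devices[current].append(match.groupdict())
    return devices


def summarize(devices):
    """Per device: HBAs used, FA ports used, and sd devices not 'alive'."""
    summary = {}
    for name, paths in devices.items():
        summary[name] = {
            "hbas": sorted({p["hba"] for p in paths}),
            "fas": sorted({p["fa"] for p in paths}),
            "dead": [p["dev"] for p in paths if p["state"] != "alive"],
        }
    return summary


for device, info in summarize(parse_paths(SAMPLE)).items():
    print(device, info)
```

For the output posted here, both pseudo devices have paths through HBA 1 and HBA 2 to four different FA ports, with no dead paths at the time the command was run.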

dynamox

2 Intern

•

20.4K Posts

0

September 18th, 2014 06:00

It was a typo on my part; I should have said "SymmOpt". It looks good right now. Going back to the logs, specifically this string:

Error:Mpx:Bus 2 to Symmetrix 000192605947 port 8gA is dead

Did you see it listed for other FAs as well?

junior5

24 Posts

0

October 23rd, 2014 06:00

Dynamox,

EMC did an RCA, and they believe the cause is outside the EMC infrastructure. I believe they did a good job looking into all the switch logs, SAN, and system logs, and they also worked with Red Hat. Nothing conclusive. In the meantime, I moved the SAN devices to a different machine, just to rule out the possibility that it was a machine issue.

Hopefully it does not happen again.

Thanks,

Ram

Anonymous

5 Practitioner

•

274.2K Posts

0

October 26th, 2014 09:00

We had a similar issue in our environment where the file system went into read-only mode; on further investigation, Red Hat confirmed it was a multipath setup issue.

junior5

24 Posts

0

October 26th, 2014 10:00

SriNer,

Can you elaborate on that? What do you mean by a multipath issue?

--Ram
