halamca
2 Iron

Question about hdisk link flapping

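My host is an IBM P550 and the storage is an EMC CX4-120, connected over four links through two switches. Recently, within a 20-minute window on one day, link errors were reported. On the host side, errpt shows: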

B6267342   0817091212 P H hdisk65        DISK OPERATION ERROR

B6267342   0817091112 P H hdisk65        DISK OPERATION ERROR

B6267342   0817091112 P H hdisk65        DISK OPERATION ERROR

B6267342   0817091112 P H hdisk65        DISK OPERATION ERROR

B6267342   0817091112 P H hdisk65        DISK OPERATION ERROR

B6267342   0817091112 P H hdisk65        DISK OPERATION ERROR

B6267342   0817091012 P H hdisk65        DISK OPERATION ERROR

B6267342   0817090912 P H hdisk12        DISK OPERATION ERROR

B6267342   0817090912 P H hdisk12        DISK OPERATION ERROR

B6267342   0817090912 P H hdisk14        DISK OPERATION ERROR

B6267342   0817090812 P H hdisk14        DISK OPERATION ERROR

B6267342   0817090812 P H hdisk62        DISK OPERATION ERROR

B6267342   0817090812 P H hdisk62        DISK OPERATION ERROR

B6267342   0817090812 P H hdisk65        DISK OPERATION ERROR

B6267342   0817090712 P H hdisk65        DISK OPERATION ERROR

B6267342   0817090712 P H hdisk65        DISK OPERATION ERROR

B6267342   0817090712 P H hdisk65        DISK OPERATION ERROR

B6267342   0817090712 P H hdisk65        DISK OPERATION ERROR

B6267342   0817090712 P H hdisk65        DISK OPERATION ERROR

B6267342   0817090712 P H hdisk62        DISK OPERATION ERROR

B6267342   0817090612 P H hdisk62        DISK OPERATION ERROR

B6267342   0817090612 P H hdisk62        DISK OPERATION ERROR

B6267342   0817090612 P H hdisk62        DISK OPERATION ERROR

B6267342   0817090612 P H hdisk62        DISK OPERATION ERROR

B6267342   0817090612 P H hdisk62        DISK OPERATION ERROR

C6E26F3B   0817090612 I H hdisk60        BACK-UP PATH STATUS CHANGE

B6267342   0817090612 P H hdisk60        DISK OPERATION ERROR

C6E26F3B   0817090612 I H hdisk58        BACK-UP PATH STATUS CHANGE

C6E26F3B   0817090612 I H hdisk51        BACK-UP PATH STATUS CHANGE

B6267342   0817090612 P H hdisk60        DISK OPERATION ERROR

C6E26F3B   0817090612 I H hdisk65        BACK-UP PATH STATUS CHANGE

C6E26F3B   0817090612 I H hdisk54        BACK-UP PATH STATUS CHANGE

C6E26F3B   0817090612 I H hdisk50        BACK-UP PATH STATUS CHANGE

C6E26F3B   0817090612 I H hdisk57        BACK-UP PATH STATUS CHANGE

C6E26F3B   0817090612 I H hdisk61        BACK-UP PATH STATUS CHANGE

C6E26F3B   0817090612 I H hdisk52        BACK-UP PATH STATUS CHANGE

C6E26F3B   0817090612 I H hdisk64        BACK-UP PATH STATUS CHANGE

B6267342   0817090612 P H hdisk60        DISK OPERATION ERROR

B6267342   0817090512 P H hdisk60        DISK OPERATION ERROR

C6E26F3B   0817090512 I H hdisk59        BACK-UP PATH STATUS CHANGE

C6E26F3B   0817090512 I H hdisk62        BACK-UP PATH STATUS CHANGE

C6E26F3B   0817090512 I H hdisk63        BACK-UP PATH STATUS CHANGE

C6E26F3B   0817090512 I H hdisk55        BACK-UP PATH STATUS CHANGE

C6E26F3B   0817090512 I H hdisk56        BACK-UP PATH STATUS CHANGE

B6267342   0817090512 P H hdisk56        DISK OPERATION ERROR

B6267342   0817090512 P H hdisk60        DISK OPERATION ERROR

B6267342   0817090512 P H hdisk60        DISK OPERATION ERROR

B6267342   0817090512 P H hdisk56        DISK OPERATION ERROR

C6E26F3B   0817090512 I H hdisk53        BACK-UP PATH STATUS CHANGE

516A2BC4   0817090512 P H hdisk65        CONNECTION FAILURE

516A2BC4   0817090512 P H hdisk54        CONNECTION FAILURE

516A2BC4   0817090512 P H hdisk50        CONNECTION FAILURE

516A2BC4   0817090512 P H hdisk57        CONNECTION FAILURE

516A2BC4   0817090512 P H hdisk61        CONNECTION FAILURE

516A2BC4   0817090512 P H hdisk52        CONNECTION FAILURE

516A2BC4   0817090512 P H hdisk64        CONNECTION FAILURE

516A2BC4   0817090512 P H hdisk60        CONNECTION FAILURE

516A2BC4   0817090512 P H hdisk59        CONNECTION FAILURE

516A2BC4   0817090512 P H hdisk62        CONNECTION FAILURE

516A2BC4   0817090512 P H hdisk63        CONNECTION FAILURE

516A2BC4   0817090512 P H hdisk55        CONNECTION FAILURE

516A2BC4   0817090512 P H hdisk56        CONNECTION FAILURE

516A2BC4   0817090512 P H hdisk53        CONNECTION FAILURE

516A2BC4   0817090512 P H hdisk51        CONNECTION FAILURE

3767AAFF   0817090512 I H hdisk58        BACK-UP PATH INOPERATIVE

516A2BC4   0817090512 P H hdisk58        CONNECTION FAILURE

B6267342   0817090512 P H hdisk58        DISK OPERATION ERROR

B6267342   0817090512 P H hdisk58        DISK OPERATION ERROR

B6267342   0817090512 P H hdisk58        DISK OPERATION ERROR

B6267342   0817090512 P H hdisk58        DISK OPERATION ERROR

B6267342   0817090512 P H hdisk58        DISK OPERATION ERROR

B6267342   0817090512 P H hdisk58        DISK OPERATION ERROR

B6267342   0817090412 P H hdisk8         DISK OPERATION ERROR

B6267342   0817090412 P H hdisk56        DISK OPERATION ERROR

B6267342   0817090412 P H hdisk8         DISK OPERATION ERROR

B6267342   0817090412 P H hdisk56        DISK OPERATION ERROR

B6267342   0817090412 P H hdisk56        DISK OPERATION ERROR

B6267342   0817090412 P H hdisk56        DISK OPERATION ERROR

B6267342   0817090412 P H hdisk56        DISK OPERATION ERROR

B6267342   0817090412 P H hdisk56        DISK OPERATION ERROR

C6E26F3B   0817090412 I H hdisk54        BACK-UP PATH STATUS CHANGE

C6E26F3B   0817090412 I H hdisk50        BACK-UP PATH STATUS CHANGE

B6267342   0817090412 P H hdisk10        DISK OPERATION ERROR

B6267342   0817090412 P H hdisk10        DISK OPERATION ERROR

B6267342   0817090412 P H hdisk54        DISK OPERATION ERROR

C6E26F3B   0817090412 I H hdisk57        BACK-UP PATH STATUS CHANGE

C6E26F3B   0817090412 I H hdisk61        BACK-UP PATH STATUS CHANGE

B6267342   0817090412 P H hdisk54        DISK OPERATION ERROR

B6267342   0817090412 P H hdisk54        DISK OPERATION ERROR

B6267342   0817090312 P H hdisk54        DISK OPERATION ERROR

C6E26F3B   0817090312 I H hdisk52        BACK-UP PATH STATUS CHANGE

C6E26F3B   0817090312 I H hdisk64        BACK-UP PATH STATUS CHANGE

C6E26F3B   0817090312 I H hdisk60        BACK-UP PATH STATUS CHANGE

C6E26F3B   0817090312 I H hdisk59        BACK-UP PATH STATUS CHANGE

C6E26F3B   0817090312 I H hdisk62        BACK-UP PATH STATUS CHANGE

C6E26F3B   0817090312 I H hdisk63        BACK-UP PATH STATUS CHANGE

C6E26F3B   0817090312 I H hdisk55        BACK-UP PATH STATUS CHANGE

C6E26F3B   0817090312 I H hdisk56        BACK-UP PATH STATUS CHANGE

C6E26F3B   0817090312 I H hdisk53        BACK-UP PATH STATUS CHANGE

C6E26F3B   0817090312 I H hdisk58        BACK-UP PATH STATUS CHANGE

B6267342   0817090312 P H hdisk17        DISK OPERATION ERROR

B6267342   0817090312 P H hdisk54        DISK OPERATION ERROR

B6267342   0817090312 P H hdisk54        DISK OPERATION ERROR

B6267342   0817090312 P H hdisk17        DISK OPERATION ERROR

C6E26F3B   0817090312 I H hdisk51        BACK-UP PATH STATUS CHANGE

C6E26F3B   0817090312 I H hdisk65        BACK-UP PATH STATUS CHANGE

516A2BC4   0817090312 P H hdisk54        CONNECTION FAILURE

516A2BC4   0817090312 P H hdisk52        CONNECTION FAILURE

516A2BC4   0817090312 P H hdisk64        CONNECTION FAILURE

516A2BC4   0817090312 P H hdisk60        CONNECTION FAILURE

516A2BC4   0817090312 P H hdisk59        CONNECTION FAILURE

516A2BC4   0817090312 P H hdisk62        CONNECTION FAILURE

516A2BC4   0817090312 P H hdisk63        CONNECTION FAILURE

516A2BC4   0817090312 P H hdisk55        CONNECTION FAILURE

516A2BC4   0817090312 P H hdisk56        CONNECTION FAILURE

516A2BC4   0817090312 P H hdisk53        CONNECTION FAILURE

516A2BC4   0817090312 P H hdisk58        CONNECTION FAILURE

516A2BC4   0817090312 P H hdisk51        CONNECTION FAILURE

516A2BC4   0817090312 P H hdisk65        CONNECTION FAILURE

516A2BC4   0817090312 P H hdisk57        CONNECTION FAILURE

516A2BC4   0817090312 P H hdisk61        CONNECTION FAILURE

3767AAFF   0817090312 I H hdisk50        BACK-UP PATH INOPERATIVE

516A2BC4   0817090312 P H hdisk50        CONNECTION FAILURE

B6267342   0817090312 P H hdisk50        DISK OPERATION ERROR

B6267342   0817090312 P H hdisk50        DISK OPERATION ERROR

B6267342   0817090312 P H hdisk50        DISK OPERATION ERROR

B6267342   0817090312 P H hdisk50        DISK OPERATION ERROR

B6267342   0817090212 P H hdisk50        DISK OPERATION ERROR

B6267342   0817090212 P H hdisk50        DISK OPERATION ERROR

B6267342   0817090212 P H hdisk17        DISK OPERATION ERROR

B6267342   0817090212 P H hdisk17        DISK OPERATION ERROR

B6267342   0817090212 P H hdisk17        DISK OPERATION ERROR

B6267342   0817090212 P H hdisk17        DISK OPERATION ERROR

B6267342   0817090212 P H hdisk17        DISK OPERATION ERROR

B6267342   0817090212 P H hdisk17        DISK OPERATION ERROR

B6267342   0817090112 P H hdisk2         DISK OPERATION ERROR

B6267342   0817090112 P H hdisk14        DISK OPERATION ERROR

B6267342   0817090112 P H hdisk2         DISK OPERATION ERROR

B6267342   0817090112 P H hdisk14        DISK OPERATION ERROR

B6267342   0817090112 P H hdisk14        DISK OPERATION ERROR

B6267342   0817090112 P H hdisk14        DISK OPERATION ERROR

B6267342   0817090112 P H hdisk6         DISK OPERATION ERROR

B6267342   0817090112 P H hdisk14        DISK OPERATION ERROR

B6267342   0817090112 P H hdisk14        DISK OPERATION ERROR

B6267342   0817090112 P H hdisk6         DISK OPERATION ERROR

B6267342   0817090112 P H hdisk12        DISK OPERATION ERROR

B6267342   0817090012 P H hdisk12        DISK OPERATION ERROR

B6267342   0817090012 P H hdisk12        DISK OPERATION ERROR

B6267342   0817090012 P H hdisk12        DISK OPERATION ERROR

B6267342   0817090012 P H hdisk12        DISK OPERATION ERROR

B6267342   0817090012 P H hdisk12        DISK OPERATION ERROR

B6267342   0817090012 P H hdisk10        DISK OPERATION ERROR

B6267342   0817090012 P H hdisk10        DISK OPERATION ERROR

B6267342   0817090012 P H hdisk10        DISK OPERATION ERROR

B6267342   0817085912 P H hdisk10        DISK OPERATION ERROR

B6267342   0817085912 P H hdisk10        DISK OPERATION ERROR

B6267342   0817085912 P H hdisk10        DISK OPERATION ERROR

B6267342   0817085912 P H hdisk8         DISK OPERATION ERROR

B6267342   0817085912 P H hdisk8         DISK OPERATION ERROR

B6267342   0817085912 P H hdisk8         DISK OPERATION ERROR

B6267342   0817085912 P H hdisk8         DISK OPERATION ERROR

B6267342   0817085912 P H hdisk8         DISK OPERATION ERROR

B6267342   0817085912 P H hdisk8         DISK OPERATION ERROR

B6267342   0817085912 P H hdisk6         DISK OPERATION ERROR

B6267342   0817085812 P H hdisk6         DISK OPERATION ERROR

B6267342   0817085812 P H hdisk6         DISK OPERATION ERROR

B6267342   0817085812 P H hdisk6         DISK OPERATION ERROR

B6267342   0817085812 P H hdisk6         DISK OPERATION ERROR

B6267342   0817085812 P H hdisk6         DISK OPERATION ERROR

B6267342   0817085812 P H hdisk2         DISK OPERATION ERROR

B6267342   0817085812 P H hdisk2         DISK OPERATION ERROR

B6267342   0817085812 P H hdisk2         DISK OPERATION ERROR

B6267342   0817085712 P H hdisk2         DISK OPERATION ERROR

B6267342   0817085712 P H hdisk2         DISK OPERATION ERROR

B6267342   0817085712 P H hdisk2         DISK OPERATION ERROR

Detailed error information (one example of each type):

LABEL:          SC_DISK_ERR2

IDENTIFIER:     B6267342

Date/Time:       Fri Aug 17 09:07:42 BEIST 2012

Sequence Number: 318239

Machine Id:      00C8CD024C00

Node Id:         aaaabz2

Class:           H

Type:            PERM

WPAR:            Global

Resource Name:   hdisk65        

Resource Class:  disk

Resource Type:   CLAR_FC_raid5

Location:        U78A0.001.DNWGRTY-P1-C3-T1-W500601683CE02F63-LF000000000000

VPD:            

        Manufacturer................DGC    

        Machine Type and Model......RAID 5         

        ROS Level and ID............0428

        Serial Number...............CKM000xxxxxxxx

        Subsystem Vendor/Device ID..CX4-120       

        Device Specific.(PQ)........00

        Device Specific.(VS)........0F0000353BCL

        Device Specific.(UI)........6006016024742400F23B5D791E2FDE11

        FRU Label...................000F

        Device Specific.(Z0)........10

        Device Specific.(Z1)........10

Description

DISK OPERATION ERROR

Probable Causes

DASD DEVICE

Failure Causes

DISK DRIVE

DISK DRIVE ELECTRONICS

        Recommended Actions

        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data

PATH ID

           0

SENSE DATA

0600 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0102 0000 7000 0200

0000 000A 0000 0000 0403 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0008 026D 000B 8EC0 0000 0001 0000 0000 0000 0000 0000 0000 0000

0000 0012 001A

Second error entry:

LABEL:          EMCP_PATH_ALIVE

IDENTIFIER:     C6E26F3B

Date/Time:       Fri Aug 17 09:03:43 BEIST 2012

Sequence Number: 318160

Machine Id:      00C8CD024C00

Node Id:         aaaabz2

Class:           H

Type:            INFO

WPAR:            Global

Resource Name:   hdisk59        

Resource Class:  disk

Resource Type:   CLAR_FC_raid5

Location:        U78A0.001.DNWGRTY-P1-C3-T1-W500601683CE02F63-L9000000000000

VPD:            

        Manufacturer................DGC    

        Machine Type and Model......RAID 5         

        ROS Level and ID............0428

        Serial Number...............CKM000xxxxxxxx

        Subsystem Vendor/Device ID..CX4-120       

        Device Specific.(PQ)........00

        Device Specific.(VS)........0D0000276BCL

        Device Specific.(UI)........6006016024742400B87BD5921D2FDE11

        FRU Label...................000D

        Device Specific.(Z0)........10

        Device Specific.(Z1)........10

Description

BACK-UP PATH STATUS CHANGE

Probable Causes

DISK

SCSI ADAPTER

SCSI CABLE

Failure Causes

DISK

SCSI ADAPTER

CABLE LOOSE OR DEFECTIVE

        Recommended Actions

        PERFORM PROBLEM DETERMINATION ON SCSI TARGET DEVICE

        PERFORM PROBLEM DETERMINATION ON HOST SCSI ADAPTER

                REPLACE SCSI CABLE

Detail Data

RESOURCE NAME

Third error entry:

LABEL:          EMCP_PATH_DEAD

IDENTIFIER:     516A2BC4

Date/Time:       Fri Aug 17 09:05:38 BEIST 2012

Sequence Number: 318190

Machine Id:      00C8CD024C00

Node Id:         aaaabz2

Class:           H

Type:            PERM

WPAR:            Global

Resource Name:   hdisk51        

Resource Class:  disk

Resource Type:   CLAR_FC_raid5

Location:        U78A0.001.DNWGRTY-P1-C3-T1-W500601683CE02F63-L1000000000000

VPD:            

        Manufacturer................DGC    

        Machine Type and Model......RAID 5         

        ROS Level and ID............0428

        Serial Number...............CKM000xxxxxxxx

        Subsystem Vendor/Device ID..CX4-120       

        Device Specific.(PQ)........00

        Device Specific.(VS)........050000B7B7CL

        Device Specific.(UI)........60060160FD532400E92B2AAA4A1FDE11

        FRU Label...................0005

        Device Specific.(Z0)........10

        Device Specific.(Z1)........10

Description

CONNECTION FAILURE

Probable Causes

DISK

SCSI ADAPTER

SCSI CABLE

Failure Causes

DISK

SCSI ADAPTER

CABLE LOOSE OR DEFECTIVE

        Recommended Actions

        PERFORM PROBLEM DETERMINATION ON SCSI TARGET DEVICE

        PERFORM PROBLEM DETERMINATION ON HOST SCSI ADAPTER

                REPLACE SCSI CABLE

Detail Data

RESOURCE NAME

Fourth error entry:

LABEL:          SCAN_ERROR_CHRP

IDENTIFIER:     BFE4C025

Date/Time:       Wed Jul 11 09:13:15 BEIST 2012

Sequence Number: 318075

Machine Id:      00C8CD024C00

Node Id:         aaaabz2

Class:           H

Type:            PERM

WPAR:            Global

Resource Name:   sysplanar0     

Resource Class:  planar

Resource Type:   sysplanar_rspc

Location:       

Description

UNDETERMINED ERROR

Failure Causes

UNDETERMINED

        Recommended Actions

        RUN SYSTEM DIAGNOSTICS.

Detail Data

PROBLEM DATA

However, the current paths all look normal:

Pseudo name=hdiskpower12

CLARiiON ID=CKM000xxxxxxxx [aaaabz]

Logical device ID=6006016024742400665DBD9D1D2FDE11 [LUN 14]

state=alive; policy=CLAROpt; priority=0; queued-IOs=0

Owner: default=SP B, current=SP B       Array failover mode: 1

==============================================================================

---------------- Host ---------------   - Stor -   -- I/O Path -  -- Stats ---

###  HW Path                I/O Paths    Interf.   Mode    State  Q-IOs Errors

==============================================================================

   0 fscsi0                    hdisk16   SP B1     active  alive      0      0

   0 fscsi0                    hdisk32   SP A1     active  alive      0      1

   1 fscsi1                    hdisk48   SP A0     active  alive      0      1

   1 fscsi1                    hdisk64   SP B0     active  alive      0      2

Pseudo name=hdiskpower14

CLARiiON ID=CKM000xxxxxxxx [aaaabz]

Logical device ID=6006016024742400B87BD5921D2FDE11 [LUN 13]

state=alive; policy=CLAROpt; priority=0; queued-IOs=0

Owner: default=SP A, current=SP A       Array failover mode: 1

==============================================================================

---------------- Host ---------------   - Stor -   -- I/O Path -  -- Stats ---

###  HW Path                I/O Paths    Interf.   Mode    State  Q-IOs Errors

==============================================================================

   0 fscsi0                    hdisk11   SP B1     active  alive      0      0

   0 fscsi0                    hdisk27   SP A1     active  alive      0      1

   1 fscsi1                    hdisk43   SP A0     active  alive      0      1

   1 fscsi1                    hdisk59   SP B0     active  alive      0      2

Pseudo name=hdiskpower15

CLARiiON ID=CKM000xxxxxxxx [aaaabz]

Logical device ID=6006016024742400F23B5D791E2FDE11 [LUN 15]

state=alive; policy=CLAROpt; priority=0; queued-IOs=0

Owner: default=SP A, current=SP A       Array failover mode: 1

==============================================================================

---------------- Host ---------------   - Stor -   -- I/O Path -  -- Stats ---

###  HW Path                I/O Paths    Interf.   Mode    State  Q-IOs Errors

==============================================================================

   0 fscsi0                    hdisk17   SP B1     active  alive      0      0

   0 fscsi0                    hdisk33   SP A1     active  alive      0      1

   1 fscsi1                    hdisk49   SP A0     active  alive      0      1

   1 fscsi1                    hdisk65   SP B0     active  alive      0      2

Pseudo name=hdiskpower9

CLARiiON ID=CKM000xxxxxxxx [aaaabz]

Logical device ID=60060160FD5324003862E1854A1FDE11 [LUN 1]

state=alive; policy=CLAROpt; priority=0; queued-IOs=0

Owner: default=SP B, current=SP B       Array failover mode: 1

==============================================================================

---------------- Host ---------------   - Stor -   -- I/O Path -  -- Stats ---

###  HW Path                I/O Paths    Interf.   Mode    State  Q-IOs Errors

==============================================================================

   0 fscsi0                    hdisk13   SP B1     active  alive      0      0

   0 fscsi0                    hdisk29   SP A1     active  alive      0      1

   1 fscsi1                    hdisk45   SP A0     active  alive      0      1

   1 fscsi1                    hdisk61   SP B0     active  alive      0      2

Pseudo name=hdiskpower10

CLARiiON ID=CKM000xxxxxxxx [aaaabz]

Logical device ID=60060160FD5324003962E1854A1FDE11 [LUN 2]

state=alive; policy=CLAROpt; priority=0; queued-IOs=0

Owner: default=SP A, current=SP A       Array failover mode: 1

==============================================================================

---------------- Host ---------------   - Stor -   -- I/O Path -  -- Stats ---

###  HW Path                I/O Paths    Interf.   Mode    State  Q-IOs Errors

==============================================================================

   0 fscsi0                    hdisk14   SP B1     active  alive      0      0

   0 fscsi0                    hdisk30   SP A1     active  alive      0      1

   1 fscsi1                    hdisk46   SP A0     active  alive      0      1

   1 fscsi1                    hdisk62   SP B0     active  alive      0      2

Pseudo name=hdiskpower8

CLARiiON ID=CKM000xxxxxxxx [aaaabz]

Logical device ID=60060160FD5324003CB443784A1FDE11 [LUN 0]

state=alive; policy=CLAROpt; priority=0; queued-IOs=0

Owner: default=SP A, current=SP A       Array failover mode: 1

==============================================================================

---------------- Host ---------------   - Stor -   -- I/O Path -  -- Stats ---

###  HW Path                I/O Paths    Interf.   Mode    State  Q-IOs Errors

==============================================================================

   0 fscsi0                    hdisk12   SP B1     active  alive      0      0

   0 fscsi0                    hdisk28   SP A1     active  alive      0      1

   1 fscsi1                    hdisk44   SP A0     active  alive      0      1

   1 fscsi1                    hdisk60   SP B0     active  alive      0      2

Pseudo name=hdiskpower11

CLARiiON ID=CKM000xxxxxxxx [aaaabz]

Logical device ID=60060160FD5324007E06FB8B4A1FDE11 [LUN 3]

state=alive; policy=CLAROpt; priority=0; queued-IOs=0

Owner: default=SP B, current=SP B       Array failover mode: 1

==============================================================================

---------------- Host ---------------   - Stor -   -- I/O Path -  -- Stats ---

###  HW Path                I/O Paths    Interf.   Mode    State  Q-IOs Errors

==============================================================================

   0 fscsi0                    hdisk15   SP B1     active  alive      0      0

   0 fscsi0                    hdisk31   SP A1     active  alive      0      1

   1 fscsi1                    hdisk47   SP A0     active  alive      0      1

   1 fscsi1                    hdisk63   SP B0     active  alive      0      2

Pseudo name=hdiskpower13

CLARiiON ID=CKM000xxxxxxxx [aaaabz]

Logical device ID=60060160FD532400E82B2AAA4A1FDE11 [LUN 4]

state=alive; policy=CLAROpt; priority=0; queued-IOs=0

Owner: default=SP A, current=SP A       Array failover mode: 1

==============================================================================

---------------- Host ---------------   - Stor -   -- I/O Path -  -- Stats ---

###  HW Path                I/O Paths    Interf.   Mode    State  Q-IOs Errors

==============================================================================

   0 fscsi0                    hdisk18   SP A1     active  alive      0      1

   0 fscsi0                    hdisk2    SP B1     active  alive      0      0

   1 fscsi1                    hdisk34   SP A0     active  alive      0      1

   1 fscsi1                    hdisk50   SP B0     active  alive      0      2

Pseudo name=hdiskpower0

CLARiiON ID=CKM000xxxxxxxx [aaaabz]

Logical device ID=60060160FD532400E92B2AAA4A1FDE11 [LUN 5]

state=alive; policy=CLAROpt; priority=0; queued-IOs=0

Owner: default=SP B, current=SP B       Array failover mode: 1

==============================================================================

---------------- Host ---------------   - Stor -   -- I/O Path -  -- Stats ---

###  HW Path                I/O Paths    Interf.   Mode    State  Q-IOs Errors

==============================================================================

   0 fscsi0                    hdisk19   SP A1     active  alive      0      1

   1 fscsi1                    hdisk35   SP A0     active  alive      0      1

   0 fscsi0                    hdisk3    SP B1     active  alive      0      0

   1 fscsi1                    hdisk51   SP B0     active  alive      0      2

Pseudo name=hdiskpower1

CLARiiON ID=CKM000xxxxxxxx [aaaabz]

Logical device ID=60060160FD532400EA2B2AAA4A1FDE11 [LUN 6]

state=alive; policy=CLAROpt; priority=0; queued-IOs=0

Owner: default=SP A, current=SP A       Array failover mode: 1

==============================================================================

---------------- Host ---------------   - Stor -   -- I/O Path -  -- Stats ---

###  HW Path                I/O Paths    Interf.   Mode    State  Q-IOs Errors

==============================================================================

   0 fscsi0                    hdisk20   SP A1     active  alive      0      1

   1 fscsi1                    hdisk36   SP A0     active  alive      0      1

   0 fscsi0                    hdisk4    SP B1     active  alive      0      0

   1 fscsi1                    hdisk52   SP B0     active  alive      0      2

Pseudo name=hdiskpower2

CLARiiON ID=CKM000xxxxxxxx [aaaabz]

Logical device ID=60060160FD532400EB2B2AAA4A1FDE11 [LUN 7]

state=alive; policy=CLAROpt; priority=0; queued-IOs=0

Owner: default=SP B, current=SP B       Array failover mode: 1

==============================================================================

---------------- Host ---------------   - Stor -   -- I/O Path -  -- Stats ---

###  HW Path                I/O Paths    Interf.   Mode    State  Q-IOs Errors

==============================================================================

   0 fscsi0                    hdisk21   SP A1     active  alive      0      1

   1 fscsi1                    hdisk37   SP A0     active  alive      0      1

   1 fscsi1                    hdisk53   SP B0     active  alive      0      2

   0 fscsi0                    hdisk5    SP B1     active  alive      0      0

Pseudo name=hdiskpower3

CLARiiON ID=CKM000xxxxxxxx [aaaabz]

Logical device ID=60060160FD532400EC2B2AAA4A1FDE11 [LUN 8]

state=alive; policy=CLAROpt; priority=0; queued-IOs=0

Owner: default=SP A, current=SP A       Array failover mode: 1

==============================================================================

---------------- Host ---------------   - Stor -   -- I/O Path -  -- Stats ---

###  HW Path                I/O Paths    Interf.   Mode    State  Q-IOs Errors

==============================================================================

   0 fscsi0                    hdisk22   SP A1     active  alive      0      1

   1 fscsi1                    hdisk38   SP A0     active  alive      0      1

   1 fscsi1                    hdisk54   SP B0     active  alive      0      2

   0 fscsi0                    hdisk6    SP B1     active  alive      0      0

Pseudo name=hdiskpower4

CLARiiON ID=CKM000xxxxxxxx [aaaabz]

Logical device ID=60060160FD532400ED2B2AAA4A1FDE11 [LUN 9]

state=alive; policy=CLAROpt; priority=0; queued-IOs=0

Owner: default=SP B, current=SP B       Array failover mode: 1

==============================================================================

---------------- Host ---------------   - Stor -   -- I/O Path -  -- Stats ---

###  HW Path                I/O Paths    Interf.   Mode    State  Q-IOs Errors

==============================================================================

   0 fscsi0                    hdisk23   SP A1     active  alive      0      1

   1 fscsi1                    hdisk39   SP A0     active  alive      0      1

   1 fscsi1                    hdisk55   SP B0     active  alive      0      2

   0 fscsi0                    hdisk7    SP B1     active  alive      0      0

Pseudo name=hdiskpower5

CLARiiON ID=CKM000xxxxxxxx [aaaabz]

Logical device ID=60060160FD532400EE2B2AAA4A1FDE11 [LUN 10]

state=alive; policy=CLAROpt; priority=0; queued-IOs=0

Owner: default=SP A, current=SP A       Array failover mode: 1

==============================================================================

---------------- Host ---------------   - Stor -   -- I/O Path -  -- Stats ---

###  HW Path                I/O Paths    Interf.   Mode    State  Q-IOs Errors

==============================================================================

   0 fscsi0                    hdisk24   SP A1     active  alive      0      1

   1 fscsi1                    hdisk40   SP A0     active  alive      0      1

   1 fscsi1                    hdisk56   SP B0     active  alive      0      2

   0 fscsi0                    hdisk8    SP B1     active  alive      0      0

Pseudo name=hdiskpower6

CLARiiON ID=CKM000xxxxxxxx [aaaabz]

Logical device ID=60060160FD532400EF2B2AAA4A1FDE11 [LUN 11]

state=alive; policy=CLAROpt; priority=0; queued-IOs=0

Owner: default=SP B, current=SP B       Array failover mode: 1

==============================================================================

---------------- Host ---------------   - Stor -   -- I/O Path -  -- Stats ---

###  HW Path                I/O Paths    Interf.   Mode    State  Q-IOs Errors

==============================================================================

   0 fscsi0                    hdisk25   SP A1     active  alive      0      1

   1 fscsi1                    hdisk41   SP A0     active  alive      0      1

   1 fscsi1                    hdisk57   SP B0     active  alive      0      2

   0 fscsi0                    hdisk9    SP B1     active  alive      0      0

Pseudo name=hdiskpower7

CLARiiON ID=CKM000xxxxxxxx [aaaabz]

Logical device ID=60060160FD532400F02B2AAA4A1FDE11 [LUN 12]

state=alive; policy=CLAROpt; priority=0; queued-IOs=0

Owner: default=SP A, current=SP A       Array failover mode: 1

==============================================================================

---------------- Host ---------------   - Stor -   -- I/O Path -  -- Stats ---

###  HW Path                I/O Paths    Interf.   Mode    State  Q-IOs Errors

==============================================================================

   0 fscsi0                    hdisk10   SP B1     active  alive      0      0

   0 fscsi0                    hdisk26   SP A1     active  alive      0      1

   1 fscsi1                    hdisk42   SP A0     active  alive      0      1

   1 fscsi1                    hdisk58   SP B0     active  alive      0      2

# powermt display

CLARiiON logical device count=16

==============================================================================

----- Host Bus Adapters ---------  ------ I/O Paths -----  ------ Stats ------

###  HW Path                       Summary   Total   Dead  IO/Sec Q-IOs Errors

==============================================================================

   0 fscsi0                        optimal      32      0       -     0     16

   1 fscsi1                        optimal      32      0       -     0     48

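Because the errors occurred only within those 20 minutes, all links currently look healthy, and the application was not affected, my initial suspicion is that the links simply dropped for a moment, and that the problem may be in a switch.

Has anyone else on the forum run into this kind of problem?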
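
One way to check the time window and the mix of error types is to parse the errpt summary lines. Below is a minimal Python sketch, assuming the summary timestamp is in MMDDhhmmYY form as it appears in the listing above. The function names are hypothetical, and the sample is four lines copied from that listing (the latest line, the earliest line, and one of each other type).

```python
from collections import Counter
from datetime import datetime


def parse_errpt_line(line):
    """Parse one errpt summary line; return None for headers or blank lines."""
    parts = line.split(None, 5)
    if len(parts) < 6:
        return None
    ident, stamp, etype, eclass, resource, desc = parts
    try:
        # Assumption: summary timestamps are MMDDhhmmYY.
        when = datetime.strptime(stamp, "%m%d%H%M%y")
    except ValueError:
        return None
    return {"id": ident, "when": when, "type": etype,
            "class": eclass, "resource": resource, "desc": desc.strip()}


def summarize(lines):
    """Return (earliest, latest, Counter of descriptions) for the given lines."""
    entries = [e for e in map(parse_errpt_line, lines) if e]
    if not entries:
        return None, None, Counter()
    times = [e["when"] for e in entries]
    return min(times), max(times), Counter(e["desc"] for e in entries)


sample = [
    "B6267342   0817091212 P H hdisk65        DISK OPERATION ERROR",
    "C6E26F3B   0817090612 I H hdisk60        BACK-UP PATH STATUS CHANGE",
    "516A2BC4   0817090512 P H hdisk65        CONNECTION FAILURE",
    "B6267342   0817085712 P H hdisk2         DISK OPERATION ERROR",
]

first, last, counts = summarize(sample)
print("first:", first, "last:", last, "span:", last - first)
for desc, n in counts.most_common():
    print(n, desc)
```

Feeding the full errpt output into `summarize` instead of the four sample lines gives the real window and per-type counts.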

Jun_Tan
3 Zinc

Re: Question about hdisk link flapping

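To the original poster:

How much is this problem affecting you at the moment? If the impact is serious, investigate it carefully; if it is not, just record it for future reference.

Brief link drops are fairly common and can have many causes; the switch is not necessarily at fault. Also, since the problem has not recurred, a hardware fault is not very likely. I suggest widening the investigation, for example to the machine-room environment at that point in time. If you are still unsure, keep monitoring the host for a while.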

liulei_it
5 Tungsten

Re: Question about hdisk link flapping

Switch the storage to the Symmetrix series.

I only just learned that the CX series controllers run in A/P (active/passive) mode.

halamca
2 Iron

Re: Question about hdisk link flapping

That would be using a sledgehammer to crack a nut!

chrsi_wang
3 Silver

Re: Question about hdisk link flapping

Check whether the SP event log on the CX shows any errors.

Alex_Ye
3 Argentum

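Re: Question about hdisk link flapping

The logs the original poster pasted are very detailed.

1. SCAN_ERROR_CHRP: this error may indicate a problem with the P server's sysplanar. I suggest asking the hardware vendor to help run a full hardware diagnostic.

2. The following two errors (note the error labels):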

516A2BC4   0817090512 P H hdisk65    CONNECTION FAILURE
C6E26F3B   0817090612 I H hdisk64    BACK-UP PATH STATUS CHANGE

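are both entries logged by the PowerPath software. They indicate, respectively, that a dead path appeared and that a path's state recovered. They do not necessarily mean that a real link error occurred; "Connection Failure" is somewhat misleading.

3. The sense data of the SC_DISK_ERR2 error means "Not Ready", because this is a passive path on the CLARiiON. You may see this when a trespass occurs. But seeing this error on AIX, combined with the output of powermt display dev=all, points to a possible configuration error: failover mode=1.

This setting is configured per host HBA (initiator) in the CLARiiON (and VNX) management interface, Unisphere/Navisphere, and defines the path failover mode. For an AIX host with PowerPath installed, this parameter should be set to 3; otherwise a trespass takes longer or may even fail, and the errors above appear in errpt.

Judging from the logs above, this could be a link error, or a sysplanar problem could have caused a fault on one path, which in turn triggered a trespass. The wrong failover mode setting then produced the "DISK OPERATION ERROR" entries. (The entries logged by PowerPath could not have been avoided, because a path really did go dead and then recover.)

The errpt information may already have been overwritten by later entries, so the original cause no longer seems to be visible.

EMC has a series of knowledge base articles on the failover mode values and the procedure for changing them; they are easy to find. After the change, the host must be rebooted for it to take effect.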
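
The "Not Ready" reading in point 3 can be checked by hand against the SENSE DATA dump. The sketch below decodes SCSI fixed-format sense data (response code 0x70/0x71: sense key in the low nibble of byte 2, ASC in byte 12, ASCQ in byte 13). Two things are assumptions here: the sense bytes are located by searching for the first 0x70 byte in the dump, which is only a heuristic that happens to work for this dump, and the lookup tables are deliberately partial. The function names are hypothetical.

```python
# Partial tables only; the full lists are in the SCSI SPC specification.
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
}
ASC_ASCQ = {
    (0x04, 0x03): "LOGICAL UNIT NOT READY, MANUAL INTERVENTION REQUIRED",
}


def decode_fixed_sense(sense):
    """Return (sense_key, asc, ascq) from fixed-format sense bytes."""
    if len(sense) < 14 or (sense[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    return sense[2] & 0x0F, sense[12], sense[13]


def describe(sense):
    key, asc, ascq = decode_fixed_sense(sense)
    key_name = SENSE_KEYS.get(key, "key 0x%X" % key)
    detail = ASC_ASCQ.get((asc, ascq), "ASC/ASCQ %02X/%02X" % (asc, ascq))
    return "%s: %s" % (key_name, detail)


# First two lines of the SENSE DATA block from the SC_DISK_ERR2 entry.
DETAIL = """
0600 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0102 0000 7000 0200
0000 000A 0000 0000 0403 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
"""

raw = bytes.fromhex("".join(DETAIL.split()))
# Heuristic (assumption): the sense data starts at the first 0x70 byte.
sense = raw[raw.index(0x70):]
print(describe(sense))
```

For this dump the sense key is 0x2 with ASC/ASCQ 04/03, which matches the "Not Ready" interpretation given in the reply.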

halamca
2 Iron

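Re: Question about hdisk link flapping

Because the equipment is not on site, I have not yet checked the storage. I just went through the error messages again carefully and noticed a few things:

1. All of that day's errors are on the same controller: they are all on SP B0 and SP B1.

2. Every hdisk behind port SP B0 reported errors.

3. For the hdisks on port SP B1 that reported errors, the default owner of the LUN is SP A in every case. But not every LUN whose default owner is SP A has errors; for example, there are none for hdisk11.

You mentioned the SC_DISK_ERR2 error. errpt still has the detailed earliest entries; the very first one is: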

LABEL:          SC_DISK_ERR2
IDENTIFIER:     B6267342

Date/Time:       Fri Aug 17 08:57:40 BEIST 2012
Sequence Number: 318079
Machine Id:      00C8CD024C00
Node Id:         aaaabz2
Class:           H
Type:            PERM
WPAR:            Global
Resource Name:   hdisk2         
Resource Class:  disk
Resource Type:   CLAR_FC_raid5
Location:        U78A0.001.DNWGRTY-P1-C1-T1-W500601693CE02F63-L0
VPD:            
        Manufacturer................DGC    
        Machine Type and Model......RAID 5         
        ROS Level and ID............0428
        Serial Number...............CKM000xxxxxxxx
        Subsystem Vendor/Device ID..CX4-120       
        Device Specific.(PQ)........00
        Device Specific.(VS)........040000B7B3CL
        Device Specific.(UI)........60060160FD532400E82B2AAA4A1FDE11
        FRU Label...................0004
        Device Specific.(Z0)........10
        Device Specific.(Z1)........10

Description
DISK OPERATION ERROR

Probable Causes
DASD DEVICE

Failure Causes
DISK DRIVE
DISK DRIVE ELECTRONICS

        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
PATH ID
           0
SENSE DATA
0600 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0102 0000 7000 0200
0000 000A 0000 0000 0403 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0007 B84B 000E 4740 0000 0001 0000 0000 0000 0000 0000 0000 0000
0000 0012 001A

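Apart from the timestamp, it is essentially the same as the one in my original post.

At this point I still cannot tell where the problem actually is.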

halamca
2 Iron

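Re: Question about hdisk link flapping

Because the equipment is not on site, I have not yet gone to inspect the storage in person.

1. I noticed that all of the errors are on the same controller, namely ports SP B1 and SP B0 on controller B.

2. Every hdisk on port SP B0 reported errors, whereas only some of the hdisks on port SP B1 did.

3. For the hdisks on port SP B1 that reported errors, the default controller of the LUN is SP A in every case. But not every hdisk on port SP B1 whose LUN defaults to SP A reported errors; hdisk11, for example, did not.

As for the SC_DISK_ERR2 error you mentioned, the earliest entry is almost identical to the one I posted, apart from the earlier timestamp.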

Alex_Ye
3 Argentum

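Re: Question about hdisk link flapping

Did you arrive at these observations by adding up all the different errors mentioned earlier? Or by looking directly at the output of powermt display dev=all?

The errors mentioned earlier differ in nature: some may indicate a link problem, while others are reported because a path needed to be activated but the setting was wrong. Overall, though, I think the link from the fscsi1 HBA to the switch is the most likely culprit. I suggest checking the physical link (including the switch port, SFP, HBA, and fiber), and changing the failover mode value as soon as possible.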
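
Checking the failover mode on every device by eye is tedious with 16 LUNs. The following Python sketch scans saved `powermt display dev=all` text for each pseudo device's failover mode and owner. It assumes the text layout shown in the original post; the function names are hypothetical, and the expected mode of 3 is simply the value recommended earlier in this thread, passed in as a parameter.

```python
import re

OWNER = re.compile(r"default=(SP \w+),\s*current=(SP \w+)")
MODE = re.compile(r"Array failover mode:\s*(\d+)")


def parse_powermt(text):
    """Return one dict per 'Pseudo name=' block in the powermt output."""
    devices, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Pseudo name="):
            current = {"name": line.split("=", 1)[1], "failover_mode": None,
                       "default_owner": None, "current_owner": None}
            devices.append(current)
        elif current is not None:
            m = OWNER.search(line)
            if m:
                current["default_owner"], current["current_owner"] = m.groups()
            m = MODE.search(line)
            if m:
                current["failover_mode"] = int(m.group(1))
    return devices


def check(devices, expected_mode):
    """List (device, finding) pairs for unexpected modes or trespassed LUNs."""
    findings = []
    for d in devices:
        if d["failover_mode"] != expected_mode:
            findings.append((d["name"], "failover mode %s" % d["failover_mode"]))
        if d["default_owner"] != d["current_owner"]:
            findings.append((d["name"], "trespassed to %s" % d["current_owner"]))
    return findings


SAMPLE = """\
Pseudo name=hdiskpower12
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP B, current=SP B       Array failover mode: 1
Pseudo name=hdiskpower14
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP A, current=SP A       Array failover mode: 1
"""

for name, finding in check(parse_powermt(SAMPLE), expected_mode=3):
    print(name, finding)
```

The same check also flags LUNs whose current owner differs from the default owner, which would show whether a trespass is still in effect.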

halamca
2 Iron

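Re: Question about hdisk link flapping

I arrived at this by counting where the DISK OPERATION ERROR entries occurred.

However, all link states are back to normal now, so I expect any inspection to conclude that all of the hardware is fine. I really do not know where to start checking.

Also, from the output above, the fcs1 HBA is connected to controller A as well, yet there are no errors there, so I would think that this adapter, at least, is fine.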
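
The tally described in this post, DISK OPERATION ERROR entries grouped by HBA and storage port, could be reproduced along the following lines. This is a Python sketch under assumptions: the path rows follow the `powermt display dev=all` layout pasted earlier in the thread, the function names are hypothetical, and the sample uses only the four paths of hdiskpower13 and three errpt lines from the listing.

```python
import re
from collections import Counter

# One path row, e.g. "   1 fscsi1    hdisk50   SP B0   active  alive   0   2"
PATH_ROW = re.compile(
    r"^\s*\d+\s+(fscsi\d+)\s+(hdisk\d+)\s+(SP [AB]\d)\s+\w+\s+\w+\s+\d+\s+\d+\s*$"
)


def hdisk_to_port(powermt_text):
    """Map each native hdisk to its (HBA, SP port) pair."""
    mapping = {}
    for line in powermt_text.splitlines():
        m = PATH_ROW.match(line)
        if m:
            hba, hdisk, port = m.groups()
            mapping[hdisk] = (hba, port)
    return mapping


def errors_by_port(errpt_lines, mapping, description="DISK OPERATION ERROR"):
    """Count errpt summary lines with the given description per (HBA, port)."""
    counts = Counter()
    for line in errpt_lines:
        parts = line.split(None, 5)
        if len(parts) == 6 and parts[5].strip() == description:
            counts[mapping.get(parts[4], ("unknown", "unknown"))] += 1
    return counts


POWERMT = """\
   0 fscsi0                    hdisk18   SP A1     active  alive      0      1
   0 fscsi0                    hdisk2    SP B1     active  alive      0      0
   1 fscsi1                    hdisk34   SP A0     active  alive      0      1
   1 fscsi1                    hdisk50   SP B0     active  alive      0      2
"""
ERRPT = [
    "B6267342   0817090312 P H hdisk50        DISK OPERATION ERROR",
    "B6267342   0817090212 P H hdisk50        DISK OPERATION ERROR",
    "516A2BC4   0817090312 P H hdisk50        CONNECTION FAILURE",
    "B6267342   0817085712 P H hdisk2         DISK OPERATION ERROR",
]

mapping = hdisk_to_port(POWERMT)
counts = errors_by_port(ERRPT, mapping)
for (hba, port), n in sorted(counts.items()):
    print(hba, port, n)
```

Running the same grouping with `description="CONNECTION FAILURE"` would show whether the PowerPath dead-path events cluster on one HBA or on one SP port.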
