February 26th, 2008 23:00

AIX Failing disks

This is the output I'm seeing on an AIX server:

dm e778d3BA hdisk431 auto 2048 71383168 FAILING
dm e778d3BE hdisk432 auto 2048 71383168 FAILING
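If you have to watch for this across a large number of disks, the dm records in vxprint output are easy to pick apart. Here is a minimal Python sketch. It assumes the dm column order NAME DEVICE TYPE PRIVLEN PUBLEN STATE that vxprint -ht prints, so check the header on your own system. The function and field names are made up for illustration.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class DmRecord(NamedTuple):
    name: str               # disk media name, e.g. e778d3BA
    device: str             # OS device, e.g. hdisk431 ("-" if none)
    dtype: str              # disk type, e.g. auto
    privlen: Optional[int]  # private region length, None if not numeric
    publen: Optional[int]   # public region length, None if not numeric
    state: str              # state column, "" if absent


def _to_int(value: str) -> Optional[int]:
    # vxprint prints "-" for fields that do not apply
    return int(value) if value.isdigit() else None


def parse_dm_lines(text: str) -> list[DmRecord]:
    """Collect dm records, assuming NAME DEVICE TYPE PRIVLEN PUBLEN STATE order."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[0] != "dm":
            continue  # not a disk media record
        state = " ".join(fields[6:])  # may be empty
        records.append(DmRecord(fields[1], fields[2], fields[3],
                                _to_int(fields[4]), _to_int(fields[5]), state))
    return records


def failing(records: list[DmRecord]) -> list[DmRecord]:
    """Records whose state column mentions FAILING."""
    return [r for r in records if "FAILING" in r.state.upper()]
```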

From the Symm side, the disks look fine (I just ran symdev show and they are RW).

Can anybody help me resolve this ASAP?

2.8K Posts

February 26th, 2008 23:00

First of all, could you please tell us which command you used to get the output above? And could you please tell us which OS version you are running?

If you are experiencing a problem and need a fast answer, please remember that the shortest path to a solution is to open a Service Request with our Customer Service/MSS. However, we are here to help you if you prefer a longer path ;-)

2.8K Posts

February 27th, 2008 00:00

Usually a "FAILED" disk means that the underlying physical drive (as seen from the host) is not available.

1) Does the host have a single HBA or multiple HBAs?
1bis) If you have multiple HBAs, is the disk reported as "FAILED" on all the paths?
2) Can the same HBA "see" other devices, or are all the devices reported as "failed"?

-s-

76 Posts

February 27th, 2008 00:00

- That's the output of the Veritas command vxprint -htr
- AIX version 5.3

76 Posts

February 27th, 2008 01:00

> It looks like a "failing" disk is a disk that wasn't available at some point in the past but now appears to be available again. So Veritas puts a "post-it" on the disk just to warn you that the device disappeared and came back at some point in the past.


The failing disks WERE indeed available before. I wonder why they are failing now.

> You have to
>
> 1) verify with vxdisk list that you can still see the device

It still sees the devices, yet they are flagged as failing, and one has failed!

hdisk431 auto:aixdisk e778d3BA IGL12M-APP-dg online failing
- e778d3BE IGL12M-APP-dg failed failing was:hdisk432
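The same kind of triage works for vxdisk list. Here is a rough Python sketch that only looks for a few status keywords and the "was:" marker seen in the rows above. The keyword set is an assumption based on this output, not a complete list of VxVM states, and the function names are hypothetical.

```python
from __future__ import annotations

# Assumed keywords, based on the rows posted in this thread; not a full list of VxVM states.
STATUS_WORDS = {"online", "offline", "error", "failed", "failing"}


def triage_vxdisk(text: str) -> list[dict]:
    """Return device, status keywords and previous device for each vxdisk list row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "DEVICE":
            continue  # skip blank lines and the header
        status = [f for f in fields if f in STATUS_WORDS]
        was = next((f[len("was:"):] for f in fields if f.startswith("was:")), None)
        rows.append({
            "device": None if fields[0] == "-" else fields[0],
            "status": status,
            "was": was,  # device the disk was last seen on, if reported
        })
    return rows


def needs_attention(rows: list[dict]) -> list[dict]:
    """Rows flagged failing or failed."""
    return [r for r in rows if {"failing", "failed"} & set(r["status"])]
```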

> 2) verify with syminq that you can still see the device


Yes, inq still sees the devices:

/dev/vx/rdmp/hdisk431 :EMC :SYMMETRIX :5670 :78003ba000 : 35692800
/dev/vx/rdmp/hdisk432 :EMC :SYMMETRIX :5670 :78003be000 : 35692800
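As an extra sanity check, you can compare inq's view of each LUN with the region sizes VxVM recorded. Here is a minimal Python sketch. It assumes that inq's last column is capacity in KB and that the vxprint lengths are in 512-byte sectors. Both assumptions are worth confirming locally, and the helper names are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple

SECTOR_BYTES = 512  # assumption: vxprint lengths are in 512-byte sectors


class InqRow(NamedTuple):
    device: str
    vendor: str
    product: str
    rev: str
    serial: str
    capacity_kb: int  # assumption: inq reports capacity in KB


def parse_inq(text: str) -> list[InqRow]:
    """Parse colon-separated inq lines like the ones posted above."""
    rows = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split(":")]
        if len(parts) != 6 or not parts[0].startswith("/dev/"):
            continue  # header, separator or unrelated line
        rows.append(InqRow(*parts[:5], int(parts[5])))
    return rows


def vx_fits(row: InqRow, privlen: int, publen: int) -> bool:
    """True if VxVM's private + public regions fit inside the LUN capacity."""
    vx_kb = (privlen + publen) * SECTOR_BYTES // 1024
    return vx_kb <= row.capacity_kb
```

With the numbers from this thread, 2048 + 71383168 sectors comes to 35692608 KB, which fits within the 35692800 KB that inq reports. That suggests LUN size is not the problem here.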




> 3) use the vxedit command to remove the "post-it" (vxedit set failing=off)

Not sure we can do this... it would stop it from reporting the 'failing' state of the disks!

> I think you also have to understand WHY the device disappeared (and hopefully came back) in the past. Is it a BCV that you established? Did you have problems with the HBA/fabric/storage ports?


The disks have not disappeared; the host still sees them, but they are 'failing'. Apart from these 2 disks, THERE ARE SEVERAL METAS ASSIGNED TO THIS HOST AND THEY ARE ABSOLUTELY FINE.
BCVs are not established.
The HBA, switch port and director port are absolutely fine.

2.8K Posts

February 27th, 2008 01:00

It looks like a "failing" disk is a disk that wasn't available at some point in the past but now appears to be available again. So Veritas puts a "post-it" on the disk just to warn you that the device disappeared and came back at some point in the past.

You have to

1) verify with vxdisk list that you can still see the device
2) verify with syminq that you can still see the device
3) use the vxedit command to remove the "post-it" (vxedit set failing=off)

I think you also have to understand WHY the device disappeared (and hopefully came back) in the past. Is it a BCV that you established? Did you have problems with the HBA/fabric/storage ports?
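If you do decide to clear the flag once the cause is understood, a small helper can build the commands for review rather than running them blindly. This sketch assumes the vxedit -g <diskgroup> set failing=off <medianame> form, so check the vxedit man page for your VxVM release. The disk group and media names are the ones posted in this thread, and the helper name is hypothetical.

```python
from __future__ import annotations

import shlex


def clear_failing_cmds(disk_group: str, media_names: list[str]) -> list[list[str]]:
    """Build (but do not run) one vxedit argv per disk media name."""
    # Assumed syntax: vxedit -g <diskgroup> set failing=off <medianame>
    return [["vxedit", "-g", disk_group, "set", "failing=off", name]
            for name in media_names]


if __name__ == "__main__":
    # Names taken from the vxprint/vxdisk output posted in this thread.
    for argv in clear_failing_cmds("IGL12M-APP-dg", ["e778d3BA", "e778d3BE"]):
        print(shlex.join(argv))  # review before running anything by hand
```

Printing the commands instead of executing them keeps a human in the loop, which matters here because the flag is the only trace left of whatever happened to the disks.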

2.8K Posts

February 27th, 2008 02:00

We'll be very glad if you update the thread with news from the SR. But maybe someone (Rob from NL or Mr Dynamox from Atlanta) will shed some light on your issue.

76 Posts

February 27th, 2008 02:00

:( :( :( :( :( :( :( :( :( :( :( :( :(

I'll update if they find a solution (actually, they're slow to pick up SRs; I know EMC Support, for obvious reasons :))

But if you find a solution for this, do update.

2.8K Posts

February 27th, 2008 02:00

The situation looks very complex... Unfortunately I can't help you further.

I think it's better if you open an SR.

20.4K Posts

February 27th, 2008 03:00

> We were waiting for you!! Did you find a traffic jam?? Or are you still "working" from your bed ;-)

Heh... that's tomorrow. Unfortunately today I had to come to the office, and it's 0°C outside plus rain... perfect day to work from home :D

2.8K Posts

February 27th, 2008 03:00

We were waiting for you!! Did you find a traffic jam?? Or are you still "working" from your bed ;-)

20.4K Posts

February 27th, 2008 03:00

amrith,

would you please post the output of lsdev -Cc disk?

76 Posts

February 27th, 2008 19:00

> would you please post the output of lsdev -Cc disk?

hdisk431 Available 04-08-02 EMC Symmetrix FCP Raid5
hdisk432 Available 04-08-02 EMC Symmetrix FCP Raid5
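To check the AIX side for many disks at once, here is a short Python sketch that reads lsdev -Cc disk output and reports any hdisk that is missing or not in the Available state (for example, Defined). The column handling assumes the name, state, location, description layout shown above, and the function names are hypothetical.

```python
from __future__ import annotations


def lsdev_states(text: str) -> dict[str, str]:
    """Map hdisk name -> state column from `lsdev -Cc disk` output."""
    states = {}
    for line in text.splitlines():
        fields = line.split(None, 3)  # name, state, location, description
        if len(fields) >= 2 and fields[0].startswith("hdisk"):
            states[fields[0]] = fields[1]
    return states


def not_available(states: dict[str, str], disks: list[str]) -> list[str]:
    """Disks that are missing from the output or not in the Available state."""
    return [d for d in disks if states.get(d) != "Available"]
```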

76 Posts

February 27th, 2008 22:00

I have a feeling this issue has nothing to do with the Symm...
I have raised an SR; don't know when those guys will pick it up.

I feel this is a Veritas issue...
Let's see... time will tell...

All, you can still help me! :)

2.8K Posts

February 27th, 2008 23:00

> All, you can still help me! :)

Yes, we'll try... but unfortunately Mr Dynamox is sleeping right now, so we have to wait for him.

However, if the hdisks are "Available" and not "Defined", I really think something odd is going on with Veritas ;-) But as you note, only time will tell :D

385 Posts

February 28th, 2008 04:00

Have you checked errpt to make sure that you aren't seeing any errors from this device?

I've seen issues in the past on AIX where devices with very poor response times would generate "disk failures", which I'm speculating might translate into "failing" devices in Veritas.
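A quick way to narrow errpt down to the disks in question is to filter its summary output by resource name. Here is a minimal Python sketch, assuming the default summary columns IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION. Related errors may also be logged against other resources (adapters, for instance), so treat a zero count as a hint, not proof. The function name is hypothetical.

```python
from __future__ import annotations

from collections import Counter


def errpt_hits(text: str, resources: set[str]) -> Counter:
    """Count errpt summary entries per resource name, for the given resources only."""
    hits: Counter = Counter()
    for line in text.splitlines():
        # Assumed layout: IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
        fields = line.split(None, 5)
        if len(fields) < 5 or fields[0] == "IDENTIFIER":
            continue  # skip the header and short lines
        if fields[4] in resources:
            hits[fields[4]] += 1
    return hits
```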

PS: Why Veritas on top of AIX? That has to be one of the worst combinations I can imagine as far as troubleshooting problems in UNIX environments goes :)