August 4th, 2008 23:00

Fatal alert

I got the following error message on 30 July. I just want to make sure there are no issues with that director now. How can I confirm that?
symevent -sid xxx list -fatal
Symmetrix ID: xxx
Detection time           Dir    Src  Category     Severity     Error Num
------------------------ ------ ---- ------------ ------------ ----------
Wed Jul 30 12:52:29 2008 DF-6C  Symm Director     Fatal        0x0040
A Symmetrix Director is not responding
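If you want to keep an eye on these events from a script, a minimal Python sketch like the one below can pull the entries out of pasted symevent output. It assumes the exact column layout shown above (timestamp, Dir, Src, Category, Severity, Error Num, then a description line); the `Event` class and `parse_events` function are made up for illustration and are not part of Solutions Enabler.

```python
import re
from dataclasses import dataclass

# Assumed row layout, copied from the listing above: a 5-token timestamp,
# then Dir, Src, Category, Severity and a hex Error Num. Other Solutions
# Enabler versions may print different columns.
ROW = re.compile(
    r"^(?P<time>\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4})\s+"
    r"(?P<dir>\S+)\s+(?P<src>\S+)\s+(?P<category>\S+)\s+"
    r"(?P<severity>\S+)\s+(?P<errnum>0x[0-9A-Fa-f]+)$"
)


@dataclass
class Event:
    time: str
    director: str
    source: str
    category: str
    severity: str
    error_num: int
    description: str = ""


def parse_events(text: str) -> list[Event]:
    """Turn pasted symevent output into Event records (header lines are skipped)."""
    events: list[Event] = []
    for raw in text.splitlines():
        line = raw.strip()
        m = ROW.match(line)
        if m:
            events.append(Event(m["time"], m["dir"], m["src"], m["category"],
                                m["severity"], int(m["errnum"], 16)))
        elif events and line and not events[-1].description:
            # The free-text line right after a row describes that event.
            events[-1].description = line
    return events


SAMPLE = """\
Symmetrix ID: xxx
Detection time Dir Src Category Severity Error Num
------------------------ ------ ---- ------------ ------------ ----------
Wed Jul 30 12:52:29 2008 DF-6C Symm Director Fatal 0x0040
A Symmetrix Director is not responding
"""

for ev in parse_events(SAMPLE):
    print(ev.time, ev.director, ev.severity, hex(ev.error_num), ev.description)
```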

August 6th, 2008 06:00

The DF (disk director) is a redundant director: if one goes DD (dead), its dual-initiator partner takes over and controls all of the failed director's drives as well as its own. You still have access to all of your disks, but you lose redundancy.
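To make the dual-initiator behaviour concrete, here is a toy Python model (my own illustration, not EMC code). Each director owns a set of drives; when one goes DD, its partner serves both sets, so every drive is still reachable but the pair is no longer redundant. The `Director` and `DualInitiatorPair` classes, the drive names and the partner name `DF-partner` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Director:
    name: str
    drives: set[str]
    online: bool = True


@dataclass
class DualInitiatorPair:
    a: Director
    b: Director

    def serving(self) -> dict[str, str]:
        """Map every drive in the pair to the director currently serving it."""
        owner: dict[str, str] = {}
        for director, partner in ((self.a, self.b), (self.b, self.a)):
            for drive in director.drives:
                if director.online:
                    owner[drive] = director.name
                elif partner.online:
                    # Dual-initiator takeover: the surviving partner
                    # picks up the failed director's drives.
                    owner[drive] = partner.name
        return owner

    def redundant(self) -> bool:
        """True only while both directors are up."""
        return self.a.online and self.b.online


# "DF-6C" is from the thread; the partner name is made up for the example.
pair = DualInitiatorPair(Director("DF-6C", {"d0", "d1"}),
                         Director("DF-partner", {"d2", "d3"}))
pair.a.online = False  # DF-6C goes DD
print(pair.serving())
print(pair.redundant())
```

If both directors are marked offline, `serving()` returns an empty mapping: that is the "lose both processors" case discussed further down the thread.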

August 5th, 2008 06:00

You should open a service request to have this Symm looked at. The error indicates that director 6C (which, going by the DF prefix, should be a back-end disk director) had a fatal error and is no longer responding. An error like this would have prompted a dial-home, but if you're not sure, having the lab look into it would be a good idea.

August 5th, 2008 07:00

If I close my eyes, I can even see a red display showing a steady "DD" ;-)

August 5th, 2008 08:00

It doesn't necessarily have to be at DD. The director could be non-communicating without having dropped to DD.

August 6th, 2008 00:00

Mike, I wanted to know how to make sure there are no issues with that DF and that the error is no longer valid. It was reported a week ago.

August 6th, 2008 00:00

Symevent may help you. In case the processor was dead (DD), you can check with symcfg whether all DF processors are Online. However, I think it's better to open an SR and have someone dial into the box and check its health :D
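If you capture the director states (for example from symcfg) as name/state pairs, a check like the hypothetical helper below flags any DF director that isn't Online. The input format and the "Dead" state string are assumptions for illustration, not the real symcfg output. Keep in mind that, as noted earlier in the thread, a director can be non-communicating without dropping to DD, so an all-Online result is not a substitute for the SR.

```python
def non_online_disk_directors(status: dict[str, str]) -> list[str]:
    """Return DF (back-end disk) directors whose reported state is not Online."""
    return sorted(
        name
        for name, state in status.items()
        if name.upper().startswith("DF-") and state.strip().lower() != "online"
    )


# Hypothetical director/state pairs, e.g. transcribed by hand from a
# symcfg director listing; the state strings here are illustrative only.
status = {"FA-3A": "Online", "DF-6C": "Dead", "DF-6D": "Online"}
print(non_online_disk_directors(status))
```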

August 22nd, 2008 12:00

A board in DD doesn't necessarily need to be replaced. The PSE lab will diagnose why the processors dropped to DD and eventually reboot them. The same happens with drives: sometimes PSE spins down your "supposed-to-be-broken" drive and spins it up again. If it spins up fine, it was a false error. And trust me, it's easier to have a broken disk ;-)

However, before replacing any hardware part, the CE has to carefully read all the SR notes from PSE. And sometimes the CE can't even touch the hardware unless a PSE is logged in to the box, issuing commands with a very special account (usually your CE logs on to SymmWin with the "CE" account, while the PSE uses a different and more powerful user) and checking the command output before giving directions to your local friendly CE.

Just as an example: when you have 2 broken drives, the CE can't change them on his own. He needs a PSE to check the box status and tell him which drive to replace first and when to replace the second one...

And before running any script, the CE has to run the beloved KTS (key to success) script, which does a long and boring check of many different aspects. And trust me, KTS will report a DF that isn't working.

In a word, there are a lot of people checking each other so that a single mistake can't bring down your business.

August 22nd, 2008 12:00

In which case you want to have that DF replaced ASAP anyway. This can be done online, since the other DF is taking over all disk-related I/O anyway.

One question comes to my mind: if another DF goes offline, causing a DF on this particular card (the one with the first failing DF) to take over, the system still runs fine, but replacing either of the 2 failing DF boards causes some disks to lose connectivity. How is this handled?

August 22nd, 2008 13:00

Just to add a little to Stefano's comment about many people checking each other: the scripted procedures also perform various checks to verify the condition of the system and whether it is safe to proceed. If any issues are found, a warning is reported and PSE assistance is required.

The director replacement script checks whether any directors other than the one specified for replacement are in a failed state. If any are found, the script posts a warning. Many scripts (if not all) also check for newly failed directors throughout the procedure.

The drive replacement script has one or more steps that check that all of the volumes on the drive to be replaced have ready and valid mirror/RAID members.
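The director pre-check can be sketched roughly like this in Python. This is only an illustration of the idea, not the actual replacement script: `replacement_warnings`, `replace_director`, the state strings and every director name other than DF-6C are hypothetical. The point is that the check is repeated before every step, so a director that fails halfway through stops the procedure.

```python
from typing import Callable


def replacement_warnings(target: str, states: dict[str, str]) -> list[str]:
    """Warn about every failed director other than the one being replaced."""
    return [
        f"{name} is failed; PSE assistance required"
        for name, state in sorted(states.items())
        if name != target and state == "failed"
    ]


def replace_director(target: str,
                     steps: list[Callable[[], None]],
                     read_states: Callable[[], dict[str, str]]) -> list[str]:
    """Run the replacement steps, re-checking director states before each one.

    Returns the warnings that stopped the procedure (empty if it completed).
    """
    for step in steps:
        warnings = replacement_warnings(target, read_states())
        if warnings:
            return warnings  # stop and escalate instead of carrying on
        step()
    return []


log: list[str] = []
states = {"DF-6C": "failed", "DF-partner": "online"}
steps = [lambda: log.append("prepare"), lambda: log.append("swap board")]
print(replace_director("DF-6C", steps, lambda: states))
```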
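A rough Python sketch of that drive check, under simplifying assumptions of my own: a mirrored volume needs at least one other ready and valid member, while a single-parity RAID volume needs all of its other members ready and valid to rebuild. The class names, volume names and drive names are hypothetical, and the real script's checks are not shown here.

```python
from dataclasses import dataclass


@dataclass
class Member:
    drive: str
    ready: bool = True
    valid: bool = True


@dataclass
class Volume:
    protection: str  # "mirror" or "raid" (single parity assumed)
    members: list[Member]


def volumes_at_risk(drive: str, volumes: dict[str, Volume]) -> list[str]:
    """Volumes on `drive` that would lose data access if the drive were pulled."""
    at_risk = []
    for name, vol in sorted(volumes.items()):
        if all(m.drive != drive for m in vol.members):
            continue  # nothing of this volume lives on the drive
        healthy = [m.ready and m.valid for m in vol.members if m.drive != drive]
        if vol.protection == "mirror":
            ok = any(healthy)  # one good copy elsewhere is enough
        else:
            ok = bool(healthy) and all(healthy)  # parity rebuild needs every other member
        if not ok:
            at_risk.append(name)
    return at_risk


vols = {
    "0A1": Volume("mirror", [Member("d0"), Member("d7")]),
    "0A2": Volume("mirror", [Member("d0"), Member("d7", ready=False)]),
    "0B0": Volume("raid", [Member("d0"), Member("d3"), Member("d5", valid=False)]),
}
print(volumes_at_risk("d0", vols))  # a non-empty list means: stop, call PSE
```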

In response to RRR's question about having to replace a DF dual-initiator pair where each director has a failed processor: this would be considered a rare and very serious condition. High-level PSE and/or development engineering assistance would be required to determine the best course of action. It would be decided case by case, depending on the cause of the failures, so I cannot say exactly how the problem would be corrected. All measures would be taken to recover the situation without disruption.

Mike

August 25th, 2008 13:00

Can we help further with this thread?

August 31st, 2008 12:00

Not necessarily. If both redundant processors are down, you lose only one half (one mirror) of each mirrored volume on their drives, and even less when you have RAID. It will obviously impact you, but the impact will only be poor performance. Losing both processors simply brings down all the drives "owned" by those processors. Mirroring and RAID will still protect your beloved data and grant you access to your porn collection ;-)

August 31st, 2008 12:00

I guess that would be a Severity 1 SR...

August 31st, 2008 13:00

Hmmm...
I guess you didn't understand what I was saying:
what if 2 DAs are dead, each holding half of several disks? If you now replace one DA, some disks no longer have a working DA... and if you replace the other DA, the other disks don't have a working DA. Talk about a dilemma...

August 31st, 2008 14:00

In your example, you have to replace 2 DAs. They both share the same drives (via LCC or PBC), so even if both fail at the same time, your data is still protected, since your mirrored volumes use 2 different DA pairs.

I think you are talking about replacing both DAs in a single DA pair. It looks scary, but the code enforces rules to avoid problems in this very specific situation (and in a lot of other "corner cases") :D

If you are talking about 2 different DAs in 2 different DA pairs, it's even easier: when you unplug a DA, the other DA in its pair jumps in and gives access to ALL devices.

To make a complex thing simple: each DA pair grants access to the low-level disks, while logical volume protection protects your data against physical drive unavailability. And a drive may be unavailable because it has failed or because both DAs in its DA pair are faulty ;-)
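A small Python model of the two layers just described, under assumed names (the DA pairs, drives and volumes are all hypothetical): a drive is reachable while at least one DA in its pair works, and a mirrored volume stays available while any of its member drives is reachable. Running both scenarios from this discussion through it shows why, with mirrors spread over different DA pairs, neither one loses data access.

```python
def reachable_drives(drive_pair: dict[str, str],
                     pairs: dict[str, tuple[str, str]],
                     failed: set[str]) -> set[str]:
    """A drive is reachable while at least one DA of its pair is working."""
    return {drive for drive, pair in drive_pair.items()
            if any(da not in failed for da in pairs[pair])}


def available_volumes(volumes: dict[str, list[str]],
                      drive_pair: dict[str, str],
                      pairs: dict[str, tuple[str, str]],
                      failed: set[str]) -> set[str]:
    """A mirrored volume stays available while any member drive is reachable."""
    up = reachable_drives(drive_pair, pairs, failed)
    return {vol for vol, drives in volumes.items() if any(d in up for d in drives)}


# Hypothetical layout: two DA pairs, each mirror split across both pairs.
pairs = {"P1": ("DA-1a", "DA-1b"), "P2": ("DA-2a", "DA-2b")}
drive_pair = {"d1": "P1", "d2": "P2", "d3": "P1", "d4": "P2"}
volumes = {"V1": ["d1", "d2"], "V2": ["d3", "d4"]}

# One DA dead in each of 2 different pairs: partners cover, every drive stays up.
print(sorted(reachable_drives(drive_pair, pairs, {"DA-1a", "DA-2a"})))
# Both DAs of P1 dead: P1's drives drop, but each mirror survives on P2.
print(sorted(reachable_drives(drive_pair, pairs, {"DA-1a", "DA-1b"})))
print(sorted(available_volumes(volumes, drive_pair, pairs, {"DA-1a", "DA-1b"})))
```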

September 10th, 2008 13:00

It's scary only as long as you don't dig deep into this topic :D
As soon as you realize how it works, it's relaxing ;-)