brad12341

207 Posts

2335

November 8th, 2010 15:00

Alerting for SRDF/A drops

We recently implemented SRDF/A between our two DMX4s in CA and TX. This first implementation is for our two z/OS mainframes, one in TX and one in CA. During normal processing we have no problem keeping 30-second cycles. During heavy batch processing our cycles can elongate to 60 minutes or more. When we get too far behind, SRDF uses too much cache and sometimes we drop. I am trying to use ECC to monitor SRDF. There is an alarm set up to send an alert if we get a certain wp_limit drop. We got a wp_limit drop during month end but did not get the alert. It turns out there is a second condition that can trigger a slightly different wp_limit drop, and it has its own alert that needs to be set. I'm concerned that some day we will drop with some new condition that is not caught by our current ECC alerts.

Is there a better way of monitoring SRDF/A, maybe outside of ECC? Maybe the mainframe could check something to make sure it is still up and alert if it is down. I am an Open Systems person (Solutions Enabler etc.), but I have become familiar with some of the mainframe SQ query commands.

Responses(15)

dynamox

2 Intern

•

20.4K Posts

0

November 8th, 2010 20:00

Take a look at the symrecover command.

aTECJhQ8T711981

13 Posts

1

November 9th, 2010 03:00

Hi,

There was a new Service Alert introduced to alert you that SRDF/A has dropped in the Symmetrix.

The mainframe console will log either a 14CA or E4CA service alert when SRDF/A has dropped; see Primus solution emc219215.

The SQ SRDFA command could then be used to check the status of your SRDF/A groups.
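As a rough illustration of alerting off those console messages, here is a minimal Python sketch that scans log lines for the two service alert codes mentioned above. Only the codes 14CA and E4CA come from this thread; the function name and the idea of feeding it plain text lines exported from the log are assumptions, and the real message layout may differ.

```python
import re

# Service alert codes named in this thread; the surrounding message
# layout is NOT known here, so we only look for the codes as whole words.
SRDFA_DROP_CODES = ("14CA", "E4CA")
_PATTERN = re.compile(r"\b(" + "|".join(SRDFA_DROP_CODES) + r")\b")


def find_srdfa_drop_alerts(lines):
    """Return (line_number, code, text) for each line containing a drop code.

    `lines` is any iterable of strings, e.g. an exported console log
    (hypothetical input; adapt to however your logs are collected).
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = _PATTERN.search(line)
        if match:
            hits.append((number, match.group(1), line.rstrip()))
    return hits
```

A scheduler job could call this on the most recent log extract and raise an alert whenever the returned list is non-empty, then follow up with SQ SRDFA to confirm the group state.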

brad12341

207 Posts

0

November 9th, 2010 09:00

Joe, our systems programmer has been able to see these errors in the MVS logs, so we can create an alert off them. We use a scheduling product called ESP and we may use it to kick off MVS jobs to restart SRDF/A. We might do something like this -

SC VOL,LCL(5000,03),ADCOPY-DISK,ALL

SC VOL,LCL(5000,03),RDF-RSUM,ALL

SC VOL,RMT(5000,03),RNG-REFRESH,ALL

SC VOL,RMT(5000,03),RNG-RSUM,ALL

SC VOL,LCL(5000,03),ACT

It seems like whenever SRDF/A drops on the 03 mainframe group, the volumes go to TNR-SY (Sync), but if I do a symrdf query from open systems Solutions Enabler, it says the group is in async mode. I don't know which system is giving us the true answer. We always run the SC VOL,LCL(5000,03),ADCOPY-DISK,ALL just to make sure we don't accidentally restart in Sync mode, as that would kill our performance.

What do you think about the 5 SC commands above for restarting? I know I can usually restart an open systems group simply by doing "symrdf resume". In some cases I may need to do a "symrdf est" (incremental establish). I want to test this out on the mainframe group at some point. If it works, it would allow our Open Systems people to restart the mainframe SRDF/A group.

brad12341

207 Posts

0

November 9th, 2010 12:00

I've noticed that after a drop, while running the RDF-RSUM, we always seem to get "Special Processing is Required". At that point we run the RNG-REFRESH and RNG-RSUM. Hopefully it's ok to run both these jobs even if we don't get the "Special Processing is Required", as we want to set up 1 routine to always use after a drop. Sound ok?

Booyah2

184 Posts

1

November 9th, 2010 12:00

Hello Brad,

Those 5 SC commands are appropriate. You may find that the RNG-REFRESH and RNG-RSUM are only needed if the RDF-RSUM fails with "devices are ineligible for the action".

But always check your operational mode first and issue the ADCOPY-DISK,ALL. As you noted, "sync mode" would not be good for production performance.

Booyah2

184 Posts

0

November 9th, 2010 13:00

Yep, no issue with that...

Booyah2

184 Posts

0

November 10th, 2010 06:00

Brad,

Found some more info that may interest you.

There is a PARM called “on drop” that can be set so that SRDF/A drops to Adaptive Copy Disk rather than Sync.

Booyah2

184 Posts

0

November 10th, 2010 06:00

Oh, I noticed one more thing I wanted to mention...

The syntax for your activate command is incorrect.

SC VOL,LCL(5000,03),ADCOPY-DISK,ALL

SC VOL,LCL(5000,03),RDF-RSUM,ALL

SC VOL,RMT(5000,03),RNG-REFRESH,ALL

SC VOL,RMT(5000,03),RNG-RSUM,ALL

SC VOL,LCL(5000,03),ACT

it should be:

SC SRDFA,LCL(5000,03),ACT

Booyah2

184 Posts

0

November 10th, 2010 06:00

Good morning Brad,

After sleeping on your questions, I think it would be best just to remove the RDF-RSUM altogether, since you are using the RNG-REFRESH and RNG-RSUM.

Doing both is redundant work and unnecessary. This would speed things up for you and also be a cleaner process.

Say by chance the RDF-RSUM did not fail; then the RNG- commands would not be necessary and would fail, since you were already resumed. So, by removing the RDF-RSUM you should not have any commands fail, and the process would be cleaner and less confusing.

SANjoker

14 Posts

0

November 10th, 2010 07:00

Booyah - Could you point us to some documentation for the "on drop" PARM? Dropping down to Adaptive Copy might be highly desirable. TIA.

Booyah2

184 Posts

0

November 10th, 2010 09:00

SANjoker,

Let me see if I can find that documentation. I will post as soon as I do.

Booyah2

184 Posts

0

November 10th, 2010 09:00

You will be fine running these commands if the network is still down, but as you stated, if the network is down the last three commands will fail.

The ADCOPY command should complete successfully as it pertains to the R1 side only.

SC VOL,LCL(5000,03),ADCOPY-DISK,ALL

SC VOL,RMT(5000,03),RNG-REFRESH,ALL

SC VOL,RMT(5000,03),RNG-RSUM,ALL

SC SRDFA,LCL(5000,03),ACT
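For anyone scripting this, a small Python sketch that generates the four-command sequence above for a given device address and group may be handy. The parameter names `cuu` and `rdf_group` are assumptions about what 5000 and 03 mean in this thread, and the function itself is hypothetical; it only builds the command text and does not submit anything.

```python
def srdfa_restart_commands(cuu="5000", rdf_group="03"):
    """Build the restart sequence discussed in this thread as plain strings.

    `cuu` and `rdf_group` are illustrative names (assumed meaning of the
    two values inside LCL(...) / RMT(...)); defaults match the thread.
    """
    local = f"LCL({cuu},{rdf_group})"
    remote = f"RMT({cuu},{rdf_group})"
    return [
        # R1-side mode change first, so a resume cannot come back in Sync.
        f"SC VOL,{local},ADCOPY-DISK,ALL",
        # A comma separates the RMT(...) operand from the action
        # (assumed standard SC VOL syntax).
        f"SC VOL,{remote},RNG-REFRESH,ALL",
        f"SC VOL,{remote},RNG-RSUM,ALL",
        # Reactivate SRDF/A on the group.
        f"SC SRDFA,{local},ACT",
    ]
```

Keeping the sequence in one place means the ADCOPY-DISK step always runs first, which is the point made above about not restarting in Sync mode.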

brad12341

207 Posts

0

November 10th, 2010 09:00

Any documentation regarding "on drop" would be great. Also I was thinking of a scenario.

Below are the commands I would like to run every time SRDF/A drops (thanks Booyah). Sometimes we lose our DS3 network circuits (for SRDF/A) for many seconds, but sometimes for a longer period of time. If we lost the network and it was still down - would there be any harm in running all the commands below? If a trained SAN person is available, he will verify that the network is back up before running the commands. But if an operator (with minimal training) is involved, we want to make things as easy as possible. Could he just run the commands, or would that do damage? Obviously SRDF/A would not restart if the network was still down - I just want to confirm there would be no damage from running the commands with the network still down. Hopefully the commands would just fail with no damage. The operator would wait and then run them again once the network is up.

SC VOL,LCL(5000,03),ADCOPY-DISK,ALL

SC VOL,RMT(5000,03),RNG-REFRESH,ALL

SC VOL,RMT(5000,03),RNG-RSUM,ALL

SC SRDFA,LCL(5000,03),ACT
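The "wait, then run them again once the network is up" procedure could be sketched roughly as below. This is only an outline in Python: `link_is_up` and `run_restart` are hypothetical callables that you would have to supply (for example a circuit check and a job submission), and the attempt count and wait time are placeholder values.

```python
import time


def run_when_link_up(link_is_up, run_restart, attempts=10,
                     wait_seconds=60, sleep=time.sleep):
    """Run the restart once the link check passes.

    Returns the attempt number on which the restart was run, or None if
    the link never came up within `attempts` checks. Both callables are
    hypothetical hooks supplied by the caller.
    """
    for attempt in range(1, attempts + 1):
        if link_is_up():
            run_restart()
            return attempt
        if attempt < attempts:
            # Link still down: wait before checking again.
            sleep(wait_seconds)
    return None
```

Checking the link before issuing anything avoids a run of failed commands in the log, which should make the procedure easier for an operator with minimal training to follow.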

Booyah2

184 Posts

0

November 10th, 2010 10:00

Page 64 of the Product Guide.

There is also a knowledgeLink article here:

http://knowledgebase.emc.com/emcice/login.do?sType=ax1990&sName=1204&id=emc800126

If you find any of this helpful, please be so kind as to award helpful points. This is how our helpfulness is graded.

Booyah2

184 Posts

0

November 10th, 2010 10:00

Ok, this may help... and maybe not.

I found the PARM in the EMC ResourcePak Base Product Guide.

Here's the link:

http://powerlink.emc.com/km/live1/en_US/Offering_Technical/Technical_Documentation/300-009-599.pdf?mtcs=ZXZlbnRUeXBlPUttQ2xpY2tDb250ZW50RXZlbnQsZG9jdW1lbnRJZD0wOTAxNDA2NjgwNDRjNzRhLG5hdmVOb2RlPVNvZndhcmVEb3dubG9hZHMtMQ__
