Unsolved

August 4th, 2016 14:00

Replication failure troubleshooting

We have a PS4100 running v6.0.11 that replicates three volumes over a VPN WAN tunnel to a PS4000 running the same version of firmware. Normally, replication is rock solid, but all replication jobs stalled permanently a couple of days ago. Console error msg: "replTunnel.cc:1642:ERROR:7.4.78; Partner APPLE: iSCSI: login timed out. Make sure the partner IP address is correct and reachable."

Sad to say, we do not have an up-to-date service contract on either, so I'm seeking help triaging this issue. The VPN tunnel has been restarted and is up; I've successfully pinged across it with 64-byte packets. I know the first test is to successfully ping every destination interface address from every source interface address. Please remind me of the CLI syntax to do so.

My only clue is that the running replication job seems to have stalled about the time that I did a controller restart on the source SAN to swap out the cache battery. Is it possible that the jumbo frames setting has to be set on both controllers?

Anyway, guidance would be appreciated; I really have to get this replication rolling again! Thanks!

August 4th, 2016 14:00

Hello, 

 Jumbo frames are set at the group level. On failover, the new CM (control module) inherits all the settings: IP, netmask, etc. 

 You can make sure Jumbo frames for replication are disabled with: 

 1.)  Pause all replication. 

 2.)  At the group CLI, run:  GrpName>su repl-use-jumbo no 

  The command just returns to the GrpName> prompt. 

 3.)  Resume (unpause) replication. 

  To test IP connectivity, run this at the GrpName> prompt: 

  ping "-I <source interface IP> <destination IP>" 

 Note: that is a capital I. 

   Ex.  GrpName>ping "-I 10.0.0.11 172.10.16.21" 

   You need to make sure each Ethernet interface on EVERY member array can reach ALL the IPs at the DR site and vice versa. 
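  A simple way to make sure no source/destination pair gets skipped is to generate the full list of commands up front and paste them into each group's CLI. The Python sketch below does that, assuming the ping "-I <source> <destination>" form shown above. All addresses in it are hypothetical placeholders (two are reused from the example above); substitute your own interface and group IPs.

```python
from itertools import product


def mesh_ping_commands(source_ifaces, dest_ips):
    """Build one group-CLI ping command per (source interface, destination IP) pair."""
    return [f'ping "-I {src} {dst}"' for src, dst in product(source_ifaces, dest_ips)]


# Hypothetical addresses for illustration only; replace with your own.
prod_ifaces = ["10.0.0.11", "10.0.0.12"]        # Ethernet ports on the production member(s)
prod_ips = ["10.0.0.10"] + prod_ifaces          # production group IP plus those ports
dr_ifaces = ["172.10.16.21", "172.10.16.22"]    # Ethernet ports on the DR member(s)
dr_ips = ["172.10.16.20"] + dr_ifaces           # DR group IP plus those ports

print("# Run at the production GrpName> prompt:")
for cmd in mesh_ping_commands(prod_ifaces, dr_ips):
    print(cmd)

print("# Run at the DR GrpName> prompt:")
for cmd in mesh_ping_commands(dr_ifaces, prod_ips):
    print(cmd)
```

  Each side gets its own list because -I takes a local interface address as the source, so the "vice versa" half has to be run from the DR group.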

  I also tend to disable jumbo frames on the DR site as well. 

  Regards, 

Don 

August 4th, 2016 15:00

Hello, 

 Sorry, no.  I can't give out those internal support commands.  I would suggest you do a restart and fail back to the other controller instead. 

 Regards,

Don 

August 4th, 2016 15:00

Thanks. I did the restart. Same result.

August 4th, 2016 15:00

I worked through a very similar issue last year with EQ support. One of the things the tech did was to restart the MgmtExec service. Would you tell me the su command syntax to do that? What I'm finding online isn't working. Thanks. J

August 4th, 2016 15:00

I have verified that I can ping across the VPN, in BOTH directions, from the group address and all "up" Ethernet interfaces to the group address and all "up" Ethernet interfaces. I can also verify that the repl-use-jumbo value for both groups is set to NO. I tried a test replication after verifying that, and it still fails: on the PROD group, the replication job shows as "in progress" in the GUI but with 0 bytes transmitted, and on the DR group, nothing changes in the GUI.

August 4th, 2016 15:00

Is there a firewall in place?   Replication uses port 3260 (iSCSI).  I would make sure that port is open. 

 You can also verify it by creating a small test volume at one end and connecting to it from the other side. If that works, format it.
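 Another way to check that TCP port 3260 passes end to end is a plain TCP connect test. The Python sketch below is a generic stand-in, not an EqualLogic tool, and it only exercises the path from whatever host it runs on. So it is most meaningful when run from a machine on the same subnet as the source array's ports. It does not test ICMP, and the function name and any target IP you pass it are hypothetical.

```python
import socket


def iscsi_port_open(host, port=3260, timeout=5.0):
    """Return True if a TCP connection to host:port completes within `timeout` seconds.

    Any OSError (connection refused, timeout, no route to host) is treated as
    "not reachable"; the socket is closed immediately after a successful connect.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

 Usage: iscsi_port_open("172.10.16.20") (a placeholder DR group IP) returns True only if the three-way TCP handshake to port 3260 completed. A False result with a working ping points toward a filter on 3260 rather than basic routing.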

 Don 

August 7th, 2016 11:00

Since I'm running a VPN tunnel over the WAN, it's kind of an "all or nothing" situation, wouldn't you say? Either the WAN firewall is going to allow the VPN tunnel or it's not; it won't be able to filter traffic inside the tunnel by protocol. The tunnel is up and I can communicate over it with 64-byte ping packets. I need to try packets at the MTU, 1500 bytes. What is the CLI syntax to manipulate the packet size? Thanks!

August 7th, 2016 21:00

Hello, 

 Re: Firewall. Replication requires port 3260 and ICMP. A firewall can block those while still allowing the VPN tunnel. 

RE: Jumbo Frames for replication.   At the CLI run:   GrpName>support repl-use-jumbo no 

 Regards,

Don 

August 8th, 2016 06:00

Hello, 

 You are very welcome. 

ping "-s <packet size> <destination IP>" 

 GrpName>ping "-s 1408 192.168.1.1" 
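 Rather than trial and error, the largest -s value that gets through can be found by bisection. Each probe is still a manual ping "-s N <IP>" at the group CLI; the Python sketch below only shows the search logic, with a simulated predicate standing in for the real ping. The function name is hypothetical, and the 9000-byte upper bound is an assumption (a typical jumbo-frame MTU).

```python
def largest_passing_size(passes, low=0, high=9000):
    """Binary-search the largest size in [low, high] for which passes(size) is True.

    Assumes passes() is monotonic: every size up to some limit succeeds and
    every size above it fails. Returns None if even `low` fails.
    """
    if not passes(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if passes(mid):
            low = mid                # mid works: the limit is at or above mid
        else:
            high = mid - 1           # mid fails: the limit is below mid
    return low


# Simulated path that drops any payload above 1472 bytes.
print(largest_passing_size(lambda size: size <= 1472))  # 1472
```

 Over a 0 to 9000 range this needs about 14 probes, instead of stepping through sizes one at a time.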

Regards,

Don

August 8th, 2016 06:00

Thank you. I'll get with my firewall team to verify that 3260 & ICMP traffic are passing.

Sorry, I wasn't clear in my packet size question. What I meant to ask is, "What is the CLI syntax to manipulate the PING packet size?" I need to test at the MTU of 1500 bytes, not the default ping packet size of 64 bytes.

I appreciate all of your help!

August 8th, 2016 11:00

Even with an MTU of 1500 bytes, trial-and-error testing shows that the command ping "-s 1472 192.168.x.x" works, returning the console response "1480 bytes from 192.168.x.x: ....."; anything larger fails. Is that significant to the console error message about the iSCSI login failure?
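For reference, 1472 is exactly what standard IPv4 arithmetic predicts for a 1500-byte MTU: the -s value is the ICMP payload, and each echo packet also carries an 8-byte ICMP header and a 20-byte IPv4 header (no options). The ICMP header is also why the console reports 1480 bytes. The Python sketch below just does that arithmetic. The tunnel_overhead parameter is a hypothetical allowance for VPN encapsulation; the real figure depends on the tunnel's protocol and settings and is not given in this thread.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request/reply header


def max_ping_payload(mtu, tunnel_overhead=0):
    """Largest ping -s payload that fits in one unfragmented IPv4 packet.

    tunnel_overhead is a hypothetical allowance for encapsulation bytes a VPN adds.
    """
    return mtu - tunnel_overhead - IPV4_HEADER - ICMP_HEADER


def icmp_reply_bytes(payload):
    """Size ping reports in '<N> bytes from ...': payload plus the ICMP header."""
    return payload + ICMP_HEADER


print(max_ping_payload(1500))                    # 1472
print(icmp_reply_bytes(max_ping_payload(1500)))  # 1480
```

Any -s value above that limit needs fragmentation along the path to get through.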

August 8th, 2016 11:00

Hello, 

 Did you run the test using the capital I first?   Making sure that every port can reach every other port?  That's critical since replication will use any port as the source and destination. 

 Also make sure the gateway doesn't fragment packets.  

Re: iSCSI login.  That will likely be a host connection, not replication.  The details in the error will show what host is trying to connect. 

 Don 

August 8th, 2016 13:00

I had previously run the comprehensive set of ping commands, from all interfaces to all interfaces, successfully.

I have received confirmation from our NOC that port 3260 and ICMP traffic flow across the WAN unimpeded.

The console output on the iSCSI login failure is: "4271:2570:NILS:MgmtExec: 8-Aug-2016 12:22:38. 182571:relTunnel.cc:1642:ERROR:7.4.78:Partner APPLE: iSCSI: login timed out. Make sure the partner IP address is correct and reachable."

I learned during a previous troubleshooting exercise for a similar issue that replication includes an iSCSI login.

Thanks.

August 10th, 2016 04:00

Hello, 

Well, that's about as far as I'll be able to take it. You might want to see if you can purchase a new service contract and then open a support case. 

 Regards, 

Don 
