Question about PS6000 replication

Question

Howdy. I've got a pair of PS6000s that I'm trying to set up to replicate a volume between units A and B. I set it all up by the numbers and the manual, and things were looking OK. I made a volume and presented it to my users, who started pushing data into it. I made a second volume that was to be presented, but the users weren't ready for it yet. So I set up the replications.

My first volume is 2 TB and is on unit A. It is online and contains data. On unit B, I've delegated 3 TB of space to ensure that I've got enough to catch this volume when it's replicated. My second volume is 500 GB. It contains no data, but is online, with no connections to it yet. It is also configured to replicate to unit B.

When I tell the solution to replicate, my second volume replicates just fine: it's all zeroes, and this happens very quickly. I understand that. However, my first volume indicates from unit A that it is in progress, while on unit B it is showing as incomplete. Further inquiry shows in the events on unit B an error of 'Partner XXXXXXX: No connection could be established. Verify that the partner IP address is correct.' I know it's correct, as my second volume replicates through.

I do have a mixed configuration on these units due to some security constraints, in that only eth0 on both units can 'see' each other. I realize the impact that this will have on performance; we're not worried about the speed of the replication, just ensuring that the replication happens. A look at the network ports on both chassis shows them both to have a standing load on them, more so than average.

My question is this: I *think* it's replicating, just slowly. Is there a way that I can get better resolution on this process and verify that the volume is indeed still replicating? And can anyone shed any light on the error in the logs on unit B? If my partner IP is incorrect, why then does it work?

(PS, FWIW, unit A is NAT'd.) Or is this a relic, and the network path does work fine, but because of the NAT the error is thrown?

Thanks for your time and attention, sorry about the wall of text.

V/r,
- Abe Lister

Joe S586 · Accepted Answer

awliste,

Replication is not supported with NAT (it may work sometimes, and other times it will not). What happens during an iSCSI login between the two partners is that the initial connection goes to the group address, which returns a "target has temporarily moved" login failure and then redirects to a physical IP address of one of the eth interfaces on the receiving group.

The problem with NAT is that you can re-map the group address, but the login redirection is embedded in the iSCSI protocol. Unless your NAT knows how to snoop inside the iSCSI protocol and rewrite those addresses, the group will be redirected to an IP address it cannot see, and replication will never complete.

In order for replication to function, your arrays have to be normally routed to each other, so they can each see not only the group IP but all of the eth IP addresses directly, without any translations.
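
The redirect behaviour described above can be sketched as a small model. This is an illustration only, not the array's actual logic: the function name, the translated address 203.0.113.10, and the use of 10.30.4.80 / 10.30.4.81 as a group and eth0 address are assumptions made for the example.

```python
from ipaddress import ip_address, ip_network


def replication_login(group_ip, redirect_ip, reachable_networks):
    """Model the two-step login: contact the group IP, then follow the
    redirect to a physical eth IP. Both must be reachable as-is."""
    def reachable(ip):
        return any(ip_address(ip) in ip_network(net) for net in reachable_networks)

    if not reachable(group_ip):
        return "failed: group address unreachable"
    if not reachable(redirect_ip):
        return "failed: redirected to an address the partner cannot see"
    return "ok"


# Routed case: the partner can reach the whole iSCSI subnet directly.
print(replication_login("10.30.4.80", "10.30.4.81", ["10.30.4.0/24"]))

# NAT case: only the translated group address (hypothetical 203.0.113.10)
# is reachable, but the redirect still names the untranslated eth0 address.
print(replication_login("203.0.113.10", "10.30.4.81", ["203.0.113.10/32"]))
```

The second call fails at the redirect step even though the first hop to the group address succeeds, which is why a correct partner IP can still end in a connection error.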

Regards,

Joe

Dev Mgr · Answer

On the B-side, verify that you see the replica of the 500GB volume and test it by cloning it to a volume. If that works, I'd say it was indeed successful.

When partnering 2 units for replication, be sure to also match the case of the partner name, as a mismatch will prevent replication from working. The problem only shows up (with a non-descriptive error message) after you try to actually replicate a volume.

awliste · Answer

The replica converted to a volume with no problems. I wish I had data in the 500GB to REALLY verify it, but everything went smoothly. I *think* it's working - it's just interesting in the WAN application and the NATting. I wish I had a way to trick that instead of having to accept the errors in my logs. We'll see - if my numbers are right based on the current statistics, in 2 more days we'll know if it replicated or not... :)

Thanks!

- abe

Jason Filler · Answer

Joe - question on your comment here: 'In order for replication to function, your arrays have to be normally routed to each other, so they can each see not only the group IP but all of the eth IP addresses directly, without any translations.' Does that mean that if I have the mgmt eth2 on both my SANs on a different subnet and VLAN that can't talk to Eth0 and Eth1, replication will not work?

I have a current setup that allows me to ping between both my EqualLogic boxes, from Eth0 and Eth1 to the other unit's Eth0, Eth1 and Group IP addresses. However, I cannot get replication to work between them. I keep getting 'No connection could be established. Verify that the partner IP address is correct'.

More details - 4100E SANs - Firmware 6.0.1

Unit 1
Eth0 - 10.30.4.81
Eth1 - 10.30.4.82
Eth2 - 10.30.2.90 - mgmt
Group IP 10.30.4.80

Unit 2
Eth0 - 10.31.4.81
Eth1 - 10.31.4.82
Eth2 - 10.31.2.90 - mgmt
Group IP 10.31.4.80

Everyone can ping each other; for example, ping '-I 10.31.4.81 10.30.4.81' works fine, including the group IPs. Mgmt ports can ping each other as well. However, the networks are VLAN'd and they cannot talk between each other, so 10.31.4.x can't talk to 10.31.2.x etc. Is this a problem for me then?

Thanks, Jason

Jason Filler · Answer

One last thing, the DRC side is only a 100Mb switch and Production is at 1Gb. Should the EqualLogics at the DRC be set to forced 100 full instead of auto-negotiate?

Joe S586 · Answer

A few things to verify:

1) Make sure you entered the Group IP and Group name correctly (the name is case sensitive).

2) Using just ping (without the -I option), can you ping just the group IP (both ways)?

3) Did you test the traceroute command too?

To traceroute out of each specific ETH port interface on the array, telnet/ssh to the array and use the following:

GrpName>support traceroute -s [ETH port source IP] [ETH Destination IP]

(note, the word “support” precedes the command)

Do this for all combinations both ways (from site 1 then from site 2).

4) Is port 3260 open both ways on the router/firewall? Also, ensure that the router/firewall is open for the IP addresses associated with all your iSCSI interfaces (eth 0, 1 and the Group IP).

5) Do you have a wan accelerator? If so, check the settings.
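
As a rough aid for checks 3 and 4, the sketch below builds the full list of traceroute lines to paste into the array CLI and offers a simple TCP probe of port 3260. The IP addresses are the ones posted in this thread; the helper names are made up for the example. The port probe is meant to be run from a host on the iSCSI network, so a result from the host is only an approximation of what the array itself can reach.

```python
import socket
from itertools import product

# Addresses taken from the thread; adjust to your own groups.
SITE_1 = {"eth0": "10.30.4.81", "eth1": "10.30.4.82", "group": "10.30.4.80"}
SITE_2 = {"eth0": "10.31.4.81", "eth1": "10.31.4.82", "group": "10.31.4.80"}


def eth_ips(site):
    """Return only the physical eth port addresses of a site."""
    return [ip for name, ip in site.items() if name != "group"]


def traceroute_commands(local, remote):
    """Build one 'support traceroute' line per local/remote eth port pair."""
    return [f"support traceroute -s {src} {dst}"
            for src, dst in product(eth_ips(local), eth_ips(remote))]


def port_open(host, port=3260, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (3260 = iSCSI)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for cmd in traceroute_commands(SITE_1, SITE_2):
        print("run on site 1:", cmd)
    for cmd in traceroute_commands(SITE_2, SITE_1):
        print("run on site 2:", cmd)
```

From a host at site 1, something like `port_open("10.31.4.80")` and the same call for each remote eth address would cover the group IP and the individual interfaces.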

-joe

Jason Filler · Answer

Thanks for your quick reply!

So support had told me (I have case open on this issue) that there is no way to do a traceroute - I am glad to find the command!

Here are the answers to your questions:

1) Make sure you entered the Group IP and Group name correctly (the name is case sensitive).

A - checked and re-checked about 10 times now (I have even deleted them and recreated them several times)

2) Using just ping (without the -I option), can you ping just the group IP (both ways)?

A - Yes

3) Did you test the traceroute command too?

To traceroute out of each specific ETH port interface on the array, telnet/ssh to the array and use the following:

GrpName>support traceroute -s [ETH port source IP] [ETH Destination IP]

(note, the word “support” precedes the command)

Do this for all combinations both ways (from site 1 then from site 2).

A- Yes, I have no issues here - it takes 10ms and it has 2 hops. I tried every combination possible as well.

4) Is port 3260 open both ways on the router/firewall? Also, ensure that the router/firewall is open for the IP addresses associated with all your iSCSI interfaces (eth 0, 1 and the Group IP).

A- Yes - well, there is no firewall between them, just a switch set up as a router between the 2 subnets. I will have an ExaGrid solution set up the same way and it can talk. I will log in just to make sure it's not configured wrong though.

5) Do you have a wan accelerator? If so, check the settings.

A- No

Thanks for your help, I feel like I am getting somewhere now. I did send in my DIAG's to support yesterday, but haven't heard back on what they found. The case# is SR 864974912- PS4100E

Jason Filler · Answer

Ok, performance aside. Even if I get a 1Gb switch in there (we are working on getting one), will that even matter for the issue I am having now? If it does, I guess I need to stop working on this issue till we do that first...

Joe S586 · Answer

For replication you will find that this setup is not going to provide any kind of performance and most likely will not be usable (performance wise).

The requirement for the array eth connections (the iSCSI connections that carry host iSCSI traffic and replication) is, at a minimum, a 1Gb switched network. For the management network, since the management port is only 100Mb, you can use the switch for that.
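
To put the link-speed point in numbers, here is a back-of-the-envelope sketch. The 500 GB volume size is only an example figure, and the calculation is a best case that ignores protocol overhead, latency and any other traffic on the link, so real transfers will take longer.

```python
def best_case_hours(volume_gb, link_mbps):
    """Best-case hours to move volume_gb (decimal GB) over a link_mbps link,
    ignoring protocol overhead, latency and competing traffic."""
    bits = volume_gb * 1e9 * 8
    return bits / (link_mbps * 1e6) / 3600


# Compare a 100 Mb/s link with a 1 Gb/s link for an example 500 GB volume.
for mbps in (100, 1000):
    print(f"{mbps} Mb/s: {best_case_hours(500, mbps):.1f} h")
```

For that example volume the best case works out to roughly 11 hours at 100 Mb/s against roughly 1 hour at 1 Gb/s.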

You also need to ensure that the switches you use meet the approved hardware requirements:

en.community.dell.com/.../2661.equallogic-compatibility-matrix.aspx

-joe

Joe S586 · Answer

For setting up and testing the connectivity, you should be able to use it. -joe
