Joe S586
June 15th, 2011 14:00
awliste,
Replication is not supported through NAT; it may work sometimes and fail at other times. During an iSCSI login between the two partners, the initial connection goes to the group address, which returns a "target has temporarily moved" login response and then redirects to a physical IP address of one of the eth interfaces on the receiving group.
The problem with NAT is that you can re-map the group address, but the login redirection is embedded in the iSCSI protocol. Unless your NAT device knows how to snoop inside the iSCSI protocol and rewrite those addresses, the initiating group will be redirected to an IP address it cannot reach, and replication will never complete.
For replication to function, your arrays have to be normally routed to each other, so that each can reach not only the group IP but all of the eth IP addresses directly, without any translation.
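To make the redirect problem concrete, here is a toy Python model of it. This is only a sketch: the public address 203.0.113.10, the NAT view, and the `login` helper are hypothetical and do not represent anything the array actually runs. The private addresses are borrowed from the example setup posted later in this thread.

```python
# Toy model of an iSCSI login redirect, with and without NAT in the path.
# All names and the public address are hypothetical, for illustration only.

PUBLIC_GROUP_IP = "203.0.113.10"          # NAT'd address the remote partner dials
PRIVATE_GROUP_IP = "10.30.4.80"           # real group IP behind the NAT
PRIVATE_ETH_IPS = ["10.30.4.81", "10.30.4.82"]


def login(group_ip, redirect_ip, reachable):
    """Return the IP the session lands on, or None if login cannot complete.

    reachable: set of addresses the initiating group can actually reach.
    """
    if group_ip not in reachable:
        return None  # cannot even reach the group address
    # The group answers "target moved temporarily" and names one of its
    # eth IPs inside the iSCSI payload, which plain NAT does not rewrite.
    if redirect_ip not in reachable:
        return None  # redirected to an address the initiator cannot see
    return redirect_ip


# Through NAT: only the re-mapped group address is visible.
nat_view = {PUBLIC_GROUP_IP}
# Normally routed: group IP and every eth IP are visible directly.
routed_view = {PRIVATE_GROUP_IP, *PRIVATE_ETH_IPS}

if __name__ == "__main__":
    print("NAT:   ", login(PUBLIC_GROUP_IP, PRIVATE_ETH_IPS[0], nat_view))
    print("Routed:", login(PRIVATE_GROUP_IP, PRIVATE_ETH_IPS[0], routed_view))
```

In the NAT case the first connection succeeds but the redirect target is not reachable, which is why the login never completes.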
Regards,
Joe
Dev Mgr
June 15th, 2011 09:00
On the B-side, verify that you see the replica of the 500GB volume and test it by cloning it to a volume. If that works, I'd say it was indeed successful.
When partnering 2 units for replication, be sure to also match the case of the partner name. A case mismatch will prevent replication from working, but it only shows up (with a non-descriptive error message) after you actually try to replicate a volume.
awliste
June 15th, 2011 10:00
The replica converted to a volume with no problems. I wish I had data in the 500GB to REALLY verify it, but everything went smoothly. I *think* it's working - it's just interesting in the WAN application and the NATting. I wish I had a way to trick that instead of having to accept the errors in my logs. We'll see - if my numbers are right based on the current statistics, in 2 more days we'll know if it replicated or not... :)
Thanks!
- abe
Jason Filler
October 3rd, 2012 12:00
Joe - question on your comment here:
"In order for replication to function, your arrays have to be normally routed to each other, so they can each see not only the group IP but all of the eth IP addresses directly, without any translations."
Does that mean that if the mgmt eth2 on both my SANs is on a different subnet and VLAN that can't talk to Eth0 and Eth1, replication will not work? My current setup allows me to ping from Eth0 and Eth1 on each of my EqualLogic boxes to the other unit's Eth0, Eth1, and Group IP addresses. However, I cannot get replication to work between them. I keep getting "No connection could be established. Verify that the partner IP address is correct".
More details - 4100E SANs - Firmware 6.0.1
Unit 1
Eth0 - 10.30.4.81
Eth1 - 10.30.4.82
Eth2 - 10.30.2.90 - mgmt
Group IP 10.30.4.80
Unit 2
Eth0 - 10.31.4.81
Eth1 - 10.31.4.82
Eth2 - 10.31.2.90 - mgmt
Group IP 10.31.4.80
Everyone can ping each other; for example, ping "-I 10.31.4.81 10.30.4.81" works fine, and so do the group IPs. Mgmt ports can ping each other as well. However, the networks are VLAN'd and cannot talk to each other, so 10.31.4.x can't talk to 10.31.2.x, etc. Is this a problem for me then?
Thanks,
Jason
Jason Filler
October 3rd, 2012 13:00
One last thing: the DRC side is only a 100Mb switch and Production is at 1Gb. Should the EqualLogics at the DRC be set to forced 100 full instead of auto-negotiate?
Joe S586
October 3rd, 2012 13:00
A few things to verify:
1) Make sure you entered the Group IP and Group name correctly (the name is case sensitive).
2) Using just ping (without the -I option), can you ping the group IP (both ways)?
3) Did you test the traceroute command too?
To traceroute out of each specific ETH port interface on the array, telnet/ssh to the array and use the following:
GrpName>support traceroute -s [ETH port source IP] [ETH Destination IP]
(note, the word “support” precedes the command)
Do this for all combinations both ways (from site 1 then from site 2).
4) Is port 3260 open both ways on the router/firewall? Also, ensure that the router/firewall is open for the IP addresses associated with all your iSCSI interfaces (eth 0, 1 and the Group IP).
5) Do you have a WAN accelerator? If so, check the settings.
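For step 3, "all combinations both ways" is easy to miscount, so here is a small Python sketch that prints the command list to paste into each array's CLI. It assumes the addresses posted earlier in this thread, and it also assumes the remote Group IP is worth including as a destination; the `SITE_1`, `SITE_2`, and `traceroute_commands` names are made up for illustration.

```python
from itertools import product

# Addresses taken from the setup posted above; adjust for your own groups.
SITE_1 = {"eth": ["10.30.4.81", "10.30.4.82"], "group": "10.30.4.80"}
SITE_2 = {"eth": ["10.31.4.81", "10.31.4.82"], "group": "10.31.4.80"}


def traceroute_commands(local, remote):
    """Build one 'support traceroute' command per source/destination pair.

    Sources are the local eth port IPs. Destinations are the remote eth
    port IPs plus the remote group IP (the group IP is an assumption here).
    """
    destinations = remote["eth"] + [remote["group"]]
    return [
        f"support traceroute -s {src} {dst}"
        for src, dst in product(local["eth"], destinations)
    ]


if __name__ == "__main__":
    print("# Run on the site 1 array:")
    print("\n".join(traceroute_commands(SITE_1, SITE_2)))
    print("# Run on the site 2 array:")
    print("\n".join(traceroute_commands(SITE_2, SITE_1)))
```

With two eth ports per side this gives six commands per site, twelve in total.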
-joe
Jason Filler
October 3rd, 2012 13:00
thanks for your quick reply!
Support had told me (I have a case open on this issue) that there is no way to do a traceroute - I am glad to find the command!
Here are the answers to your questions:
1) Make sure you entered the Group IP and Group name correctly (the name is case sensitive).
A - checked and re-checked about 10 times now (I have even deleted them and recreated them several times)
2) Using just ping (without the -I option), can you ping the group IP (both ways)?
A - Yes
3) Did you test the traceroute command too?
To traceroute out of each specific ETH port interface on the array, telnet/ssh to the array and use the following:
GrpName>support traceroute -s [ETH port source IP] [ETH Destination IP]
(note, the word “support” precedes the command)
Do this for all combinations both ways (from site 1 then from site 2).
A- Yes, I have no issues here - it takes 10ms and has 2 hops. I tried every combination possible as well.
4) Is port 3260 open both ways on the router/firewall? Also, ensure that the router/firewall is open for the IP addresses associated with all your iSCSI interfaces (eth 0, 1 and the Group IP).
A- Yes - well, there is no firewall between them, just a switch set up as a router between the 2 subnets. I also have an ExaGrid solution set up the same way and it can talk. I will log in just to make sure it's not configured wrong though.
5) Do you have a WAN accelerator? If so, check the settings.
A- No
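For the port 3260 question, a quick check from a host on the iSCSI subnet could look like the Python sketch below. It assumes a machine with Python that sits on the same subnet as the array ports; the `tcp_port_open` helper is made up for illustration. Keep in mind that it only tests the path from that host, not from the array's own eth interfaces.

```python
import socket


def tcp_port_open(host, port=3260, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


if __name__ == "__main__":
    # Remote group IP and eth port IPs from the setup described above.
    for ip in ["10.31.4.80", "10.31.4.81", "10.31.4.82"]:
        state = "open" if tcp_port_open(ip) else "not reachable"
        print(f"{ip}:3260 {state}")
```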
Thanks for your help, I feel like I am getting somewhere now. I did send my DIAGs to support yesterday, but haven't heard back on what they found. The case# is SR 864974912 - PS4100E
Jason Filler
October 3rd, 2012 14:00
Ok, performance aside. Even if I get a 1Gb switch in there (we are working on getting one), will that even matter for the issue I am having now? If it does, I guess I need to stop working on this issue till we do that first...
Joe S586
October 3rd, 2012 14:00
For replication, you will find that this setup will not provide adequate performance and most likely will not be usable (performance-wise).
The requirement for the array eth connections (the connections that carry host iSCSI traffic and replication) is, at a minimum, a 1Gb switched network. For the management network, since the management port is only 100Mb, you can use that switch.
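As a rough illustration of the performance gap, the Python sketch below estimates the best-case time to move a volume at line rate. It is only back-of-the-envelope arithmetic: the 500 GB size mirrors the volume mentioned earlier in this thread, `transfer_hours` and its `efficiency` parameter are made up for illustration, and real replication will be slower because of protocol overhead, latency, and competing traffic.

```python
def transfer_hours(volume_gb, link_mbps, efficiency=1.0):
    """Best-case hours to move volume_gb (decimal GB) over a link_mbps link.

    efficiency is a hypothetical knob (0 < efficiency <= 1) for the usable
    fraction of line rate; 1.0 means the ideal case with no overhead.
    """
    bits = volume_gb * 1e9 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    for mbps in (100, 1000):
        print(f"500 GB at {mbps} Mb/s: {transfer_hours(500, mbps):.1f} hours")
```

At ideal line rate that works out to roughly 11 hours on a 100Mb link versus a little over 1 hour on 1Gb, before any overhead is counted.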
You also need to ensure that the switches you use meet the approved hardware requirements:
en.community.dell.com/.../2661.equallogic-compatibility-matrix.aspx
-joe
Joe S586
October 4th, 2012 05:00
For setting up and testing the connectivity, you should be able to use it.
-joe