MarcelMertens
9 Posts
0
December 11th, 2012 07:00
A case is already open, but at the moment it doesn't look like support has a clue.
DCB is disabled on the switch and on the EQL.
We don't know whether the packet errors on the management interface are causing the "partner down" error.
I changed the switch, the duplex settings, and the cable: still packet errors.
I also updated the replication site to 6.0.2. Still packet errors.
MarcelMertens
9 Posts
0
December 11th, 2012 08:00
DCB is disabled on EQL.
AFAIK there is no "DCB off" switch on the PC8024. Firmware is 5.0.0.4.
PFC (priority flow control) is inactive
MarcelMertens
9 Posts
0
December 11th, 2012 08:00
No, iSCSI is running on VLAN 10. No WAN accelerators. The replication site (~400 meters away) is connected via a 20 Gbit LAG (two PC8024 stacks, one stack of 2 switches at each site).
There are some strange things:
Some replications work fine while others run into "partner down". I then have to delete the complete replica set and start over. Sometimes it works for a few replications until it runs into "partner down" again.
The replication destination always shows that the replication with "partner down" was successful.
As you can see on the source site, the replication from 14:31 is still "in progress" with replication status "partner down".
The destination site shows that this replication is completed:
MarcelMertens
9 Posts
0
December 11th, 2012 09:00
iSCSI VLAN 10 is untagged on all storage ports.
See pictures:
LLDP is active for all storage ports.
innyinskip
1 Message
0
February 3rd, 2014 00:00
Hi Guys,
Did you ever find a resolution to this? We are seeing something similar.
Replication works fine on particular volumes, but on others it stops with the message 'partner-down'. We have logged a call with Dell, who at the moment are going through networking troubleshooting tasks, but if some volumes are working it can't be a networking issue.
Interestingly, if I cancel the replica creation and start it again, it stops at EXACTLY the same point each time.
MarcelMertens
9 Posts
0
February 3rd, 2014 01:00
Yep,
very simple (in our case):
On the destination site there must be a small amount of space (at least 20-30 GB) left for the incoming replication. I had configured all of the array's space as delegated space. After reducing the delegated space by 20-30 GB, so that a small amount of "free space" remained, everything was fine.
You can't find this in the manual; it is written in the release notes of the V6 firmware. It took a while for EQL support to figure this out.
I hope this will help you...
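The free-space rule described in this post can be checked with simple arithmetic. The following is only a minimal sketch: the function names, the example pool figures, and the 30 GB default are illustrative assumptions taken from the "20-30 GB" figure above, not an EqualLogic API or an official sizing formula.

```python
def free_pool_space_gb(pool_capacity_gb, volume_reserve_gb, delegated_gb):
    """Space left in the pool once volumes and delegated space are accounted for."""
    return pool_capacity_gb - volume_reserve_gb - delegated_gb


def max_delegated_gb(pool_capacity_gb, volume_reserve_gb, headroom_gb=30):
    """Largest delegated space that still leaves `headroom_gb` free (assumed 30 GB)."""
    return max(0, pool_capacity_gb - volume_reserve_gb - headroom_gb)


# Hypothetical destination pool: 10,000 GB capacity, 2,000 GB used by local volumes.
pool, reserve = 10_000, 2_000

# Delegating everything that is left reproduces the problem case: no free space.
print(free_pool_space_gb(pool, reserve, 8_000))  # 0

# Leaving 30 GB of headroom caps the delegated space accordingly.
print(max_delegated_gb(pool, reserve))  # 7970
```

The real values come from the group's pool and delegated space settings; the sketch only shows the relationship between them.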
vmbru
58 Posts
0
February 3rd, 2014 18:00
They say 5% or 100 GB to 200 GB of free pool space is needed as a best practice, right? You must also have plenty of space to test failover of all volumes. If HQ is 14.4 TB raw, figure a bare minimum of >= 28.8 TB at the DR site to fire up cloned volumes while leaving the original replicas alone when using VMware SRM. I'd give the DR site 3x the raw capacity of the HQ site, to factor in dedicated SRM volumes, etc.
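The sizing figures in this post work out as follows. This is a rough sketch of the arithmetic only; the function name is hypothetical, and the 2x and 3x factors are the poster's rules of thumb, not vendor requirements.

```python
def dr_raw_capacity_tb(hq_raw_tb, factor):
    """Raw capacity suggested for the DR site, as a multiple of the HQ raw capacity."""
    return round(hq_raw_tb * factor, 1)


hq_raw_tb = 14.4  # HQ raw capacity from the post

print(dr_raw_capacity_tb(hq_raw_tb, 2))  # 28.8 (bare minimum: replicas plus clones)
print(dr_raw_capacity_tb(hq_raw_tb, 3))  # 43.2 (with room for dedicated SRM volumes)
```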
MarcelMertens
9 Posts
0
February 3rd, 2014 22:00
You need more raw space on the recovery site than on the primary. How much more depends on your restore points and replication schedule, but double the size of the primary site is a good starting point.
None of our customers is currently using replication over a WAN link. The most common use case is that customers have two datacenters on the same campus and replicate between them.
The application servers (mostly VMware hosts) are spread across both datacenters, so in case of failover you just have to fire up the replication. There is no use case for SRM.
wadet5k
24 Posts
0
February 6th, 2014 22:00
Resources have always been listed in the firmware details for each version, although the fix list is not complete and not all symptoms may be listed. Version 6.0.4 and later address replication issues, which you should review.
A partner down message can also mean just that: you don't have great communication between the groups, or there is too much going on.
You should review your logs for replication start times and replication complete times. You may need to spread out your replication schedules.
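One way to act on the log review suggested above is to look for replications whose run times overlap. The sketch below assumes the start and completion times have already been extracted from the logs by hand or by a separate script; the tuple layout, volume names, and timestamps are made up for illustration and do not reflect the actual EqualLogic log format.

```python
from datetime import datetime


def overlapping_pairs(windows):
    """Return pairs of volume names whose replication windows overlap.

    `windows` is a list of (name, start, end) tuples with datetime values.
    """
    ordered = sorted(windows, key=lambda w: w[1])  # sort by start time
    pairs = []
    for i, (name_a, _start_a, end_a) in enumerate(ordered):
        for name_b, start_b, _end_b in ordered[i + 1:]:
            if start_b >= end_a:
                break  # sorted by start, so no later window can overlap this one
            pairs.append((name_a, name_b))
    return pairs


def t(hhmm):
    """Helper: build a datetime on an arbitrary example day from 'HH:MM'."""
    return datetime.strptime("2012-12-11 " + hhmm, "%Y-%m-%d %H:%M")


# Hypothetical replication windows taken from a manual log review.
windows = [
    ("vol1", t("14:00"), t("14:40")),
    ("vol2", t("14:31"), t("15:00")),
    ("vol3", t("15:00"), t("15:10")),
]

print(overlapping_pairs(windows))  # [('vol1', 'vol2')]
```

Volumes that show up in many pairs are candidates for moving to a different slot in the replication schedule.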