VNX File Replication hasn't synced since November
I have a file system replication that hasn't synced since November. While poking around, I noticed that HDA3 on the control station was 100% utilized. I corrected that, but the replication still has not synced. Most of my experience with VNX is with systems that are used as block only.
[root@qlcvnxcs01 tools]# nas_replicate -info qlctst_datastore_R
ID = 162_APM00XXXXXXXXX_2007_14418_APM00XXXXXXXXX_2007
Name = qlctst_datastore_R
Source Status = OK
Network Status = OK
Destination Status = OK
Last Sync Time = Tue Nov 24 02:17:29 EST 2015
Type = filesystem
Celerra Network Server = dubvnxcs01
Dart Interconnect = qlccs01-vnxnas04-repl
Peer Dart Interconnect = dubcs01-vnxnas04-repl
Replication Role = source
Source Filesystem = qlctst_datastore
Source Data Mover = server_2
Source Interface = 10.251.89.57
Source Control Port = 0
Source Current Data Port = 51093
Destination Filesystem = qlctst_datastore
Destination Data Mover = server_2
Destination Interface = 10.255.89.57
Destination Control Port = 5085
Destination Data Port = 8888
Max Out of Sync Time (minutes) = 10
Current Transfer Size (KB) = 55304
Current Transfer Remain (KB) = 55304
Estimated Completion Time =
Current Transfer is Full Copy = No
Current Transfer Rate (KB/s) = 0
Current Read Rate (KB/s) = 0
Current Write Rate (KB/s) = 0
Previous Transfer Rate (KB/s) = 297
Previous Read Rate (KB/s) = 19985
Previous Write Rate (KB/s) = 1171
Average Transfer Rate (KB/s) = 293
Average Read Rate (KB/s) = 15858
Average Write Rate (KB/s) = 1576
nas_cel -interconnect -v id=20003
qlccs01-vnxnas04-repl: has 1 source and 1 destination interface(s); validating - please wait...ok
1456325677: RCPT: 6: RcpTransportSender::openStream localip:10.251.89.57 localport:65279 handle :0x03330821c0
1456325677: RCPT: 6: RcpTransportSender::rcpclose() done on handle:0x03330821c0
1456325679: RCPT: 6: RcpTransportSender::openStream localip:10.251.89.57 localport:51864 handle :0x03330821c0
1456325679: REP: 6: DpCopierSender::setLatency TCP window is too small (285), setting it to twice msg size (524288)
1456325679: REP: 6: DpCopierSender::setLatency - RepSession=160_APM00XXXXXXXXXX_2007_14413_APM00XXXXXXXXXX_2007 set to 524288
1456325679: REP: 6: 31: Source=160_APM00121202958_2007_14413_APM00121202959_2007(alias=qlcprd_datastore_R), transferring. Data connection up.
chrismahon
February 24th, 2016 12:00
I would try to refresh the replication session from the source control station before deleting and recreating it. It shows the status as OK, so I'm curious whether a refresh would complete or fail.
nas_replicate -refresh qlctst_datastore_R -background
You can then verify the status of this task by looking at nas_task -l | head.
dynamox
February 24th, 2016 10:00
Being out of sync for that long, you might have to delete that replication relationship and start fresh (which will require a full filesystem resync). Have you tried to resume it?
joe0121
February 24th, 2016 10:00
I used nas_replicate -start and it is currently syncing at about 290 KB/s.
Is resuming a separate command from -start? I didn't see resume in the CLI user guide.
I am thinking the CS file system issue caused this replication to stop, and once that was resolved it picked up again?
ID = 162_APM00121202958_2007_14418_APM00121202959_2007
Name = qlctst_datastore_R
Source Status = OK
Network Status = OK
Destination Status = OK
Last Sync Time =
Type = filesystem
Celerra Network Server = dubvnxcs01
Dart Interconnect = qlccs01-vnxnas04-repl
Peer Dart Interconnect = dubcs01-vnxnas04-repl
Replication Role = source
Source Filesystem = qlctst_datastore
Source Data Mover = server_2
Source Interface = 10.251.89.57
Source Control Port = 0
Source Current Data Port = 63760
Destination Filesystem = qlctst_datastore
Destination Data Mover = server_2
Destination Interface = 10.255.89.57
Destination Control Port = 5085
Destination Data Port = 8888
Max Out of Sync Time (minutes) = 10
Current Transfer Size (KB) = 12468368
Current Transfer Remain (KB) = 12372368
Estimated Completion Time = Thu Feb 25 00:01:23 EST 2016
Current Transfer is Full Copy = No
Current Transfer Rate (KB/s) = 294
Current Read Rate (KB/s) = 6768
Current Write Rate (KB/s) = 984
Previous Transfer Rate (KB/s) = 0
Previous Read Rate (KB/s) = 0
Previous Write Rate (KB/s) = 0
Average Transfer Rate (KB/s) = 0
Average Read Rate (KB/s) = 0
Average Write Rate (KB/s) = 0
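The remaining transfer time implied by the output above can be sanity-checked by dividing "Current Transfer Remain" by "Current Transfer Rate". The following is a minimal sketch in Python, assuming the -info output has been captured as text; parse_info and remaining_time are hypothetical helper names, not part of any EMC tool.

```python
from datetime import timedelta

def parse_info(text):
    """Turn 'Key = Value' lines (as printed by nas_replicate -info) into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def remaining_time(fields):
    """Estimate time left from the remaining size (KB) and the current rate (KB/s)."""
    remain_kb = int(fields["Current Transfer Remain (KB)"])
    rate_kbs = int(fields["Current Transfer Rate (KB/s)"])
    if rate_kbs == 0:
        return None  # stalled transfer: no meaningful estimate
    return timedelta(seconds=remain_kb / rate_kbs)

# Values copied from the output above.
sample = """\
Current Transfer Size (KB) = 12468368
Current Transfer Remain (KB) = 12372368
Current Transfer Rate (KB/s) = 294
"""
eta = remaining_time(parse_info(sample))
print(eta)
```

With these numbers the transfer needs about 11.7 hours at 294 KB/s. A rate of 0, as in the first output in this thread, returns None rather than dividing by zero.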
Also, I'm not sure how to make sense of these log entries:
1456328528: SVFS: 6: Fs 6642 is not active
1456328531: SVFS: 6: last message repeated 9 times
1456328531: RCPT: 6: RcpTransportSender::openStream localip:10.251.89.57 localport:53636 handle :0x007b437bc0
1456328531: RCPT: 6: RcpTransportSender::rcpclose() done on handle:0x007b437bc0
1456328533: RCPT: 6: RcpTransportSender::openStream localip:10.251.89.57 localport:59409 handle :0x007b437bc0
1456328533: REP: 6: DpCopierSender::setLatency TCP window is too small (284), setting it to twice msg size (524288)
1456328533: REP: 6: DpCopierSender::setLatency - RepSession=160_APM00121202958_2007_14413_APM00121202959_2007 set to 524288
1456328533: REP: 6: 31: Source=160_APM00121202958_2007_14413_APM00121202959_2007(alias=qlcprd_datastore_R), transferring. Data connection up.
1456328537: RCPT: 6: RcpTransportSender::rcpclose() done on handle:0x007b437bc0
1456328538: REP: 6: 46: Scheduler=160_APM00121202958_2007_14413_APM00121202959_2007(a
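One thing that makes log entries like these easier to read is converting the leading number to a date. This is a small Python sketch, assuming that the number is a Unix epoch timestamp in seconds (it decodes to 24 February 2016, which matches the thread) and that the entries follow an "epoch: facility: level: message" layout. decode_log_line is a hypothetical helper, and the thread does not state what the numeric third field means.

```python
from datetime import datetime, timezone

def decode_log_line(line):
    """Split 'epoch: FACILITY: level: message' and convert the epoch to a UTC datetime."""
    # Split on the first three colons only, so colons inside the message are kept.
    epoch, facility, level, message = [part.strip() for part in line.split(":", 3)]
    when = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return when, facility, int(level), message

# Entry copied from the log excerpt above.
when, facility, level, message = decode_log_line("1456328528: SVFS: 6: Fs 6642 is not active")
print(when.isoformat(), facility, level, message)
```

The result is in UTC; subtract five hours to compare it with the EST times shown in the nas_replicate output.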
joe0121
February 24th, 2016 13:00
I will wait for the sync to complete and then try the refresh command as suggested. I will likely do the same on any replication with a sync time older than 24 hours.
All sessions showed OK from the beginning, which is what had me so puzzled. This is a customer's EMC system that is under break-fix support, so I don't know it very well, but judging by the replication name it is a QC datastore, so it's possible it just didn't need to replicate until now?
It also makes me wonder why I am being asked about it in February when the replication stopped in November.
My understanding is that this is meant to replicate data in real time, with one VNX acting as primary and another as DR. It replicates over the network through the Data Movers, over whatever protocol you tell it to use, e.g. CIFS, iSCSI, whatever.
The vast majority of my customers primarily use their Celerras and VNXs as block, with perhaps the odd replication job.
I just looked up EMC's VNX training and it's $5,000.00. I'll just have to stick to trial and error and Google-fu.
Thanks for the help. I'll update the thread when I think we finally have it solved.
Rainer_EMC
February 25th, 2016 02:00
Well, a good start would be the documentation, especially the Replicator PDF manual.
Then try reading the Data Mover logs with server_log.
You could also install two Simulators for self-training.
I would also suggest looking into implementing monitoring. There should be events for replication, or maybe look at Unisphere Central.
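Until proper event-based monitoring is in place, a stopgap could be a script that flags any session whose "Last Sync Time" is older than some limit. This is only a rough Python sketch, assuming the nas_replicate -info output has been captured as text and that the timestamp uses the format shown earlier in this thread; last_sync_age and is_stale are hypothetical names, and the 24-hour default is just the threshold mentioned above.

```python
from datetime import datetime, timedelta

def last_sync_age(info_text, now):
    """Return how long ago the session last synced, or None if the field is missing or blank."""
    for line in info_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "Last Sync Time":
            parts = value.split()
            if len(parts) != 6:
                return None  # the field is blank in the second output in this thread
            # Drop the timezone abbreviation (e.g. EST); strptime's %Z does not parse it reliably.
            stamp = " ".join(parts[:4] + parts[5:])
            return now - datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
    return None

def is_stale(info_text, now, limit=timedelta(hours=24)):
    """True when a last sync time is present and older than the limit."""
    age = last_sync_age(info_text, now)
    return age is not None and age > limit

# Value copied from the first output in this thread; 'now' is an illustrative local time.
sample = "Last Sync Time = Tue Nov 24 02:17:29 EST 2015"
print(is_stale(sample, datetime(2016, 2, 24, 10, 0, 0)))
```

The comparison uses naive local times, so it assumes "now" is taken in the same timezone as the control station, and the day and month names assume an English locale. A blank "Last Sync Time" is treated here as not stale; whether that is the right call for a session that has never completed a sync is a judgment left to whoever runs the check.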
joe0121
February 25th, 2016 07:00
Thanks!
I didn't realize they had a simulator. That's very helpful and I'll give it a try for sure.