
VNX File Replication hasn't synced since November

February 24th, 2016 07:00

I have a file system replication that hasn't synced since November. While poking around, I noticed HDA3 on the control station was 100% utilized. I corrected that; however, the replication still has not synced. Most of my experience with VNX is with systems used as block only.
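For context, a minimal sketch of how I spotted the full partition (device naming can vary by control station model, so treat this as an assumption):

# check control station partition utilization; in my case /dev/hda3 was at 100%
df -h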

[root@qlcvnxcs01 tools]# nas_replicate -info qlctst_datastore_R
ID                             = 162_APM00XXXXXXXXX_2007_14418_APM00XXXXXXXXX_2007
Name                           = qlctst_datastore_R
Source Status                  = OK
Network Status                 = OK
Destination Status             = OK
Last Sync Time                 = Tue Nov 24 02:17:29 EST 2015
Type                           = filesystem
Celerra Network Server         = dubvnxcs01
Dart Interconnect              = qlccs01-vnxnas04-repl
Peer Dart Interconnect         = dubcs01-vnxnas04-repl
Replication Role               = source
Source Filesystem              = qlctst_datastore
Source Data Mover              = server_2
Source Interface               = 10.251.89.57
Source Control Port            = 0
Source Current Data Port       = 51093
Destination Filesystem         = qlctst_datastore
Destination Data Mover         = server_2
Destination Interface          = 10.255.89.57
Destination Control Port       = 5085
Destination Data Port          = 8888
Max Out of Sync Time (minutes) = 10
Current Transfer Size (KB)     = 55304
Current Transfer Remain (KB)   = 55304
Estimated Completion Time      =
Current Transfer is Full Copy  = No
Current Transfer Rate (KB/s)   = 0
Current Read Rate (KB/s)       = 0
Current Write Rate (KB/s)      = 0
Previous Transfer Rate (KB/s)  = 297
Previous Read Rate (KB/s)      = 19985
Previous Write Rate (KB/s)     = 1171
Average Transfer Rate (KB/s)   = 293
Average Read Rate (KB/s)       = 15858
Average Write Rate (KB/s)      = 1576



nas_cel -interconnect -v id=20003
qlccs01-vnxnas04-repl: has 1 source and 1 destination interface(s); validating - please wait...ok

1456325677: RCPT: 6: RcpTransportSender::openStream localip:10.251.89.57 localport:65279 handle :0x03330821c0
1456325677: RCPT: 6: RcpTransportSender::rcpclose() done on handle:0x03330821c0
1456325679: RCPT: 6: RcpTransportSender::openStream localip:10.251.89.57 localport:51864 handle :0x03330821c0
1456325679: REP: 6: DpCopierSender::setLatency TCP window is too small (285), setting it to twice msg size (524288)
1456325679: REP: 6: DpCopierSender::setLatency - RepSession=160_APM00XXXXXXXXXX_2007_14413_APM00XXXXXXXXXX_2007 set to 524288
1456325679: REP: 6: 31: Source=160_APM00121202958_2007_14413_APM00121202959_2007(alias=qlcprd_datastore_R), transferring. Data connection up.
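Side note for anyone following along: the interconnect id used above can be found by listing the configured interconnects, roughly like this (a minimal sketch):

# list interconnects on this Celerra/VNX to confirm IDs and peers
nas_cel -interconnect -list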


February 24th, 2016 12:00

I would try to refresh the replication session from the source control station before deleting and recreating it. It shows the status as OK, so I'm curious whether a refresh would complete or fail.

nas_replicate -refresh qlctst_datastore_R -background

You can then verify the status of this task by running nas_task -l | head.
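A hedged sketch of checking on the background task (the task ID is a placeholder, not from this system):

# list recent tasks, then inspect the refresh task by its reported ID
nas_task -l | head
nas_task -info <taskId>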


February 24th, 2016 10:00

Being out of sync for that long, you might have to delete that replication relationship and start fresh (it will require a full filesystem resync). Have you tried to resume it?
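If it does come to that, a hedged sketch of the sequence using the names from this thread; verify the options against your DART release before running anything:

# tear down both sides of the session, then recreate it (a full resync will follow)
nas_replicate -delete qlctst_datastore_R -mode both
nas_replicate -create qlctst_datastore_R -source -fs qlctst_datastore \
  -destination -fs qlctst_datastore \
  -interconnect qlccs01-vnxnas04-repl \
  -max_time_out_of_sync 10 -overwrite_destination -background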


February 24th, 2016 10:00

I used nas_replicate -start and it is currently syncing at roughly 290 KB/s.

Is resuming a separate command from -start? I didn't see a resume option in the CLI user guide.

I am thinking the CS file system issue caused this replication to stop, and once that was resolved it picked up again?

ID                             = 162_APM00121202958_2007_14418_APM00121202959_2007
Name                           = qlctst_datastore_R
Source Status                  = OK
Network Status                 = OK
Destination Status             = OK
Last Sync Time                 =
Type                           = filesystem
Celerra Network Server         = dubvnxcs01
Dart Interconnect              = qlccs01-vnxnas04-repl
Peer Dart Interconnect         = dubcs01-vnxnas04-repl
Replication Role               = source
Source Filesystem              = qlctst_datastore
Source Data Mover              = server_2
Source Interface               = 10.251.89.57
Source Control Port            = 0
Source Current Data Port       = 63760
Destination Filesystem         = qlctst_datastore
Destination Data Mover         = server_2
Destination Interface          = 10.255.89.57
Destination Control Port       = 5085
Destination Data Port          = 8888
Max Out of Sync Time (minutes) = 10
Current Transfer Size (KB)     = 12468368
Current Transfer Remain (KB)   = 12372368
Estimated Completion Time      = Thu Feb 25 00:01:23 EST 2016
Current Transfer is Full Copy  = No
Current Transfer Rate (KB/s)   = 294
Current Read Rate (KB/s)       = 6768
Current Write Rate (KB/s)      = 984
Previous Transfer Rate (KB/s)  = 0
Previous Read Rate (KB/s)      = 0
Previous Write Rate (KB/s)     = 0
Average Transfer Rate (KB/s)   = 0
Average Read Rate (KB/s)       = 0
Average Write Rate (KB/s)      = 0


I'm also not sure how to make sense of these log entries:


1456328528: SVFS: 6: Fs 6642 is not active
1456328531: SVFS: 6: last message repeated 9 times
1456328531: RCPT: 6: RcpTransportSender::openStream localip:10.251.89.57 localport:53636 handle :0x007b437bc0
1456328531: RCPT: 6: RcpTransportSender::rcpclose() done on handle:0x007b437bc0
1456328533: RCPT: 6: RcpTransportSender::openStream localip:10.251.89.57 localport:59409 handle :0x007b437bc0
1456328533: REP: 6: DpCopierSender::setLatency TCP window is too small (284), setting it to twice msg size (524288)
1456328533: REP: 6: DpCopierSender::setLatency - RepSession=160_APM00121202958_2007_14413_APM00121202959_2007 set to 524288
1456328533: REP: 6: 31: Source=160_APM00121202958_2007_14413_APM00121202959_2007(alias=qlcprd_datastore_R), transferring. Data connection up.
1456328537: RCPT: 6: RcpTransportSender::rcpclose() done on handle:0x007b437bc0
1456328538: REP: 6: 46: Scheduler=160_APM00121202958_2007_14413_APM00121202959_2007(a
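In the meantime, a hedged one-liner to watch the transfer drain from the control station (standard Linux watch/grep, nothing VNX-specific; the interval is arbitrary):

watch -n 60 'nas_replicate -info qlctst_datastore_R | grep -E "Remain|Transfer Rate|Last Sync"'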


February 24th, 2016 13:00

I will wait for the sync to complete and try the refresh command as suggested, and will likely do the same on any replications with a last sync time older than 24 hours.
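A hedged sketch for checking them all in one pass; it assumes Name is the first column of nas_replicate -list, so verify that before trusting the output:

# print the name and last sync time of every replication session
for s in $(nas_replicate -list | awk 'NR>1 {print $1}'); do
  nas_replicate -info "$s" | grep -E "^Name|^Last Sync Time"
done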

All sessions showed OK from the beginning, which is what had me so puzzled. This is a customer's EMC array under break/fix support, so I don't know the system very well, but judging by the replication name it is a QC datastore, so it's possible it just didn't need to replicate until now?

It also makes me wonder why I am getting asked about it in February when the replication stopped in November.

My understanding is that this is meant to replicate data in real time, with one VNX acting as primary and the other as DR. It does its replication over the network through the Data Movers, using whatever protocol you tell it to, e.g. CIFS or iSCSI.

The vast majority of my customers are primarily using their Celerras and VNXs as block, with perhaps the odd replication job.

I just looked up EMC's VNX training and it's $5,000.00. I'll just have to stick to trial and error and Google-fu.

Thanks for the help. I'll update the thread when I think we finally have it solved.


February 25th, 2016 02:00

Well, a good start would be the documentation, especially the Replicator PDF manual.

Then try reading the Data Mover logs with server_log.
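For example, a minimal sketch that pulls recent replication-related lines for server_2 (the grep pattern is just a starting point):

server_log server_2 | tail -200 | grep -Ei "REP:|RCPT:|SVFS:"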

You could also install two Simulators for self-training.

I would also suggest looking into implementing monitoring; there should be events for replication. Or maybe look at UniSphere Central.


February 25th, 2016 07:00

Thanks!

I didn't realize they had a simulator. That's very helpful and I'll give it a try for sure.
