January 28th, 2014 22:00
RPAs lose IP Connectivity after assigning temporary IPs
Hi,
I'm currently facing an issue while trying to deploy two RPA clusters. After I assigned a temporary IP address to each of the four appliances I have, two of them lose their connectivity again after a while (the time frame in which this happens isn't reproducible, but it does happen).
I connected directly to them through the WAN port with the boxmgmt account and assigned the customer-provided IP addresses to the LAN interfaces. Then I disconnected my laptop and patched both the WAN and LAN interfaces into the customer's network. I verified that they reply when I ping them.
Then, all of a sudden, I get a timeout and have to re-run the process. Sometimes it works again for a while afterwards, but once or twice I also had to change the switch port that the LAN port connects to in order to get it running. It seems to be related to the customer's Ethernet somehow, but that is just a suspicion I have no proof for, since the ports I used are configured identically and are all on the same subnet/VLAN.
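Because the drop-outs come and go without a reproducible time frame, it can help to log periodic ping results and turn them into outage windows that can then be lined up against switch logs or DHCP activity. A minimal Python sketch, assuming the samples have already been collected as (timestamp, replied) pairs, for example by a loop around the OS `ping` command; the function name `outage_windows` is hypothetical and not part of any RecoverPoint tool:

```python
from datetime import datetime, timedelta
from typing import List, Optional, Sequence, Tuple

# One ping sample: (time the ping was sent, True if a reply came back).
Sample = Tuple[datetime, bool]


def outage_windows(samples: Sequence[Sample]) -> List[Tuple[datetime, Optional[datetime]]]:
    """Group consecutive failed pings into (first_failure, first_recovery) windows.

    The second element is None if the host was still unreachable at the last sample.
    """
    windows: List[Tuple[datetime, Optional[datetime]]] = []
    down_since: Optional[datetime] = None
    for ts, replied in sorted(samples):  # process samples in time order
        if not replied and down_since is None:
            down_since = ts  # first missed reply opens a window
        elif replied and down_since is not None:
            windows.append((down_since, ts))  # first reply after an outage closes it
            down_since = None
    if down_since is not None:
        windows.append((down_since, None))  # still down when logging stopped
    return windows
```

Each returned pair marks the first failed ping and the first successful ping after it; a `None` end means the appliance had not come back by the end of the log.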
The worst part is that one of the RPAs vanished from the network while I was deploying a cluster with Deployment Manager (DM), at the stage where the RPA version is upgraded to the latest (the 4.0 SP2 ISO from a local drive). Now the RPA doesn't respond at all when I connect to the WAN port using the 10.77.77.77 address. It seems to be stuck in an undefined state.
I connected a display to the VGA port and a keyboard to USB to see what happens during boot (I rebooted it a couple of times, hoping it would come back). I only see one "Failed" message during boot, but couldn't determine what exactly failed (it went by too quickly). After booting, I get a login prompt where I'm unable to log in with the boxmgmt credentials.
I'll be opening a ticket anyway; maybe the RPA has to be replaced.
But I'm hoping someone has seen this before so I can get rid of the issue and move ahead myself.
Cheers,
Wolfi


jhaynes2004
January 29th, 2014 20:00
That is an interesting issue. Does the box reboot intermittently at all during this process?
Also, is the customer doing some sort of MAC filtering on the network that could be blocking these devices?
WHeintel
January 29th, 2014 23:00
Not that the customer would be aware of. I must add, though, that their network is basically managed by an external service provider, and they have limited knowledge of its internals. We found a way to access the Cisco management tools for the switches to check how the ports are configured and to spot any special conditions. At first glance there weren't any. For testing purposes, we disabled and re-enabled the LAN ports of the RPAs that don't lose connectivity. They reply before disabling, time out while disabled, and reply again immediately after re-enabling, just as they should. Disabling and re-enabling the ports of the affected RPAs doesn't bring them back; they still don't reply. All settings are identical, and again: same subnet, same VLAN, so no gateway or routing issues.
As for the RPA that went down while the 4.0 SP2 ISO was being applied through Deployment Manager, this might be a different (but related) issue. While the new RP version is being applied, the RPAs reboot; that's normal (I'm not sure how often, I think it's twice). So losing ping replies at that point is also normal. But not coming back isn't. Nor is coming back (which I can tell from the attached VGA display) without carrying ANY IP address anymore (not even 10.77.77.77 on the WAN interface).
I opened a ticket with EMC, and the support engineer confirmed the idea I had already had: burn the extracted contents of the 4.0 SP2 ISO back to a DVD. The image is bootable (you don't have to change the boot sequence in the BIOS for that) and should reinstall the box to factory defaults. It actually DID reinstall the box, and the first two reboots looked normal (various things happened, such as hardware being configured, and a couple of management CLIs (symcli etc.) were installed during these two reboots). But then the system went into an endless boot cycle. It boots up to the login prompt (localdomain.localhost), where I can only log in using the root credentials. If you don't do anything, the login prompt stays for about 10 seconds, then the box reboots. If you interrupt this by pressing a key, as if you wanted to log in, it stays there without rebooting.
Today I'll be put in touch with somebody who can provide the root credentials; maybe we can figure out how to fix things manually (like configuring 10.77.77.77 on the WAN interface). If that doesn't work, the box will be replaced.
Silly, huh?
jhaynes2004
February 5th, 2014 18:00
Something strangely similar happened to me this week. One of the RecoverPoint appliances (RPA-2, in an SE cluster) assigned an arbitrary IP address to its eth1 port after I set a temporary IP. We configured the temporary IPs, started the wizard, and then left about halfway through the process. We reconvened the next morning and I basically restarted the wizard with the configuration file. Anyway, it turned out that RPA-2's LAN port address had changed, and I could not reach it at 10.77.77.77 via the WAN port. I simply put the DVD back in, rebooted, and started over fresh.
It turned out that our static IP addresses were being used by VMs that had been assigned those IPs via DHCP. The customer had an IP conflict (verified by looking at the MAC address tables and which MACs were using an IP).
The odd thing was that 10.77.77.77 somehow became unusable after some time. Strange.
Double-check their network and see whether there is a potential IP conflict.
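To check for the kind of IP conflict described above, the IP-to-MAC pairs from the switches' ARP and MAC address tables (on Cisco IOS, for instance, the output of `show ip arp`) can be grouped by IP, flagging any address claimed by more than one MAC. A minimal Python sketch, assuming the pairs have already been copied or parsed out of that output; `find_ip_conflicts` and `normalize_mac` are hypothetical helper names, not part of any Cisco or RecoverPoint tool:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def normalize_mac(mac: str) -> str:
    """Convert 00:50:56:.., 00-50-56-.. or Cisco-style 0050.56ab.cdef to aa:bb:cc:dd:ee:ff."""
    digits = mac.strip().lower().replace(":", "").replace("-", "").replace(".", "")
    if len(digits) != 12 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def find_ip_conflicts(entries: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return {ip: sorted MACs} for every IP that was seen with more than one distinct MAC."""
    macs_by_ip: Dict[str, Set[str]] = defaultdict(set)
    for ip, mac in entries:
        # Normalizing first means the same NIC in different notations counts only once.
        macs_by_ip[ip.strip()].add(normalize_mac(mac))
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}
```

An IP that shows up with two or more MACs, for example a static RPA address that DHCP has also handed out to a VM, is a conflict candidate worth tracing back to its switch port.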