Data Domain - After HA failover, veth alias interface fails to come up on alternate node
Summary: A DD HA node rebooted after a panic or other issue. Failover occurred, but the interconnect and the veth alias interface did not come up on the alternate node.
Symptoms
1. Vethx:x on node1 was not assigned an IP address after failover. Ha.log shows the message "can't set the netmask 255.255.x.x for vethx:x, error (99) Cannot assign requested address", yet the sequence ends with "Interface, veth, configuration complete, err = 0". The two messages from DD plib appear to conflict.
2. Vethx:x on node1 was up but not running after HA failover. Kern.log shows that the bonding interfaces ethXx and ethXx were both up.
Cause
The issue occurs when assigning a netmask to an interface fails because the interface does not yet have an associated IP address.
If no address is associated with the interface, assigning one during startup takes a finite time, and this delay is longer if no previous address existed.
If the netmask is set during this "finite time", the kernel cannot match the netmask to an address and returns an error. On receiving the error, the SMS code brings the interface down, which removes the address.
A fix was added for this issue:
If error 99 is returned, the code waits one second and then tries to read the address. If it reads the address, it sets the netmask again.
If it cannot read the address, or it gets another error while setting the netmask, it returns that error.
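The retry behavior described above can be sketched as follows. This is only an illustration of the stated logic, not the DD plib or SMS source: the function name `set_netmask_with_retry` and the `set_netmask` and `read_address` callables are hypothetical stand-ins, and the value 99 is taken from the logged error (on Linux, errno 99 is "Cannot assign requested address").

```python
import time
from typing import Callable, Optional

# Error code from the log message; on Linux this is EADDRNOTAVAIL.
NETMASK_ADDR_ERROR = 99


def set_netmask_with_retry(
    set_netmask: Callable[[], int],              # hypothetical: returns 0 or an error code
    read_address: Callable[[], Optional[str]],   # hypothetical: returns the IP or None
    wait_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Return 0 on success, otherwise the error code to report."""
    err = set_netmask()
    if err != NETMASK_ADDR_ERROR:
        return err  # success (0) or an unrelated error: no retry

    # Error 99: give the address time to appear, then look for it.
    sleep(wait_seconds)
    if read_address() is None:
        return err  # address still missing: report the original error

    # Address is readable: set the netmask once more and report that result.
    return set_netmask()
```

Note that in this sketch a return value of 0 only says that the netmask call succeeded; it does not by itself confirm that the interface still holds its IP address.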
On the affected system, no new error is logged after the netmask error, and the log indicates that the configuration completed:
05/03 06:04:24 NOTICE: dd_plib_net_setup(): Interface, vethx:x, configuration complete, err = 0.
Normally, this would mean that the program set the netmask successfully after reading the IP address. However, the IP address was still not configured on vethx:x.
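To spot this pattern in a collected log, a small scanner can flag interfaces that report the netmask error 99 and later report "configuration complete, err = 0". The sketch below is a hedged example: the function name `find_conflicting_interfaces` is hypothetical, and the regular expressions assume the message wording quoted in this article, which may differ between DDOS releases.

```python
import re

# Patterns based on the two messages quoted in this article (assumed wording).
NETMASK_ERR = re.compile(
    r"can't set the netmask \S+ for (?P<ifname>[^,\s]+), error \(99\)"
)
CONFIG_DONE = re.compile(
    r"Interface, (?P<ifname>[^,\s]+), configuration complete, err = 0"
)


def find_conflicting_interfaces(lines):
    """Return interfaces that logged netmask error 99 and later 'err = 0'."""
    failed = set()       # interfaces that have logged the netmask error
    conflicting = []     # interfaces that later reported success, in log order
    for line in lines:
        match = NETMASK_ERR.search(line)
        if match:
            failed.add(match.group("ifname"))
            continue
        match = CONFIG_DONE.search(line)
        if match:
            ifname = match.group("ifname")
            if ifname in failed and ifname not in conflicting:
                conflicting.append(ifname)
    return conflicting
```

An interface returned by this function is a candidate for the condition described here; whether it actually lost its address still has to be confirmed on the system.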
Engineering reproduced the issue in the lab: the existing code path does not reassign the IP address to floating interfaces (after HA failover) when there is a netmask error.
Resolution
As a temporary workaround, disable and then re-enable the vethx:x alias interface to restore connectivity.
The issue is fixed in the following DDOS versions: DDOS-7.10.0.0, DDOS-7.11.0.0, DDOS-7.7.5.0