Data Domain: Troubleshooting Interfaces Down or Intermittent for Users
Summary: This article focuses on troubleshooting a single interface that is down, multiple interfaces that are down, an intermittent interface, and a bad network card. Replacing parts should be the last step in resolving interface down issues. Troubleshooting other causes first, such as reviewing the switch side, may lead to the quickest resolution. ...
Instructions
- Determine whether the issue is with a single interface, multiple interfaces, an intermittent interface, or a network card.
- Identify whether the interface is down, intermittent, or not responding, or whether a network card has an issue.
- A Link Status of no means the interface has no electrical signal and is down.
- Intermittent means the link fails at irregular intervals; it is not continuously up.
- To check the status of the hardware interfaces on the Data Domain, run the following command:
# net show hardware
- To check the current alerts, run the following command:
# alerts show current
- To check the alert history, run the following command:
# alerts show history
- Run net show settings to check whether the interface that is down is part of a virtual bond, VLAN, or alias.
# net show settings
NOTE: For an interface to have a link light and show as running, it must be configured with an IP address or be part of a bond.
- If an interface is down after a DDOS upgrade, it is highly unlikely that the cause is a hardware failure.
- If the interface that is down is not in a bond, you can disable it and then enable it with the following commands:
# net disable ethXx
# net enable ethXx
- Before opening a case with Dell Support, it is highly recommended that you ask your network team to check the switch configuration.
- During an upgrade, interfaces disconnect from the network, and switches sometimes time out after receiving no response from the Data Domain interfaces and disable their ports.
- If the switch ports have timed out and been disabled, Data Domain Support cannot fix this; you must contact your network team to enable them.
- If you have personnel on-site at the data center, have a field engineer reseat the SFP or cable on both the Data Domain side and the patch panel or switch side.
- If you configured a new interface on the Data Domain but it does not come up and show as running, ensure that a cable is connected on the patch panel or switch side.
- If you are setting up a Data Domain for the first time and interfaces do not come up and show as running, ensure that the switch-side port configuration is correct, including the speed setting.
- For 10G interfaces on the Data Domain, the switch must also be set to 10G.
- For 1G interfaces on the Data Domain, the link does not work if the switch port is set to 10G; setting the switch port to autonegotiate may be more compatible.
- If you are setting up a Data Domain for the first time with fiber or DA copper interfaces, ensure that you have compatible SFPs and cables.
- If, after a reboot or a system upgrade, current alerts report MissingSlave and interfaces are down, reboot the Data Domain again.
Example:
p0-96 Tue Oct 24 16:47:52 2023 CRITICAL Network MissingSlave=veth0_eth1a EVT-NETM-00012: Network interface eth1a is missing. This is a physical interface for veth0.
- This issue is either a hardware failure or a false alert caused by a race condition in which certain components and layers came online in the wrong order.
- If the MissingSlave alert does not clear after the second reboot and the interfaces are still down, open a case with Data Domain hardware support.
- To display the type of network card installed, run the following commands:
# system show hardware
# enclosure show io-cards
- To view past autosupports to compare against the current status, log in to the UI and go to Maintenance > Support > Autosupport reports.
There you can see the past 14 autosupports, which you can download and open in a text editor such as Notepad++.
- If the interface went down suddenly and not because of a reboot or an upgrade, the cause may be a bad cable or SFP.
NOTE: Dell hardware support does not replace cables and SFPs on the customer's patch panel or switch side. Replacing cables and SFPs on the data center side is the responsibility of the customer's data center.
- Reach out to your data center personnel to reseat the cable or SFP.
- If reseating still shows no link light, have your data center replace the cable and, if fiber, the SFP on the switch side.
- If replacing the cable and fiber SFP does not help, have your data center check the port on the patch panel or switch and try a different port.
- If you have followed all of these steps and the interface still does not come up, open a Dell hardware case to replace the SFP on the Data Domain side.
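If you are reviewing many alerts, a short script can pick out the MissingSlave entries for you. The following Python sketch assumes you have saved the text output of alerts show current to a string. The function name find_missing_slaves is hypothetical, and the pattern is based only on the single MissingSlave example in this article, so other DDOS releases may format the alert differently.

```python
import re

# Matches the object field of a MissingSlave alert, e.g. "MissingSlave=veth0_eth1a".
# Assumption: the format follows the single example alert in this article.
MISSING_SLAVE = re.compile(r"MissingSlave=(?P<bond>veth\w+?)_(?P<port>eth\w+)")


def find_missing_slaves(alert_text):
    """Return (bond, physical_port) pairs for every MissingSlave alert in the text."""
    return [(m.group("bond"), m.group("port"))
            for m in MISSING_SLAVE.finditer(alert_text)]


sample = ("p0-96 Tue Oct 24 16:47:52 2023 CRITICAL Network MissingSlave=veth0_eth1a "
          "EVT-NETM-00012: Network interface eth1a is missing. "
          "This is a physical interface for veth0.")
print(find_missing_slaves(sample))  # [('veth0', 'eth1a')]
```

Each pair tells you which bond lost which physical member, which is the information you need when deciding whether a second reboot cleared the alert.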
- Troubleshooting steps if the interface that is down is part of a virtual bond:
- If the interface did not come up after an upgrade or a reboot, ask your network team to disable and enable the port on the switch side.
- After your network team has disabled and enabled the switch port, you can also try the same on the Data Domain side.
- On the Data Domain side, you can disable the virtual bond and re-enable it with the following commands:
# net disable vethXx
# net enable vethXx
- You can remove the interface from the bond and then add it back to the bond. This can be done from the UI or command line.
For an aggregate bond:
# net aggregate del vethX interfaces ethXx
# net aggregate add vethX interfaces ethXx
For a failover bond:
# net failover del vethX interfaces ethXx
# net failover add vethX interfaces ethXx
- If removing and adding the interface did not resolve the issue, try destroying the virtual bond and re-creating it.
NOTE: Use caution when destroying the whole bond. Always ensure that you have redundancy and are connected over SSH through a connection other than the bond. Destroying and re-creating the bond is easier from the UI.
If you are uncomfortable destroying the virtual bond and it is your only connection to the Data Domain, do not proceed.
Data Domain - Configuring physical interfaces with Graphical User Interface (UI)
- If destroying and re-creating the virtual bond did not bring up the interface or interfaces, destroy the bond again and assign IP addresses to the interfaces.
If assigning an IP address brings the interface up and running, the switch side is most likely not configured correctly. This is outside the scope of Data Domain support, and it is best to contact your network team.
- Troubleshooting steps if the interface is still down:
- Reach out to your data center personnel to reseat the cable or SFP.
- If reseating still shows no link light, have your data center replace the cable and, if fiber, the SFP on the switch side.
- If replacing the cable and fiber SFP does not help, have your data center check the port on the patch panel or switch and try a different port.
- If you have followed all of these steps and the interface still does not come up, open a Dell hardware case to replace the SFP on the Data Domain side.
NOTE: Dell hardware support does not replace cables and SFPs on the customer's patch panel or switch side. Replacing cables and SFPs on the data center side is the responsibility of the customer's data center.
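The escalation order above, and who owns each step, can be summarized in a small Python sketch. This is an illustration only: the step wording is paraphrased from this article, and the escalate function and its link_up_after callback are hypothetical names, not part of DDOS.

```python
# Escalation ladder from the steps above, in order. The owner column follows the
# NOTE: the customer's data center replaces cables and SFPs on its side, and Dell
# hardware support replaces the SFP on the Data Domain side.
ESCALATION_STEPS = [
    ("Reseat the cable or SFP", "data center"),
    ("Replace the cable and, if fiber, the switch-side SFP", "data center"),
    ("Check the patch panel or switch port and try a different port", "data center"),
    ("Replace the SFP on the Data Domain side (Dell hardware case)", "Dell hardware support"),
]


def escalate(link_up_after):
    """Walk the ladder in order and stop at the first step after which the link is up.

    link_up_after is a hypothetical callback: given a step description, it reports
    whether net show hardware shows Link Status yes after that step was done.
    Returns the (step, owner) pair that restored the link, or None if none did.
    """
    for step, owner in ESCALATION_STEPS:
        if link_up_after(step):
            return step, owner
    return None


# Example: the link comes back after the cable and switch-side SFP are replaced.
print(escalate(lambda step: step.startswith("Replace the cable")))
```

The point of the ordering is that the three data center steps are exhausted before a Dell hardware case is opened for the Data Domain side SFP.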
- If you have field personnel on-site at the data center, they can perform the following troubleshooting steps.
- Verify that the cable is securely connected at the Data Domain and back to the switch port or patch panel port.
- If possible, trace the cable to ensure it is connected to the correct port on the back of the Data Domain.
- Verify if there is a link light on the switch port or patch panel port.
- If there is no link light on the Data Domain port, reseat the cable.
- If there is no link light on the patch panel port or switch port, reseat the cable.
- If there is still no link after reseating on both the Data Domain side and the patch panel or switch port side, replace the cable.
- If field personnel are on-site, you can also have them swap the cable from a working interface onto the Data Domain interface port that is down.
After the swap, field personnel should see a link light, and on the Data Domain, net show hardware should show Link Status yes:
# net show hardware
- If the swap produces a link light, the issue must be a bad SFP on the Data Domain side, a bad cable, a bad SFP on the switch side, or a bad port on the patch panel or switch side.
- Another troubleshooting step for data center personnel is to do a loopback test.
- If there is a free, unconfigured interface of the same type on the Data Domain, loop the down interface to it.
For example, if eth1a and eth4a are both copper and eth4a is not being used, connect eth1a to eth4a.
eth4a must be configured with a dummy IP address, which can be any address, for example:
# net config eth4a 1.2.3.4 netmask 255.255.255.0
After you are finished with the loopback test, clear the dummy IP configuration:
# net config eth4a 0.0.0.0
- If the loopback test did not work and the SFPs, the cables, or both have been replaced, open a case with the Data Domain hardware support team.
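Choosing a loopback partner follows two rules from the steps above: the partner must be the same type as the down interface, and it must not already be configured. The Python sketch below assumes you have built two hypothetical inputs by hand: a mapping of port to physical type taken from net show hardware, and a set of configured ports taken from net show settings. The type values in the example are illustrative. The sketch only prints commands; it does not run anything on the Data Domain.

```python
def pick_loopback_partner(down_port, physical_types, configured):
    """Return a free port of the same physical type as down_port, or None.

    physical_types: hypothetical dict of port -> Physical column value,
                    for example {"eth1a": "Copper", "eth4a": "Copper"}.
    configured: set of ports that already have an IP address or are in a bond.
    """
    wanted = physical_types[down_port]
    for port in sorted(physical_types):
        if port != down_port and port not in configured and physical_types[port] == wanted:
            return port
    return None


def loopback_commands(partner, dummy_ip="1.2.3.4"):
    """Commands to set the dummy IP for the test, then clear it afterward."""
    return [
        f"net config {partner} {dummy_ip} netmask 255.255.255.0",
        f"net config {partner} 0.0.0.0",
    ]


types = {"eth1a": "Copper", "eth4a": "Copper", "eth4b": "Fiber"}
partner = pick_loopback_partner("eth1a", types, configured={"eth1a"})
if partner is not None:
    for command in loopback_commands(partner):
        print("#", command)
```

Remember to run the second command after the test, so the dummy address does not stay on the partner interface.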
Troubleshooting physical Fiber interfaces and direct attached copper interfaces.
The examples below may not look exactly like your output.
Example output for a physical Fiber interface:
sysadmin@dd3300-ddsupport# net show hardware
Port  Speed    Duplex  Supp Speeds Hardware Address  Physical Link Status State   Autonegotiation
----- -------- ------- ----------- ----------------- -------- ----------- ------- ---------------
eth1b unknown  unknown 1000/10000  00:0c:29:46:fc:1b Fiber    no          up      on
----- -------- ------- ----------- ----------------- -------- ----------- ------- ---------------
Example output for a physical DA Copper interface:
sysadmin@ddsupport# net show hardware
Net Show Hardware
-----------------
Port  Speed   Duplex  Supp Speeds Hardware Address  Physical  Link Status State
----- ------- ------- ----------- ----------------- --------- ----------- -------
eth8a unknown unknown 25000       34:80:0d:94:70:52 DA Copper no          up
----- ------- ------- ----------- ----------------- --------- ----------- -------
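To spot ports without a link in a long net show hardware listing, you can parse the output. The following Python sketch is an example under assumptions, not a supported tool: it assumes the rows look like the two examples above, finds the hardware address by its MAC format, and treats the first yes or no after it as the Link Status, because the Physical column can contain a space, as in DA Copper. The sample text uses rows adapted from both examples.

```python
import re

# A MAC address such as 00:0c:29:46:fc:1b; used to anchor each data row.
MAC = re.compile(r"^[0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){5}$")


def parse_net_show_hardware(text):
    """Return {port: {"speed", "physical", "link", "state"}} for each ethXx row."""
    ports = {}
    for line in text.splitlines():
        tokens = line.split()
        # Skip prompts, titles, header rows, and dashed separator lines.
        if not tokens or not tokens[0].startswith("eth"):
            continue
        mac_index = next((i for i, t in enumerate(tokens) if MAC.match(t)), None)
        if mac_index is None:
            continue
        rest = tokens[mac_index + 1:]
        link_index = next((i for i, t in enumerate(rest) if t in ("yes", "no")), None)
        if link_index is None:
            continue
        ports[tokens[0]] = {
            "speed": tokens[1],
            "physical": " ".join(rest[:link_index]),  # e.g. "Fiber" or "DA Copper"
            "link": rest[link_index],
            "state": rest[link_index + 1] if link_index + 1 < len(rest) else None,
        }
    return ports


def ports_without_link(text):
    """Ports whose Link Status is no, sorted by name."""
    return sorted(p for p, info in parse_net_show_hardware(text).items() if info["link"] == "no")


sample = """\
sysadmin@ddsupport# net show hardware
Port  Speed   Duplex  Supp Speeds Hardware Address  Physical  Link Status State
----- ------- ------- ----------- ----------------- --------- ----------- -------
eth1b unknown unknown 1000/10000  00:0c:29:46:fc:1b Fiber     no          up
eth8a unknown unknown 25000       34:80:0d:94:70:52 DA Copper no          up
"""
print(ports_without_link(sample))  # ['eth1b', 'eth8a']
```

Splitting on whitespace and anchoring on the MAC address avoids depending on exact column widths, which can differ between models and releases.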
In most customer environments, fiber connections have three components: an SFP on the Data Domain side, an optical cable, and an SFP on the customer's patch panel or switch side.
However, some customers, for example, connect the Data Domain to a patch panel and then to a breakout cable, which connects to a QSFP that has other connections.
Sometimes you have to ask about the customer's connectivity, because the issue may not lie between the Data Domain and the patch panel or switch; other components or connections beyond that point can also affect the interface.
If, after an upgrade, an alert reports a speed mismatch on interfaces, check the following information.
Example:
Id     Post Time                Severity Class   Object                        Message
------ ------------------------ -------- ------- ----------------------------- --------------------------------------------------------------------------------
p0-618 Tue Oct 20 09:50:53 2023 CRITICAL Network Bonded Interface Name=veth1   EVT-NETM-00015: One or more interfaces in the bonded group has a speed mismatch.
------ ------------------------ -------- ------- ----------------------------- --------------------------------------------------------------------------------
This could mean that the speed of one interface in a bond is set to 100 Mb/s while another interface is set to 1000 Mb/s.
This can happen for several reasons that may have nothing to do with the Data Domain:
- A faulty Ethernet cable
- A faulty port on the patch panel or switch side
- The switch port configuration is limiting the speed.
- The switch speed has a limitation.
- The interface that is in the virtual bond is not the correct interface.
- The interface is connected to the wrong switch or switch port.
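Once you know which interfaces belong to the bond, comparing their speeds shows which member differs. This Python sketch assumes a hypothetical mapping of port to the Speed value reported by net show hardware. The speed strings are treated as opaque labels because their exact format is an assumption here, and ports reporting unknown (typically no link) are not counted as a mismatch.

```python
from collections import defaultdict


def group_by_speed(bond_members, speeds):
    """Group bond members by the speed each one reports."""
    groups = defaultdict(list)
    for port in bond_members:
        groups[speeds.get(port, "unknown")].append(port)
    return dict(groups)


def has_speed_mismatch(bond_members, speeds):
    """True if two or more members report different known speeds."""
    known = [s for s in group_by_speed(bond_members, speeds) if s != "unknown"]
    return len(known) > 1


# Hypothetical speed values for a two-member bond.
speeds = {"eth1a": "100Mb/s", "eth1b": "1000Mb/s"}
print(group_by_speed(["eth1a", "eth1b"], speeds))
print(has_speed_mismatch(["eth1a", "eth1b"], speeds))
```

Grouping rather than returning a single True or False makes it obvious which member to investigate against the list of reasons above.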
Here is what you can do to troubleshoot further.
These steps can also be performed in the UI.
Data Domain - Configuring physical interfaces with Graphical User Interface (UI)
- To check the status of the hardware interfaces on the Data Domain, run the following command:
# net show hardware
- To check the current alerts, run the following command:
# alerts show current
- Run net show settings to check whether the interface is part of a virtual bond, VLAN, or alias.
# net show settings
- If an interface is down after a DDOS upgrade, it is highly unlikely that the cause is a hardware failure.
- If the interface is not in a bond, you can disable it and then enable it with the following commands:
# net disable ethXx
# net enable ethXx
- The interface may have been mismatched before but only raised an alert after an upgrade or a reboot. You can check the alert history:
# alerts show history
- Ask your network team to check the switch configuration and reconfigure the switch port speed or autonegotiation.
- Disable and enable the switch port.
- If LACP is being used, check the LLDP information in the autosupport to see whether the correct interfaces are in the LACP bond.
- Remove the interface from the bond:
# net failover del vethX interfaces ethXx
- Configure the interface with a dummy IP address:
# net config ethXx 1.2.3.4 netmask 255.255.255.0
- Try to change the speed manually:
NOTE: Depending on the interface and Data Domain model, you may not be able to set the speed.
# net config ethXx duplex full speed 1000
- After trying the previous steps, clear the dummy IP with the following command:
# net config ethXx 0.0.0.0
- Add the interface back to the bond:
# net failover add vethX interfaces ethXx
- If the issue is still not fixed, we recommend replacing the cable and checking the switch side again.
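Because the remove, test, and re-add steps above must run in a specific order, it can help to generate the full command list for review before you type anything. The following Python sketch only builds strings. The function name and default dummy IP are illustrative; the failover form follows the commands in this section, and the aggregate form is an assumption based on the bond commands earlier in this article. Run the commands yourself from a connection other than the bond being modified.

```python
def speed_test_commands(bond, port, bond_type="failover", speed=1000, dummy_ip="1.2.3.4"):
    """Return the speed-mismatch steps from this section, in order, as CLI strings.

    bond_type selects the failover or aggregate form of the del/add commands.
    """
    if bond_type not in ("failover", "aggregate"):
        raise ValueError("bond_type must be 'failover' or 'aggregate'")
    return [
        f"net {bond_type} del {bond} interfaces {port}",         # remove from the bond
        f"net config {port} {dummy_ip} netmask 255.255.255.0",  # dummy IP
        f"net config {port} duplex full speed {speed}",         # may be refused on some models
        f"net config {port} 0.0.0.0",                           # clear the dummy IP
        f"net {bond_type} add {bond} interfaces {port}",         # add back to the bond
    ]


for command in speed_test_commands("veth1", "eth1a"):
    print("#", command)
```

Keeping the cleanup and re-add steps in the same generated list reduces the chance of leaving an interface outside its bond with a dummy address.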
A list of reasons why a physical interface on a system may go down:
A physical interface on a system can go down due to various reasons, ranging from hardware issues to network problems. Here is a list of common reasons:
- Physical Cable Disconnection: The cable connecting the interface to the network or another device might be physically disconnected, causing the interface to go down.
- Wrong interface: The cable is connected to the wrong interface on the Data Domain side.
- Wrong interface: The cable is connected to the wrong port on the patch panel or switch side.
- Hardware Failure: Faulty hardware components, such as network interface cards (NICs), switches, routers, or cables can lead to interface downtime.
- Power Issues: Power fluctuations, outages, or inadequate power supply to the networking equipment can cause interfaces to go down.
- Overheating: Excessive heat can damage hardware components, leading to interface failures and system shutdowns.
- Network Congestion: High levels of network traffic or congestion can overwhelm the interface, causing it to become unresponsive or go down.
- Software Errors: Bugs, glitches, or incompatibilities in device drivers, firmware, or the operating system can result in interface failures.
- Configuration Errors: Incorrect network configurations, such as IP address conflicts or incorrect subnet masks, can render an interface inaccessible.
- Security Measures: Security policies, such as intrusion detection or firewall rules, might inadvertently block or restrict traffic through the interface.
- Physical Damage: Physical damage to the hardware, such as water exposure, impact, or wear and tear, can lead to interface failures.
- Environmental Factors: Extreme temperature, humidity, dust, or other environmental factors can impact the functionality of the hardware and cause interfaces to go down.
- Firmware or Software Updates: Incorrectly applied firmware or software updates can cause instability and lead to interface failures.
- Network Attacks: Denial of Service (DoS) attacks, Distributed Denial of Service (DDoS) attacks, or other malicious activities can overload the interface and cause it to fail.
- Routing Issues: Incorrect routing table entries or issues with dynamic routing protocols can disrupt connectivity through the interface.
- Physical Interface Configuration: Incorrect speed and duplex settings, autonegotiation problems, or mismatched configurations between connected devices can result in interface downtime.
- Cable Quality: Poor-quality or damaged cables can lead to intermittent connectivity or complete interface failures.
- Switch or Router Failures: Failures in networking equipment like switches or routers that connect to the interface can lead to downstream interface issues.
- Network Provider Problems: If the system connects to an external network provider, issues on their end (maintenance, outages, configuration errors) can cause the interface to go down.
- Resource Exhaustion: Insufficient memory or processing power in the system can lead to interface failures, especially in high-traffic scenarios.
- Physical Port Disablement: Manual or automated actions to disable the physical port by an administrator, network management system, or security policy.
- Fiber Optic Signal Loss: In fiber optic connections, issues like signal loss due to bending, contamination, or breakage can cause the interface to go down.
Data Domain - Configuring physical interfaces with Graphical User Interface (UI)
Data Domain - Configuring physical interfaces through command-line interface (CLI)
Additional Information
Refer to this video:
Troubleshooting Data Domain Network Interfaces
Duration: 00:03:07 (hh:mm:ss)