PowerFlex Management Platform - Cisco Nexus Switch Upgrade Fails Due to SSH Timeout
Summary: A Cisco Nexus switch firmware upgrade fails with a timeout error while the NX-OS image file is transferred to the switch by a copy command issued over SSH.
Symptoms
Scenario
After initiating the Cisco Nexus switch upgrade, the system attempts to transfer the firmware file, as shown in the deployment logs below:
INFO [2024-10-08T11:41:43.149490] 315108: provider/base.rb:239:in `process!': Resources for cisconexus5k-fdoXXXXXXX:
{"asm::firmware"=>
{"cisconexus5k-fdoXXXXXXX"=>
{"asm_hostname"=>"10.10.26.16",
"decrypt"=>false,
"force_restart"=>true,
"http_password"=>"test",
"http_user"=>"test",
"install_type"=>"uri",
"path"=>
"https://test:test@10.10.10.15:443/httpshare/download/8aaa8037910dd23d01910f4a911b159c/nxos64-cs.10.4.2.F.bin",
"product"=>"cisconexus5k",
"server_firmware"=>
"[{\"instance_id\":null,\"component_id\":\"31148\",\"uri_path\":\"https://dellpowerflex.com:443/httpshare/download/8aaa8037910dd23d01910f4a911b159c/nxos64-cs.10.4.2.F.bin\",\"version\":null}]",
"version"=>"10.4(2)"}}}
During this transfer, the process runs for about 3–5 minutes before stalling, causing the connection to time out. The exception log captures the following details:
#<RuntimeError: env --unset=RUBYOPT --unset=GEM_HOME --unset=RUBYLIB --unset=GEM_PATH --unset=BUNDLE_BIN_PATH RUBYLIB=/opt/service/lib:/opt/asm-deployer/lib:/opt/puppetlabs/puppet/lib:/opt/dependencies/dell-asm-util/lib PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/puppetlabs/puppet/bin:/opt/puppetlabs/bin puppet asm process_node --debug --trace --filename /opt/Dell/ASM/deployments/Job-5c4d4f5b-5fb2-4948-9fe7-8ece57b4b2e7-2/resources/cisconexus5k-fdoXXXXXXX.yaml --run_type apply --statedir /opt/Dell/ASM/deployments/Job-5c4d4f5b-5fb2-4948-9fe7-8ece57b4b2e7-2/resources --always-override cisconexus5k-fdoXXXXXXX failed; output in /opt/Dell/ASM/deployments/Job-5c4d4f5b-5fb2-4948-9fe7-8ece57b4b2e7-2/cisconexus5k-fdoXXXXXXX.out>
At this stage, the upgrade fails, and the cisconexus5k-fdoXXXXXXX.out file shows the following error:
Debug: SSH send only: copy https://test:test@10.10.10.15:443/httpshare/download/8aaa8037910dd23d01910f4a911b159c/nxos64-cs.10.4.2.F.bin bootflash: vrf management
Error: execution expired
Error: /Stage[main]/Asm::Resource_wrapper/Asm::Firmware[cisconexus5k-fdoXXXXXX]/Cisconexus5k_firmwareupdate[firmware_update]/returns: change from to '#' failed: execution expired
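To confirm that a failed job matches this symptom, the job output file can be searched for the timeout message. The following is a minimal sketch, assuming shell access to the host that stores the deployment logs. The function name has_transfer_timeout is hypothetical, and the <JOB_OUT_FILE> placeholder stands for the .out path reported in the exception log.

```shell
#!/bin/sh
# Hypothetical helper: exits 0 when the given job output file contains the
# "execution expired" error shown above, 1 when it does not, and 2 when
# the file is missing or unreadable.
has_transfer_timeout() {
    out_file="$1"
    [ -r "$out_file" ] || return 2
    grep -q 'Error: execution expired' "$out_file"
}

# Usage (replace <JOB_OUT_FILE> with the path from the exception log):
# has_transfer_timeout "<JOB_OUT_FILE>" && echo "image copy timed out"
```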
Impact
Cisco Nexus switches cannot be upgraded with PFxM.
Cause
To identify the cause, troubleshoot basic networking between the switch and the MVMs with commands such as the following.
From the switch:
ping <MVM-MGMT> packet-size 1500 count 1000 vrf management
copy https://X.X.X.X:443/httpshare/download/<PATH> bootflash: vrf management
show vrf
show ip route vrf management
traceroute
Component Details:
- Cisco Switch Device:
  - Interface: mgmt0
  - IP Address: 10.10.26.23/25
  - VRF: management
- PowerFlex Management Platform Host:
  - Multiple network interfaces:
    - eth0: 10.10.10.23/25 (Management Network)
    - eth1: 10.10.26.12/25 (OOB)
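The addresses listed above can be checked for subnet membership with plain shell arithmetic. This is a minimal sketch, not part of any PFMP tooling; the helper names ip_to_int and same_subnet are hypothetical. It shows that the switch mgmt0 address shares a /25 with the PFMP eth1 address but not with eth0.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Hypothetical helper: succeeds when both addresses fall in the same
# network for the given prefix length (1-32).
same_subnet() {
    a=$(ip_to_int "$1"); b=$(ip_to_int "$2"); prefix=$3
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    [ $(( a & mask )) -eq $(( b & mask )) ]
}

# Addresses taken from the component details above.
same_subnet 10.10.26.23 10.10.26.12 25 && echo "switch mgmt0 and PFMP eth1 share a subnet"
same_subnet 10.10.26.23 10.10.10.23 25 || echo "switch mgmt0 and PFMP eth0 are on different subnets"
```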
The root cause is asymmetric routing on the MVMs:
- The Cisco switches and the PFMP hosts both have interfaces in the 10.10.26.0/25 subnet (OOB network).
- Routing is configured on the PFMP management network to route traffic from the management network to the OOB network.
- File transfer requests from the Cisco switch (10.10.26.23) arrive at PFMP1 on eth1 (10.10.26.12), but the responses leave from eth0 (10.10.10.23).
- Because the responses return on a different path from the requests, the network configuration (firewall settings, security policies, and network rules) drops the connection, and the file transfer stalls.
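The asymmetry described above can be checked on a PFMP host by asking the kernel which interface it would use to reach the switch and comparing that with the interface on which the request arrived. The sketch below assumes the iproute2 ip command is available; the helper names egress_dev and check_symmetry are hypothetical.

```shell
#!/bin/sh
# Print the egress interface named in `ip route get` output read from stdin.
egress_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Hypothetical helper: compare the interface a request arrived on with the
# interface the kernel would select for traffic back to the peer.
check_symmetry() {
    ingress="$1"; peer_ip="$2"
    egress=$(ip route get "$peer_ip" | egress_dev)
    if [ "$egress" = "$ingress" ]; then
        echo "symmetric: traffic to $peer_ip leaves via $egress"
    else
        echo "asymmetric: request arrived on $ingress, traffic to $peer_ip leaves via $egress"
        return 1
    fi
}

# Usage on the PFMP host, with the values from this article:
# check_symmetry eth1 10.10.26.23
```

A non-zero exit status from check_symmetry indicates that requests and responses use different interfaces, which matches the failure described in this article.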
Resolution
To resolve the issue, disable the eth1 interface (NIC) on the PFMP hosts:
- Bring down the eth1 interface:
ip link set dev eth1 down
- Remove eth1 from the network configuration. Delete or move the network configuration file for eth1 so that the interface is not brought up again on reboot. For example, on a Linux system that keeps interface files under /etc/sysconfig/network-scripts:
mv /etc/sysconfig/network-scripts/ifcfg-eth1 /etc/sysconfig/network-scripts/ifcfg-eth1.bak
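After the change, the result can be verified before the upgrade is retried. This is a minimal sketch, assuming iproute2 and the network-scripts directory shown above; the helper names link_state and verify_disabled are hypothetical.

```shell
#!/bin/sh
# Print the operational state from `ip -o link show` output read from stdin.
link_state() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "state") { print $(i + 1); exit } }'
}

# Hypothetical helper: succeeds when the interface reports state DOWN and
# its ifcfg file is no longer present in the configuration directory.
verify_disabled() {
    dev="$1"; cfg_dir="${2:-/etc/sysconfig/network-scripts}"
    state=$(ip -o link show dev "$dev" | link_state)
    [ "$state" = "DOWN" ] || { echo "$dev is still in state $state"; return 1; }
    [ ! -e "$cfg_dir/ifcfg-$dev" ] || { echo "$cfg_dir/ifcfg-$dev is still present"; return 1; }
    echo "$dev is down and its ifcfg file has been moved or removed"
}

# Usage on each PFMP host:
# verify_disabled eth1
```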
Additional Information
Impacted Version
PFMP 4.x
Fixed In Version
N/A - Working as expected