PowerFlex Manager - SO Node Update Fails in Function update_clc_node_agent
Summary: PowerFlex Manager (PFxM) fails to upgrade the Storage Only (SO) node during the 'update_clc_node_agent' function, causing the upgrade operation to halt when attempting to place the SDS into Protected Maintenance Mode (PMM).
Symptoms
Scenario
- Environment: Highly Available (HA) CloudLink Center (CLC) Appliances
- Issue: SO resource group lists only one of the two expected CLC VMs.
- Symptom: Upgrading the SO nodes fails, citing that the node is not in PMM.
This scenario represents the trigger point for the upgrade failure. Below is an example of how a healthy stack should appear when running the update_clc_node_agent function:
Healthy Stack Example:
Log Location: Job-afe400aa-d7fe-4897-9a04-fe08b924c4ae-0-1/deployment.logs
DEBUG [2024-12-16T11:20:36.199529] 13742: service_deployment.rb:5348:in `process_firmware_update': Processing firmware update after selecting resources
DEBUG [2024-12-16T11:20:36.200310] 13742: service_deployment.rb:5353:in `block in process_firmware_update': Processing firmware update on rackserver-xxxxxxx
INFO [2024-12-16T11:20:36.201536] 13742: service_deployment.rb:5363:in `block in process_firmware_update': Updating CLC Agent update on vmcl01-esxi08.dell.lab
DEBUG [2024-12-16T11:20:36.201933] 13742: service_deployment.rb:5365:in `block in process_firmware_update': Updating CLC Agent version on node svm-vmcl01-esxi08
DEBUG [2024-12-16T11:20:36.202379] 13742: type/base.rb:412:in `delegate': service_deployment.rb:5366:in `block in process_firmware_update' calling delegated method update_clc_node_agent on #
DEBUG [2024-12-16T11:20:36.204979] 13742: type/base.rb:412:in `delegate': cloudlinkcenter.rb:205:in `clc_agent_info' calling delegated method os_connect_ip on #
DEBUG [2024-12-16T11:20:38.054169] 13742: type/base.rb:412:in `delegate': cloudlinkcenter.rb:742:in `update_clc_node_agent' calling delegated method os_connect_ip on #
DEBUG [2024-12-16T11:20:38.760221] 13742: provider/cloudlink/cloudlinkcenter.rb:747:in `update_clc_node_agent': clc-10.10.30.20: CLC Server and agent are running on same version 7.1 (build 140)
INFO [2024-12-16T11:20:38.760840] 13742: service_deployment.rb:5367:in `block in process_firmware_update': Competed CLC agent update on vmcl01-esxi08.dell.lab
In contrast, the unhealthy stack displays the following error: NoMethodError: undefined method `[]' for nil:NilClass
Log Location: Job-afe400aa-d7fe-4897-9a04-fe08b924c4ae-0-1/deployment.logs
DEBUG [2024-12-19T13:35:48.462150] 19552: service_deployment.rb:5348:in `process_firmware_update': Processing firmware update after selecting resources
DEBUG [2024-12-19T13:35:48.462349] 19552: service_deployment.rb:5353:in `block in process_firmware_update': Processing firmware update on rackserver-xxxxxxx
INFO [2024-12-19T13:35:48.463044] 19552: service_deployment.rb:5363:in `block in process_firmware_update': Updating CLC Agent update on PFSON04
DEBUG [2024-12-19T13:35:48.463276] 19552: service_deployment.rb:5365:in `block in process_firmware_update': Updating CLC Agent version on node PFSON04
DEBUG [2024-12-19T13:35:48.463622] 19552: type/base.rb:412:in `delegate': service_deployment.rb:5366:in `block in process_firmware_update' calling delegated method update_clc_node_agent on #
DEBUG [2024-12-19T13:35:48.466045] 19552: type/base.rb:412:in `delegate': cloudlinkcenter.rb:205:in `clc_agent_info' calling delegated method os_connect_ip on #
DEBUG [2024-12-19T13:35:51.089302] 19552: type/base.rb:412:in `delegate': cloudlinkcenter.rb:742:in `update_clc_node_agent' calling delegated method os_connect_ip on #
ERROR [2024-12-19T13:35:51.093230] 19552: service_deployment.rb:5535:in `process_firmware_update': Encountered an error during firmware update: NoMethodError: undefined method `[]' for nil:NilClass
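When reviewing a long deployment.logs file, it can help to search for this signature programmatically. The following is a minimal Python sketch, not a PowerFlex Manager tool: the function names `split_entries` and `find_clc_agent_failure` are hypothetical, and the matching relies only on the log format shown in the excerpts above (level, bracketed timestamp, message). It assumes the failing ERROR entry follows a call to update_clc_node_agent, as in the unhealthy example.

```python
import re

# Each entry starts with a level followed by a bracketed ISO timestamp.
# A zero-width lookahead lets us split both one-entry-per-line logs and
# logs that were flattened onto a single line (requires Python 3.7+).
ENTRY_START = re.compile(r"(?=\b(?:DEBUG|INFO|WARN|ERROR) \[\d{4}-\d{2}-\d{2}T)")

# Error text copied verbatim from the unhealthy log excerpt.
SIGNATURE = "NoMethodError: undefined method `[]' for nil:NilClass"


def split_entries(text):
    """Split raw log text into individual, whitespace-trimmed entries."""
    return [e.strip() for e in ENTRY_START.split(text) if e.strip()]


def find_clc_agent_failure(text):
    """Return ERROR entries carrying SIGNATURE that appear after an
    update_clc_node_agent call was logged (hypothetical helper)."""
    hits = []
    seen_clc_call = False
    for entry in split_entries(text):
        if "update_clc_node_agent" in entry:
            seen_clc_call = True
        if seen_clc_call and entry.startswith("ERROR") and SIGNATURE in entry:
            hits.append(entry)
    return hits


if __name__ == "__main__":
    # Path is an assumption; point it at the job's deployment.logs file.
    with open("deployment.logs", encoding="utf-8", errors="replace") as fh:
        for hit in find_clc_agent_failure(fh.read()):
            print(hit)
```

An empty result does not prove the stack is healthy; it only means this particular signature was not found in the file that was scanned.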
Also, the Upgrade Job logs capture the precise moment the task fails:
Log Location: Job-afe400aa-d7fe-4897-9a04-fe08b924c4ae-0-1/deployment.logs
DEBUG [2024-12-19T13:37:23.210005] 19552: service_deployment.rb:6485:in `finalize_firmware_update': Update complete: false, in protected maintenance mode false
ERROR [2024-12-19T13:37:23.210184] 19552: service_deployment.rb:6491:in `finalize_firmware_update': Failed to update the server!
INFO [2024-12-19T13:37:23.210321] 19552: service_deployment.rb:6496:in `finalize_firmware_update': Firmware update status: Error
ERROR [2024-12-19T13:37:23.216294] 19552: service_deployment.rb:622:in `process': Firmware update failed for Job-afe400aa-d7fe-4897-9a04-fe08b924c4ae-0-2
ERROR [2024-12-19T13:37:23.216535] 19552: service_deployment.rb:623:in `process': ["/opt/asm-deployer/lib/asm/service_deployment.rb:6500:in `finalize_firmware_update'", "/opt/asm-deployer/lib/asm/service_deployment.rb:5549:in `process_firmware_update'", "/opt/asm-deployer/lib/asm/service_deployment.rb:479:in `process'", "/opt/asm-deployer/lib/asm.rb:228:in `block in process_deployment'"]
INFO [2024-12-19T13:37:23.216961] 19552: service_deployment.rb:625:in `process': Status: Error
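The finalize_firmware_update entry records both whether the update completed and whether the node was in protected maintenance mode, which is what the "not in PMM" symptom reflects. As a rough illustration, the Python sketch below pulls those two flags out of the log text. The helper name `finalize_status` and the returned dictionary keys are hypothetical, and the pattern assumes the message wording is exactly as in the excerpt above.

```python
import re

# Pattern built from the wording of the finalize_firmware_update entry above.
FINALIZE = re.compile(
    r"finalize_firmware_update': Update complete: (true|false), "
    r"in protected maintenance mode (true|false)"
)


def finalize_status(text):
    """Return the two flags from the first finalize entry, or None if absent.

    Hypothetical helper for log triage; not part of PowerFlex Manager.
    """
    match = FINALIZE.search(text)
    if match is None:
        return None
    return {
        "update_complete": match.group(1) == "true",
        "in_pmm": match.group(2) == "true",
    }


if __name__ == "__main__":
    line = (
        "DEBUG [2024-12-19T13:37:23.210005] 19552: service_deployment.rb:6485:in "
        "`finalize_firmware_update': Update complete: false, "
        "in protected maintenance mode false"
    )
    print(finalize_status(line))
```

In the failing job shown here both flags are false, which is consistent with the earlier NoMethodError having aborted the update before the node entered PMM.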
Impact
Unable to upgrade SO nodes.
Cause
The logs indicate that PowerFlex Manager cannot proceed with the update_clc_node_agent task because it fails to identify the correct 'Primary' among the two CLC appliances. This is shown by the following error line in deployment.logs:
ERROR [2024-12-19T13:35:51.093230] 19552: service_deployment.rb:5535:in `process_firmware_update': Encountered an error during firmware update: NoMethodError: undefined method `[]' for nil:NilClass
Resolution
- Attempt an Update Service Details Action
  - Initiate the Update Service Details action on the affected service.
- Verify Inventory Summary in the Wizard
  - During the process, the wizard should display an Inventory Summary indicating that one CLC appliance is being removed and another is being added.
  - This confirms that the current CLC is not the Primary, and the appliance being added is the correct Primary.
- Complete the Update Service Details Process
  - Finish the Update Service Details action as guided by the wizard.
- Proceed with the Upgrade
  - Retry the upgrade. It should now proceed without issues.
Impacted Version
PowerFlex Manager 3.x