PowerFlex: OS conversion failure due to missing MDM packages: "Could not remove mdm"
Summary: In PFMP 4.8, performing a "Replace Node OS" operation from the UI fails due to missing MDM packages.
Symptom
- Performing an OS conversion via the "Replace Node OS" wizard fails on SDS nodes
- PowerFlex cluster consisting of 10 or more nodes
- Fault Sets are not configured on the PowerFlex cluster
The OS conversion fails when it tries to remove the MDM package from an SDS node that is not part of the MDM cluster and therefore has no MDM package installed:
Recent Activity window
Node OS migration from CentOS to SLES started.
Beginning attempt 1 for os migration job Job-b879317a-4c68-4a9e-a4b3-58f32e42970d-2c9c31aa92f5d3470192f945eafc50dc-0 for node with service tag #######
Starting OS replacement for node SERVERNAME.
Uninstalling MDM role for node: SERVERNAME.
Uninstalling MDM role for node: SERVERNAME.
Uninstalling MDM role for node: SERVERNAME.
Uninstalling MDM role for node: SERVERNAME.
Uninstalling MDM role for node: SERVERNAME.
Uninstalling MDM role for node: SERVERNAME failed.
Update attempt 1 for NodeOSMigrationJob Job-b879317a-4c68-4a9e-a4b3-58f32e42970d-2c9c31aa92f5d3470192f945eafc50dc-0 for node with service tag ####### has failed!
Failed to migrate the nodes' OS from CentOS to SLES.
Thin-Deployer deployment.log
/opt/asm-deployer/lib/asm/service/processor.rb:52:in `block in process_state_threaded'
DEBUG [2026-01-06T07:53:46.858275] 15132784: provider/configuration/centos_sles_migration.rb:319:in `uninstall_mdm_role': rackserver-#######: Wait for 2 minutes before the cluster becomes stable for new MDM role scale-down
...
INFO [2026-01-06T07:55:46.879963] 15132784: provider/elementmanager/scaleio.rb:6292:in `current_mdm_role': scaleio-block-legacy-gateway: Found scaleio data ips: ["172.1.1.12", "172.1.2.12"]
INFO [2026-01-06T07:55:46.880200] 15132784: provider/elementmanager/scaleio.rb:6980:in `search_cluster_for_role': scaleio-block-legacy-gateway: Search cluster for role with ips: ["172.1.1.12", "172.1.2.12"]
DEBUG [2026-01-06T07:55:46.984419] 15132784: provider/configuration/centos_sles_migration.rb:265:in `uninstall_mdm_role': rackserver-#######: Existing MDM cluster role: FiveNodes
DEBUG [2026-01-06T07:55:47.005270] 15132784: provider/configuration/centos_sles_migration.rb:274:in `uninstall_mdm_role': rackserver-#######: Fault set configuration for deployment []
DEBUG [2026-01-06T07:55:47.005918] 15132784: provider/configuration/centos_sles_migration.rb:276:in `uninstall_mdm_role': rackserver-#######: Uninstalling MDM role for node: SERVERNAME
DEBUG [2026-01-06T07:55:47.058169] 15132784: provider/configuration/centos_sles_migration.rb:281:in `uninstall_mdm_role': rackserver-#######: Updating OS Image type to CentOS
...
INFO [2026-01-06T07:55:47.079938] 15132784: provider/elementmanager/scaleio.rb:7422:in `scaledown_mdm_cluster!': scaleio-block-legacy-gateway: Found scaleio data ips: ["172.1.1.12", "172.1.2.12"]
INFO [2026-01-06T07:55:47.080157] 15132784: provider/elementmanager/scaleio.rb:6980:in `search_cluster_for_role': scaleio-block-legacy-gateway: Search cluster for role with ips: ["172.1.1.12", "172.1.2.12"]
DEBUG [2026-01-06T07:55:47.080629] 15132784: private_util.rb:6628:in `wait_until_scaledown_is_completed_in_fs': scaleio-block-legacy-gateway: Entering wait_until_scaledown_is_completed_in_fs() for hostname: SERVERNAME
DEBUG [2026-01-06T07:55:47.116988] 15132784: private_util.rb:6686:in `wait_until_scaledown_is_completed_in_fs': scaleio-block-legacy-gateway: It's not an OS Replacement with FS involved. Skipping mutex operations !!
INFO [2026-01-06T07:55:47.117428] 15132784: provider/elementmanager/scaleio.rb:7128:in `search_cluster_for_id': scaleio-block-legacy-gateway: Search cluster for id with ips: ["172.1.1.12", "172.1.2.12"]
WARN [2026-01-06T07:55:47.117916] 15132784: provider/elementmanager/scaleio.rb:7471:in `scaledown_mdm_cluster!': scaleio-block-legacy-gateway: Could not remove mdm vm powerflex: SERVERNAME RuntimeError: Could not find vm powerflex id for: SERVERNAME
DEBUG [2026-01-06T07:55:47.118261] 15132784: provider/configuration/centos_sles_migration.rb:318:in `uninstall_mdm_role': rackserver-#######: Failure observed while uninstalling MDM Role for SERVERNAME:Could not find vm powerflex id for: dc-pf51-sn-62:RuntimeError:/opt/asm-deployer/lib/asm/provider/elementmanager/scaleio.rb:7431:in `block in scaledown_mdm_cluster!'
/opt/asm-deployer/lib/asm/private_util.rb:6689:in `wait_until_scaledown_is_completed_in_fs'
/opt/asm-deployer/lib/asm/provider/elementmanager/scaleio.rb:7429:in `scaledown_mdm_cluster!'
/opt/asm-deployer/lib/asm/provider/configuration/centos_sles_migration.rb:304:in `uninstall_mdm_role'
/opt/asm-deployer/lib/asm/provider/configuration/centos_sles_migration.rb:415:in `process_storage_only_workflow'
/opt/asm-deployer/lib/asm/provider/configuration/centos_sles_migration.rb:990:in `process!'
/opt/jruby/jruby-9.4.8.0/lib/ruby/stdlib/forwardable.rb:238:in `process!'
Impact
The OS conversion process fails.
Cause
In PFMP 4.8, a defect in the "Replace Node OS" logic causes the operation to fail when the system attempts to uninstall the MDM package from a node on which it is not installed.
This issue occurs on clusters with the following characteristics (a quick way to confirm them is sketched after this list):
- The storage cluster contains more than 9 nodes.
- The storage cluster contains nodes that do not have an MDM package installed.
- The storage cluster is not using Fault Sets (FSs).
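These conditions can be checked up front. A minimal sketch, assuming an scli session is already logged in on the Primary MDM (standard PowerFlex scli and rpm commands; output wording varies by version):
scli --query_cluster # shows the cluster mode (for example, "FiveNodes") and the current member and standby MDMs
scli --query_all # overall system view, including the SDS count and Fault Set usage
rpm -qa | grep -i EMC-ScaleIO-mdm # run on each SDS node; empty output means no MDM package is installed there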
Resolution
Each SDS node that does not have an MDM package installed must have the package installed with the Tiebreaker (TB) role and be added to the MDM cluster as a standby MDM, as follows:
1) On the Primary MDM, copy the MDM rpm package to the destination SDS:
scp /opt/emc/scaleio/lia/rpm/mdm/EMC-ScaleIO-mdm-4.5-3000.128.<relevant OS>.x86_64.rpm root@<destination SDS IP>:/tmp/
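The exact package filename under /opt/emc/scaleio/lia/rpm/mdm/ depends on the installed build and the target OS, so listing the directory on the Primary MDM first shows the file to copy:
ls /opt/emc/scaleio/lia/rpm/mdm/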
2) On the destination SDS, install the MDM package:
MDM_ROLE_IS_MANAGER=0 rpm -ivh /tmp/EMC-ScaleIO-mdm-4.5-3000.128.<relevant OS>.x86_64.rpm
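To confirm the package was installed as a non-manager (the MDM_ROLE_IS_MANAGER=0 flag installs it as a Tiebreaker rather than a Manager), a quick check on the destination SDS; the conf.txt path follows the standard PowerFlex MDM layout:
rpm -qa | grep -i EMC-ScaleIO-mdm # should list the newly installed package
grep actor_role_is_manager /opt/emc/scaleio/mdm/cfg/conf.txt # should report 0 for a Tiebreaker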
3) On the Primary MDM, add the new TB MDM as a standby MDM:
scli --add_standby_mdm --mdm_role tb --new_mdm_ip <IP>
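Before moving on, the new standby should be visible from the Primary MDM (standard scli query; exact output wording varies):
scli --query_cluster # the destination SDS should appear as a standby MDM with the Tie-Breaker role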
4) Re-run the "Replace Node OS" operation by resuming the failed deployment:
- Locate the Deployment ID (Job-xxxxxx-xxxx-xxxx-xxxx-xxxxxx-x-x) of the failed OS replacement. The Job ID appears in Recent Activity (if it is no longer visible there, see the sketch after these steps).
- From any MVM node, connect to the thin-deployer pod:
kubectl exec -it -n powerflex $(kubectl get pods -n powerflex -o name | grep thin-deployer) -c thin-deployer -- bash
- Resume the deployment from where it left off. Replace <job id> with the Deployment ID; it must be updated in two places:
curl -X PUT -H "Content-Type: application/json" -d @/opt/Dell/ASM/deployments/<job id>/deployment.json https://localhost:9433/asm/deployment/<job id> -k
Example:
curl -X PUT -H "Content-Type: application/json" -d @/opt/Dell/ASM/deployments/Job-f764345d-1af8-40ae-b169-fdd2048a24bc-0-0/deployment.json https://localhost:9433/asm/deployment/Job-f764345d-1af8-40ae-b169-fdd2048a24bc-0-0 -k
- Verify the status of the deployment. Replace <job id> with the Deployment ID; it must be updated in a single place:
curl -X GET -H "Content-Type: application/json" https://localhost:9433/asm/deployment/<job id>/status -k
Example:
curl -X GET -H "Content-Type: application/json" https://localhost:9433/asm/deployment/Job-f764345d-1af8-40ae-b169-fdd2048a24bc-0-0/status -k
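If the failed job is no longer shown in Recent Activity, the Deployment ID can also be read from the deployments directory inside the thin-deployer pod (the same path used by the resume command above):
ls -t /opt/Dell/ASM/deployments/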
5) Once the OS conversion ends successfully, on the Primary MDM, remove the destination SDS as a standby MDM:
scli --remove_standby_mdm --remove_mdm_ip <IP>
6) Finally, on the destination SDS, remove the MDM package:
rpm -e $(rpm -qa | grep "EMC-ScaleIO-mdm")
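Both cleanup steps can be verified the same way as the installation (standard scli and rpm queries):
scli --query_cluster # run on the Primary MDM; the node should no longer be listed as a standby MDM
rpm -qa | grep -i EMC-ScaleIO-mdm # run on the destination SDS; empty output confirms the package is gone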
Impacted Versions
PFMP 4.8
Fixed In Version
PFMP 4.8.0.1