PowerFlex 4.x Upgrade Fails with OutOfMemoryError and Java Heap Space
Summary: The PFxM upgrade fails with a Java OutOfMemoryError (Java heap space).
Symptoms
When performing a PowerFlex upgrade, the upgrade fails with the error OutOfMemoryError: Java heap space.
From deployment.log:
DEBUG [2025-01-06T22:50:31.482915] 573476: provider/configuration/vxos_update.rb:1082:in `process!': scaleio-block-legacy-gateway: Upgrade LIA
DEBUG [2025-01-06T22:50:31.483170] 573476: provider/configuration/vxos_update.rb:707:in `im_upgrade': scaleio-block-legacy-gateway: Initiating VxOS cluster upgrade
DEBUG [2025-01-06T22:50:31.576972] 573476: provider/configuration/vxos_update.rb:687:in `im_upgrade_staging': scaleio-block-legacy-gateway: Uploading VxOS RPMs for current version
DEBUG [2025-01-06T22:50:58.310637] 573476: provider/configuration/vxos_update.rb:690:in `im_upgrade_staging': scaleio-block-legacy-gateway: Uploading VxOS RPMs for newer versions
DEBUG [2025-01-06T22:50:58.312193] 573476: provider/configuration/vxos_update.rb:404:in `block in vxos_rpms': scaleio-block-legacy-gateway: Downloading rpms from nginx to /tmp: https://http-share.powerflex.svc:443/download//8aaa80939422a51b01943b5873e82d40/os/VxFlex4.5.2SLES15.4Repo/vxflexos_4.5.2000/
DEBUG [2025-01-06T22:50:59.581675] 573476: provider/configuration/vxos_update.rb:418:in `block in vxos_rpms': scaleio-block-legacy-gateway: Download operation result: #
DEBUG [2025-01-06T22:50:59.582232] 573476: provider/configuration/vxos_update.rb:424:in `block in vxos_rpms': scaleio-block-legacy-gateway: Local rpm path: /tmp/d20250106-5032-ukddq7/http-share.powerflex.svc/8aaa80939422a51b01943b5873e82d40/os/VxFlex4.5.2SLES15.4Repo/vxflexos_4.5.2000
DEBUG [2025-01-06T22:50:59.587624] 573476: provider/configuration/vxos_update.rb:404:in `block in vxos_rpms': scaleio-block-legacy-gateway: Downloading rpms from nginx to /tmp: https://http-share.powerflex.svc:443/download//8aaa80939422a51b01943b5873e82d40/os/VxFlex4.5.2RHEL7Repo/vxflexos_4.5.2000/
DEBUG [2025-01-06T22:51:00.312803] 573476: provider/configuration/vxos_update.rb:418:in `block in vxos_rpms': scaleio-block-legacy-gateway: Download operation result: #
DEBUG [2025-01-06T22:51:00.313528] 573476: provider/configuration/vxos_update.rb:424:in `block in vxos_rpms': scaleio-block-legacy-gateway: Local rpm path: /tmp/d20250106-5032-ukddq7/http-share.powerflex.svc/8aaa80939422a51b01943b5873e82d40/os/VxFlex4.5.2RHEL7Repo/vxflexos_4.5.2000
ERROR [2025-01-06T22:56:16.377904] 573476: rule_engine/rule.rb:241:in `process_state': Encountered a critical unrecoverable error while processing #: Java::JavaLang::OutOfMemoryError: Java heap space
ERROR [2025-01-06T22:56:16.378496] 573476: service/processor.rb:54:in `block in process_state_threaded': Encountered a critical unrecoverable error while processing the service: Java::JavaLang::OutOfMemoryError: Java heap space
ERROR [2025-01-06T22:56:16.379412] 573400: rule_engine/rule.rb:241:in `process_state': Encountered a critical unrecoverable error while processing #: Java::JavaLang::OutOfMemoryError: Java heap space
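To confirm that a failed upgrade matches this signature, the log can be searched for the heap space error. The following is a minimal sketch, assuming a copy of deployment.log is available at a path supplied by the caller; the function name count_heap_oom is illustrative and is not part of any PowerFlex tooling.

```shell
# Hypothetical helper: count the log lines that contain the Java heap space error.
# The log path is supplied by the caller; its location depends on the environment.
count_heap_oom() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the function usable under "set -e".
    grep -c 'OutOfMemoryError: Java heap space' "$1" || true
}
```

For example, running count_heap_oom against the log copy prints the number of matching lines; a non-zero count suggests the failure matches this article.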
Impact
Upgrade failure.
Cause
The out-of-memory condition occurs in thin-deployer while it orchestrates the upgrade process, because a large environment (for example, 64 nodes/SVMs) results in a large number of objects in PFxM.
The deployer process does not have a hard-coded heap size setting, so the JVM uses 1/4 of the node memory as its maximum heap. The MVMs are deployed with 32 GB of memory by default, which equates to a maximum heap size of 8 GB for the deployer process.
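The heap arithmetic above can be sketched as a small shell function. This is an illustration of the one-quarter ratio only; the function name default_max_heap_gb is hypothetical, and the function does not query the running deployer process.

```shell
# Estimate the default maximum JVM heap (in GB) for a VM with the given
# memory size (in GB), using the one-quarter ratio described in this article.
# Integer arithmetic only; the input is assumed to be a whole number of GB.
default_max_heap_gb() {
    echo $(( $1 / 4 ))
}

default_max_heap_gb 32   # prints 8  (default MVM memory)
default_max_heap_gb 64   # prints 16 (MVM memory increased to 64 GB)
```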
Resolution
Both options can be used at the same time if needed.
Option 1:
Increase the MVM memory to 64 GB. This increases the maximum Java heap available to the deployer process to 16 GB.
Engineering does not have specific metrics for the memory required per node, but using 64 GB has proven successful.
If the upgrade still fails, reduce the PowerFlex Gateway (GW) replica count to 1:
1) SSH to a PFMP server
2) Drop the replicas down to 1:
kubectl scale sts block-legacy-gateway -n powerflex --replicas=1
3) Perform the GW upgrade. This upgrades the system to 4.x, where MTLS is used, and this problem will not be observed.
4) Once the backend PowerFlex system is upgraded to 4.x, adjust the replica set back to 2 for the GW:
kubectl scale sts block-legacy-gateway -n powerflex --replicas=2
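The scale-down and scale-up steps above can be wrapped in a small helper so that the same command is used both times. This is a sketch under stated assumptions: the namespace and StatefulSet names are taken from the commands above, while the function name scale_gateway and the DRY_RUN variable are hypothetical conveniences added for illustration.

```shell
# Names taken from the kubectl commands in the steps above.
NAMESPACE="powerflex"
STATEFULSET="block-legacy-gateway"

# Hypothetical wrapper: scale the gateway StatefulSet to the requested replica count.
# With DRY_RUN=1 (the default in this sketch) the command is printed, not executed.
scale_gateway() {
    replicas="$1"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "kubectl scale sts $STATEFULSET -n $NAMESPACE --replicas=$replicas"
    else
        kubectl scale sts "$STATEFULSET" -n "$NAMESPACE" --replicas="$replicas"
    fi
}
```

Run scale_gateway 1 before the GW upgrade and scale_gateway 2 afterwards, setting DRY_RUN=0 to execute the command rather than print it. The resulting replica count can be checked with kubectl get sts block-legacy-gateway -n powerflex.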
Option 2:
Remove the Resource Groups (RGs) before upgrading PowerFlex, then use the Add Existing Resource Group operation to add them back. The memory consumption described in this issue applies only to nodes that are in RGs; if there are no RGs, the upgrade does not build the objects and an OOM error is not expected.
Impacted Versions
PFMP 4.x
Fixed In Version
PFMP 4.8