PowerFlex: Procedure to update firmware to mitigate an issue of NVDIMM Batteries losing charge on PowerFlex custom (R650, R750) nodes
Summary: Procedure to update firmware to mitigate an issue of NVDIMM batteries losing charge on PowerFlex custom (R650 and R750) nodes.
Instructions
Issue Description
Dell PowerFlex 15G systems support a configuration that contains NVDIMMs which provide persistent memory required for the Fine Granularity feature. This procedure contains the steps to update the iDRAC and CPLD versions to the following:
- iDRAC 7.10.50.201
- CPLD 1.1.1
Download Location
- PowerFlex custom node: https://www.dell.com/support/home/en-us/product-support/product/powerflex-custom-node/drivers
- (search iDRAC & CPLD)
The new iDRAC version checks the NVDIMM Battery cell voltages when the battery is enabled. It logs an error if the cell voltages are below a defined threshold, and it logs an event with a severity of info when the NVDIMM Battery transitions between the READY and ENABLED states.
The new CPLD design tracks the NVDIMM Battery enable state and transitions back to Ready state after a fixed amount of time.
- NOTE: As a result of the issue above, there is a possibility that the NVDIMM Battery in the nodes could be damaged and must be replaced. This procedure accounts for this possibility.
- NOTE: The iDRAC and CPLD updates must follow a specific sequence: update the iDRAC first, then update the CPLD. This sequence helps identify whether the battery has failed.
Procedure Overview
To fully address the issue, we must do the following:
- Identify which NVDIMM Batteries in the system lack sufficient charge.
- Since this specific condition is not being reported, we need to update iDRAC to the version indicated above (or higher). This new version of iDRAC reports BAT0021 or BAT0017 errors if it detects an NVDIMM Battery that is discharged.
- Update the iDRAC to the version indicated on all nodes in the system.
- Request a replacement NVDIMM Battery for each battery that is reported as discharged and bad.
- Do not proceed with the CPLD update on nodes with bad NVDIMM Batteries until you have replaced the bad battery.
- Replacement batteries may not come fully charged and may require up to 75 minutes to achieve full charge after installation.
- For nodes with good NVDIMM Batteries, you can proceed with the CPLD FW update.
- After replacing the bad batteries, proceed to update the CPLD firmware using the iDRAC.
- The node must be placed in Maintenance mode before the update is performed.
- The node reboots and then performs the CPLD update.
- Once the node reboots after the CPLD update, take the node out of Maintenance mode.
- Once the node is out of Maintenance, a Rebuild and a Rebalance operation starts.
- NOTE: Wait for the Rebuild and Rebalance to complete before proceeding to update the next node in the cluster.
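The ordering rules in this overview can be sketched as a small decision function. This is only an illustration of the sequence described above, not a Dell tool: the `NodeState` fields and the action strings are hypothetical names, and the operator is assumed to fill in the flags by hand from what the iDRAC reports.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    # Hypothetical flags recorded by the operator for one node.
    idrac_updated: bool            # iDRAC is at the target version or higher
    battery_bad: bool              # BAT0021 or BAT0017 was reported
    battery_replaced: bool = False
    cpld_updated: bool = False


def next_action(state: NodeState) -> str:
    """Return the next action for a node, following the overview order."""
    if not state.idrac_updated:
        # iDRAC goes first so that a discharged battery can be detected.
        return "update iDRAC"
    if state.battery_bad and not state.battery_replaced:
        # Never update the CPLD on a node that still has a bad battery.
        return "replace NVDIMM battery"
    if not state.cpld_updated:
        return "update CPLD in maintenance mode"
    return "done"
```

For example, a node whose iDRAC is updated and whose battery is flagged as bad yields "replace NVDIMM battery" until `battery_replaced` is set.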
Prerequisites
- Minimum BIOS version for this CPLD update is BIOS version 1.8.2. (1.10.2 or higher is recommended)
- During the component replacement procedure, the customer is responsible for the following tasks:
- Migrating any non-PowerFlex applications on the node to another server.
- Shutting down the server gracefully by following the appropriate shutdown procedure for the operating system in use.
- Ensure that the following information is available:
- The type of PowerFlex node: Physical node or HCI (VMware) node
- The IP address range, subnet, and gateway IP address for the PowerFlex cluster and its nodes
- The IP address range, subnet, and gateway IP addresses for the iDRAC port on the node (defined during the initial deployment process)
- All root and administrator passwords that are set on the server and iDRAC
- vCenter IP address and login credentials if the configuration is HCI
Detailed Procedure
Step 1: Update iDRAC on all nodes in the PowerFlex cluster.
The iDRAC on these nodes can be updated without rebooting the nodes. The procedure to update iDRAC is as follows.
Ensure that the firmware image has been downloaded to a known location on the local system. This procedure requires the iDRAC firmware image; the download location is listed in the 'Download Location' section of this article.
NOTE: Ensure that the iDRAC firmware is updated on all nodes in the cluster before proceeding to the next step in the procedure.
- Log in to the iDRAC9 web interface.
- Go to Maintenance, then click System Update. The Manual Update page is displayed.
- From the Manual Update tab, select Local as the Location Type.
Figure 1: iDRAC9 Update Screen
- Click Choose File, select the firmware image file for the required component, and then click Upload.
- After the upload is complete, the Update Details section displays each firmware file that is uploaded to iDRAC and its status. If the firmware image file is valid and was successfully uploaded, the Contents column displays a (+) icon next to the firmware image filename. Expand the name to view the Device Name, Current, and Available firmware version information.
- Select the required iDRAC firmware file.
- The iDRAC firmware update does not require a host system reboot. Click Install to initiate the update.
- To display the Job Queue page, click Job Queue. Use this page to view and manage your pending firmware updates. You can click OK to refresh the current page to view the status of the firmware update.
- The Lifecycle Controller restarts and the connection to the iDRAC is reset. Wait a few minutes before logging in to the iDRAC again.
Note: If a connection failure occurs, see the HTTP and HTTPS FQDN Connection Failures KB: https://www.dell.com/support/kbdoc/en-us/000193619
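Because every node must be on the new iDRAC before Step 2, it can help to track which nodes are still below the target. The sketch below assumes the operator has collected the iDRAC version string of each node by hand into a dictionary; the node names and the sample versions in the usage note are made up, and only the target 7.10.50.201 comes from this article.

```python
from typing import Dict, List, Tuple

# Target iDRAC version from this article.
TARGET_IDRAC: Tuple[int, ...] = (7, 10, 50, 201)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '7.10.50.201' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def nodes_needing_idrac_update(inventory: Dict[str, str]) -> List[str]:
    """Return the names of nodes whose iDRAC is below the target version."""
    return sorted(
        name
        for name, version in inventory.items()
        if parse_version(version) < TARGET_IDRAC
    )
```

With a hypothetical inventory of `{"node-a": "7.10.50.201", "node-b": "7.00.60.00"}`, only `node-b` is returned. Versions are compared as integer tuples rather than strings so that each dotted field is ordered numerically.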
Step 2: Check if the node has a bad NVDIMM Battery.
iDRAC 7.10.50.201 and above contains code that checks the charge in the NVDIMM Battery cells every 5 seconds and reports one of the following errors in the iDRAC System Event Log (SEL) if the cell voltage is below the threshold of 1.5 V:
- “BAT0021: The NVDIMM battery has reached the end of its usable life or has failed”
- “BAT0017: The NVDIMM battery has failed.”
If one of the above messages is reported, this means that the node has a bad NVDIMM Battery, which needs to be replaced.
- NOTE: Please do not proceed to update the CPLD on this node with a bad NVDIMM Battery. The reboot during a CPLD update will hang and not arm the NVDIMM because of the bad battery.
- NOTE: Please request a replacement NVDIMM Battery.
If the iDRAC does not report an issue, the battery on this node is good and does not need to be replaced. The CPLD update can be performed on this node, and you can proceed to the next step.
- NOTE: PowerFlex Engineered Systems report a node health warning for any node expressing the BAT0021 error. This behavior can be used to identify failed NVDIMM Batteries after the iDRAC update.
- NOTE: If a BAT0017 or BAT0021 event is followed by a third event, BAT0016, the battery does NOT need to be replaced. NVDIMM engineering has advised that a battery reporting BAT0016 after the failure events is healthy.
“BAT0016: The NVDIMM battery is operating normally.”
If all three battery alerts (BAT0021, BAT0020, BAT0016) are seen in sequence within less than a minute, treat this as a false alarm: the battery is considered healthy and should not be replaced.
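The rule in this step can be sketched as a function over the battery message IDs found in the SEL. This is a simplified reading of the notes above, not iDRAC behavior: it assumes the operator has copied the message IDs into a list in chronological order (oldest first), it treats a later BAT0016 as clearing an earlier BAT0017 or BAT0021, and it does not model the one-minute timing of the false-alarm case.

```python
from typing import Iterable

FAILURE_IDS = {"BAT0017", "BAT0021"}  # battery failed / end of usable life
NORMAL_ID = "BAT0016"                 # battery is operating normally


def battery_needs_replacement(message_ids: Iterable[str]) -> bool:
    """Decide from chronologically ordered SEL message IDs (oldest first).

    Assumption: the most recent battery verdict wins, so a BAT0016 that
    follows a failure event means the battery does not need replacement.
    """
    needs_replacement = False
    for message_id in message_ids:
        if message_id in FAILURE_IDS:
            needs_replacement = True
        elif message_id == NORMAL_ID:
            needs_replacement = False
        # Any other message ID is ignored by this sketch.
    return needs_replacement
```

A log containing only BAT0021 returns `True`, while BAT0021 followed by BAT0016 returns `False`. When in doubt, the written guidance in this step takes precedence over the sketch.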
Step 3: Prepare node - Put the SDS in Maintenance Mode.
- Enter Maintenance Mode by following the appropriate procedure in the section below for PowerFlex version 4.x or PowerFlex version 3.x.
- Select the correct PowerFlex Maintenance mode
- If the node does not have a bad NVDIMM Battery, Instant Maintenance Mode (IMM) is recommended to update the CPLD.
- Skip steps 4 and 5 (replacing the battery) and proceed with the BIOS (step 6) & CPLD upgrade (step 7)
- If the node does have a bad NVDIMM Battery, it needs to be replaced prior to updating the CPLD.
- In this case, the node should be put in Protected Maintenance Mode (PMM) to account for the time the replacement battery takes to charge.
- NOTE: This procedure causes a rebalance process to start, so plan it for a scheduled maintenance window.
- NOTE: If you use PMM before the NVDIMM Battery replacement (steps 4-5), you can stay in PMM for the BIOS and CPLD updates (steps 6-7).
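The choice of maintenance mode and the steps that follow can be summarized in a short lookup. The function name and the returned dictionary layout are illustrative only; the modes and step numbers are the ones given in this step.

```python
from typing import Dict, List, Union


def maintenance_plan(battery_bad: bool) -> Dict[str, Union[str, List[int]]]:
    """Return the recommended maintenance mode and the steps to run."""
    if battery_bad:
        # Protected Maintenance Mode covers the battery swap and charge time.
        return {"mode": "PMM", "steps": [3, 4, 5, 6, 7, 8]}
    # With a good battery, steps 4 and 5 (battery replacement) are skipped.
    return {"mode": "IMM", "steps": [3, 6, 7, 8]}
```

For a node with a good battery, `maintenance_plan(False)` gives Instant Maintenance Mode with steps 3, 6, 7, and 8.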
PowerFlex version 4.x, see 'Dell PowerFlex 4.x Administration Guide'.
- If the node is acting as the Primary MDM, follow these steps to switch MDM ownership:
- If an SDR is configured on the node, place the SDR into maintenance mode
- Place the storage data server (SDS) in maintenance mode
- If this is an HCI (VMware) node, place the ESXi in Maintenance mode after the above steps
PowerFlex version 3.x, see 'Upgrade Dell PowerFlex to v3.6.x' guide.
- Entering the node into maintenance mode and shutdown
Step 4: Replace NVDIMM Battery.
Refer to the SolVe documentation for PowerFlex nodes, available at the location provided below, for instructions to replace the NVDIMM Battery.
Download the SolVe documentation for “NVDIMM battery” for the appropriate node type (R650 or R750) on 15G:
PowerFlex Custom Node > Replacement > 15G > [R650 or R750] > [PowerFlex 3.6 or 4.0] > NVDIMM battery - Linux-based
- NOTE: Skip the following sections in the “Replacing the NVDIMM Battery” documentation:
- Remove the storage devices from PowerFlex.
Reasons storage devices should not be removed:
- Removing storage devices during this procedure will cause an unnecessary rebuild of the entire node, extending the maintenance significantly.
- Since this is a planned reboot and not a power loss event, the NVDIMM subsystem does not rely on the power from batteries to complete the save operation. The power is sourced from the PSUs even when the batteries are bad.
- The reboot causes the system to report an NVDIMM Battery error during power-up, but the data in the NVDIMM has already been saved, so no data is lost.
Step 5: Wait for the replaced batteries to charge
If you have replaced the NVDIMM Battery on a node, power on the system. The system will not boot completely: the BIOS halts and waits for the batteries to charge, because a replacement NVDIMM Battery may not have the charge the system needs to protect the data in the NVDIMM. The batteries may take about 60-75 minutes to charge. After 60-75 minutes, restart the system; the node should power up and arm the NVDIMM subsystem.
- NOTE: Plan your maintenance window based on 60-75 minutes for every node that needs to have a replacement battery.
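Since nodes are processed one at a time, the charging time adds up across nodes. A rough estimate of the charging portion of the maintenance window, using only the 60-75 minute figure from this step, could look like the sketch below. The function name is hypothetical, and the result excludes the time for the replacement itself, the firmware updates, and the Rebuild and Rebalance.

```python
from typing import Tuple

CHARGE_MINUTES_MIN = 60  # lower bound from this step
CHARGE_MINUTES_MAX = 75  # upper bound from this step


def charging_window_minutes(nodes_with_new_battery: int) -> Tuple[int, int]:
    """Return (minimum, maximum) total charging minutes for the cluster."""
    if nodes_with_new_battery < 0:
        raise ValueError("node count cannot be negative")
    return (
        nodes_with_new_battery * CHARGE_MINUTES_MIN,
        nodes_with_new_battery * CHARGE_MINUTES_MAX,
    )
```

For three nodes with replaced batteries, this gives between 180 and 225 minutes of charging time.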
Step 6: Update BIOS if necessary
The CPLD version used to fix the underlying issue in this KB requires a minimum BIOS of 1.8.2. (BIOS 1.10.2 or higher is recommended)
PowerFlex custom node deployments require specific versions of drivers, BIOS, and firmware that are validated and qualified by Dell.
If the current BIOS version is below 1.8.2, update the firmware to the latest version as published in the PowerFlex Custom Node Driver and Firmware Matrix.
- Ensure that the node is in Maintenance mode. If it is not, see 'Step 3' for instructions.
- To download the BIOS version, see the 'Download Location' section of this KB.
- Note: Click “Older Versions” to choose a version that aligns with the matrix being targeted.
- Proceed with upgrading the BIOS to version 1.8.2 or later.
- Dell PowerEdge BIOS upgrade procedure for 15G: https://www.dell.com/support/kbdoc/en-us/000222827/dell-technologies-recommends-upgrading-bios-and-idrac9-for-15th-generation-poweredge-servers
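The BIOS check in this step can be sketched as a comparison against the two versions named above (minimum 1.8.2, recommended 1.10.2). The function name and the returned strings are illustrative; the current BIOS version is assumed to be read by the operator and passed in as a dotted string.

```python
from typing import Tuple

MIN_BIOS: Tuple[int, ...] = (1, 8, 2)           # minimum for the CPLD update
RECOMMENDED_BIOS: Tuple[int, ...] = (1, 10, 2)  # recommended in this article


def bios_status(version: str) -> str:
    """Classify a dotted BIOS version string against the article's thresholds."""
    current = tuple(int(part) for part in version.strip().split("."))
    if current < MIN_BIOS:
        return "update required"
    if current < RECOMMENDED_BIOS:
        return "meets minimum, update recommended"
    return "ok"
```

The version is compared as a tuple of integers on purpose: a plain string comparison would rank "1.10.2" below "1.8.2".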
Step 7: Update CPLD to version 1.1.1
Prerequisite:
- Ensure that the SDS is in PowerFlex Maintenance Mode. For ESXi nodes, ensure the ESXi is also in Maintenance Mode. If not, see 'Step 3' of this article for instructions.
- Ensure that the CPLD firmware image is available on your local system. To download the appropriate CPLD version, see the 'Download Location' section of this article.
- The CPLD update triggers a node reboot.
Note:
- Perform the CPLD firmware update after the iDRAC firmware update.
- Replace the NVDIMM battery before proceeding with the CPLD firmware update.
The CPLD update procedure is as follows.
- Log in to the iDRAC9 web interface.
- Go to Maintenance, then click System Update. The Manual Update page is displayed.
- From the Manual Update tab, select Local as the Location Type.
Figure 1: iDRAC9 Update Screen
- Click Choose File, select the firmware image file for the required component, and then click Upload.
- After the upload is complete, the Update Details section displays each firmware file that is uploaded to iDRAC and its status. If the firmware image file is valid and was successfully uploaded, the Contents column displays a (+) icon next to the firmware image filename. Expand the name to view the Device Name, Current, and Available firmware version information.
- Select the required CPLD firmware file.
- A CPLD firmware update will require a host system reboot. Click Install to initiate the update.
- To display the Job Queue page, click Job Queue. Use this page to view and manage your pending firmware updates. You can click OK to refresh the current page to view the status of the firmware update.
Step 8: Take node out of Maintenance mode.
PowerFlex version 4.x, see 'Dell PowerFlex 4.x Administration Guide'.
Prerequisites: Ensure that you have the IP address and admin login credentials for accessing PowerFlex Manager. If necessary, the customer can provide you with the necessary information.
- Power on the node, if you haven't already done so after the CPLD update. The operating system will boot up and all PowerFlex processes will start up automatically.
- After the node is up, from your browser, log back in to PowerFlex Manager as an admin user.
- On the menu bar, click Monitoring > Alerts and confirm that no disconnect message appears for an SDS or an SDC host, or for an SDR or SDT, if applicable.
- For an ESXi node, perform the following:
- From the vSphere Web Client, ensure that the node is displayed as on and connected in both Hosts and Clusters view.
- Right-click the node and select Exit Maintenance Mode.
- Expand the server and select the Storage VM (SVM). If the SVM does not power on automatically, power it on manually.
- Exit the SDS from maintenance mode:
- If an SDR is configured on the node, remove the SDR from maintenance mode.
PowerFlex version 3.x, see 'Upgrade Dell PowerFlex to v3.6.x' guide.
Prerequisites: Ensure that you have the following credentials (available from the administrator): PowerFlex presentation server IP address or hostname, used for accessing the PowerFlex GUI.
- Power on the node if this has not already been done following the CPLD update. The operating system boots up and all PowerFlex processes start up automatically.
- Exit the node from maintenance mode: Return the node to operation
- If an SDR is configured on the node, remove the SDR from maintenance mode.
Step 9: Continue with the next node in the cluster
Steps 2 through 8 should be completed for all the nodes in the cluster, one node at a time. Once all the nodes have been upgraded, the process is complete.
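The one-node-at-a-time rule, together with the earlier note about waiting for the Rebuild and Rebalance, can be sketched as a simple loop. Everything here is a placeholder: `process_node` stands for steps 2 through 8 carried out by the operator, and `rebuild_and_rebalance_done` and `wait` are hypothetical callbacks, since this article does not define an automation interface.

```python
from typing import Callable, Iterable, List


def rolling_update(
    nodes: Iterable[str],
    process_node: Callable[[str], None],
    rebuild_and_rebalance_done: Callable[[], bool],
    wait: Callable[[], None],
) -> List[str]:
    """Process nodes strictly in sequence and return the finished ones."""
    finished: List[str] = []
    for node in nodes:
        process_node(node)  # steps 2 through 8 for this node
        # Do not move on until the Rebuild and Rebalance have completed.
        while not rebuild_and_rebalance_done():
            wait()
        finished.append(node)
    return finished
```

In practice `wait` could be a sleep between manual status checks; the loop only illustrates that the next node is not started until the cluster has settled.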