PowerFlex 4.x: How to Replace an NVDIMM Using the PFMP Wizard

Summary: This article explains how to use the PowerFlex Manager Platform (PFMP) wizard to replace a faulty NVDIMM in a PowerEdge server.

Instructions

Steps

  • Identify the faulty NVDIMM module in iDRAC and correlate it with its DAX device in PFMP

1- Identify the faulty NVDIMM slot in the PowerEdge iDRAC: on the Maintenance tab, select System Event Log.

[Image: iDRAC faulty NVDIMM slot]

In this example, the faulty NVDIMM slot is A7.

 

2- Identify the serial number of the faulty NVDIMM: connect to the affected SDS node over SSH and run the following command:

dmidecode --type memory | grep "Non-" -B 3 -A 3 | grep -E 'Locator|Serial' | grep -v Bank

Output similar to the following appears:

Locator: A7
Serial Number: 16492521 
Locator: B7
Serial Number: 1649251B

In this example, the serial number for NVDIMM A7 is 16492521.
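
The slot-to-serial lookup in step 2 can also be scripted. The following is a minimal sketch, not a Dell-provided tool: nvdimm_serial_for_slot is a hypothetical helper name, and it assumes that each "Serial Number:" line follows the "Locator:" line it belongs to, as in the sample output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of PowerFlex or dmidecode).
# Reads "Locator:" / "Serial Number:" lines on stdin and prints the
# serial number that follows the requested slot locator.
nvdimm_serial_for_slot() {
    slot="$1"
    awk -v slot="$slot" '
        # Remember the most recent Locator value, without surrounding blanks.
        /^[[:space:]]*Locator:/ {
            sub(/^[[:space:]]*Locator:[[:space:]]*/, "")
            sub(/[[:space:]]+$/, "")
            current = $0
            next
        }
        # Print the serial number only when it belongs to the requested slot.
        /^[[:space:]]*Serial Number:/ && current == slot {
            sub(/^[[:space:]]*Serial Number:[[:space:]]*/, "")
            sub(/[[:space:]]+$/, "")
            print
            exit
        }
    '
}

# Usage on the SDS node (dmidecode requires root):
# dmidecode --type memory | grep "Non-" -B 3 -A 3 \
#   | grep -E 'Locator|Serial' | grep -v Bank \
#   | nvdimm_serial_for_slot A7
```

The helper trims trailing whitespace, so the result can be compared directly against the NVDIMM id values reported by ndctl.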

 

3- List the NVDIMMs installed in the server and find the nmem device (dev) for serial number 16492521:

ndctl list -Dvvv | jq '.[].dimms'

Output similar to the following should appear:

[
  {
    "dev": "nmem1",
    "id": "802c-0f-1711-1649251b",
    "handle": 4097,
    "phys_id": 4370,
    "state": "disabled",
    "health": {
      "health_state": "ok",
      "temperature_celsius": 255,
      "life_used_percentage": 32
    }
  },
  {
    "dev": "nmem0",
    "id": "802c-0f-1711-16492521",
    "handle": 1,
    "phys_id": 4358,
    "state": "disabled",
    "health": {
      "health_state": "ok",
      "temperature_celsius": 255,
      "life_used_percentage": 32
    }
  }
]

In this example, nmem0 is the dev for the serial 16492521.
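
Instead of reading the list by eye, the match in step 3 can be done with a jq filter. This is a sketch under two assumptions taken from the sample output only: the serial number is the final part of the "id" string, and the id may use lowercase hex digits while dmidecode prints uppercase (compare 1649251B with 1649251b). nmem_for_serial is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper (not part of ndctl). Requires jq 1.5 or later.
# Reads the JSON from "ndctl list -Dvvv" on stdin and prints the nmem
# device whose id ends with the given serial number (case-insensitive).
nmem_for_serial() {
    jq -r --arg serial "$1" '
        .[].dimms[]
        | select(.id | ascii_downcase | endswith($serial | ascii_downcase))
        | .dev
    '
}

# Usage on the SDS node:
# ndctl list -Dvvv | nmem_for_serial 16492521
```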

 

4- Correlate the nmem mapping, region, namespace, and DAX device information using the following command:

ndctl list -Dvvv | jq '.[].regions[]'

Output similar to the following appears:

 
{
  "dev": "region1",
  "size": 17179869184,
  "available_size": 0,
  "max_available_extent": 0,
  "type": "pmem",
  "numa_node": 1,
  "mappings": [
    {
      "dimm": "nmem1",
      "offset": 0,
      "length": 17179869184,
      "position": 0
    }
  ],
  "persistence_domain": "unknown",
  "namespaces": [
    {
      "dev": "namespace1.0",
      "mode": "devdax",
      "map": "dev",
      "size": 16909336576,
      "uuid": "0a438fbc-91e4-427d-8068-1f26330d85cc",
      "daxregion": {
        "id": 1,
        "size": 16909336576,
        "align": 4096,
        "devices": [
          {
            "chardev": "dax1.0",
            "size": 16909336576
          }
        ]
      },
      "numa_node": 1
    }
  ]
}
{
  "dev": "region0",
  "size": 17179869184,
  "available_size": 0,
  "max_available_extent": 0,
  "type": "pmem",
  "numa_node": 0,
  "mappings": [
    {
      "dimm": "nmem0",
      "offset": 0,
      "length": 17179869184,
      "position": 0
    }
  ],
  "persistence_domain": "unknown",
  "namespaces": [
    {
      "dev": "namespace0.0",
      "mode": "devdax",
      "map": "dev",
      "size": 16909336576,
      "uuid": "38cbd555-3f5b-4f4f-8d83-bf77db75553d",
      "daxregion": {
        "id": 0,
        "size": 16909336576,
        "align": 4096,
        "devices": [
          {
            "chardev": "dax0.0",
            "size": 16909336576
          }
        ]
      },
      "numa_node": 0
    }
  ]
}

In this example, nmem0 maps to region region0, namespace namespace0.0, and DAX device dax0.0.

The result of these steps is the correlation of slot A7 in iDRAC with DAX device dax0.0 in PFMP.
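
The region, namespace, and DAX lookup in step 4 can be reduced to one line of output per DAX device. The sketch below assumes the JSON layout shown in the sample output (regions with "mappings", "namespaces", and "daxregion.devices"); dax_for_nmem is a hypothetical helper name, and the output is tab-separated.

```shell
#!/bin/sh
# Hypothetical helper (not part of ndctl). Requires jq 1.5 or later.
# Reads the JSON from "ndctl list -Dvvv" on stdin and prints
# "<region> <namespace> <dax chardev>" (tab-separated) for every DAX
# device backed by the given nmem device.
dax_for_nmem() {
    jq -r --arg dimm "$1" '
        .[].regions[]
        # Keep only regions that map the requested DIMM.
        | select(any(.mappings[]?; .dimm == $dimm))
        | .dev as $region
        | .namespaces[]? as $ns
        | $ns.daxregion.devices[]?
        | [$region, $ns.dev, .chardev]
        | @tsv
    '
}

# Usage on the SDS node:
# ndctl list -Dvvv | dax_for_nmem nmem0
```

The "?" suffixes let the filter skip regions that have no namespaces or DAX devices instead of failing on them.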

 

  • Remove the NVDIMM memory module

Remove the NVDIMM memory module from the system using the PowerFlex Manager NVDIMM replacement wizard.

  1. Log in to PowerFlex Manager.
  2. On the menu bar, click Lifecycle > Resource Groups.
  3. On the Resource Groups page, click the resource group that contains the node with the faulty NVDIMM and click View Details.
  4. On the Details page, scroll down to the Physical Nodes section of the page.
  5. Under Component Replacement, click NVDIMM Replacement.

  PowerFlex Manager displays the Node List panel in the NVDIMM Replacement wizard.

  6. Select the node that needs the NVDIMM memory module replaced and click Next.

  PowerFlex Manager displays the Selected Component panel. All available NVDIMM memory modules are displayed under the NVDIMM header, and the available NVDIMM batteries are displayed under NVDIMM Battery.

  7. Under NVDIMM Replacement, select the faulty NVDIMM memory module you want to replace and click Next.

  A message prompts you to confirm that the node selection is correct, because the NVDIMM replacement process is irreversible.

  8. To confirm the replacement of the NVDIMM memory module, enter REMOVE NVDIMM.

  A message stating the removal or addition of the NVDIMM device, with the node and slot numbers, is displayed on the Resource Groups page. The status of the resource group and of the individual node is In Progress. The log details are displayed in the Recent Activity section on the right side of the page.

  A job for the replacement of the memory module is created.

  9. Click the Jobs icon at the upper-right side of the menu bar to view the details of the job. Wait for the job to finish.

 

  • Dell Field Engineer (FE) physically replaces the faulty NVDIMM

Put the SDS node into Protected Maintenance Mode (PMM) or Instant Maintenance Mode (IMM), shut down the node, and have the Dell FE replace the faulty NVDIMM.

 

  • Completing the NVDIMM memory module replacement

After the physical replacement of the memory module, the host and SVM are turned off, and the status of the host on the Resource Groups page is service mode. Under Actions, the Discover Replacement NVDIMM option is displayed.

Prerequisites

Ensure that you have performed the steps in Remove the NVDIMM memory module. Then follow these steps to complete the NVDIMM memory module replacement.

Steps

  1. When the NVDIMM has been physically replaced, click Discover Replacement NVDIMM.

  The Discover Replacement NVDIMM action turns on the node and performs a system erase of the NVDIMMs.

  2. On completion of the discovery, the log displays the status as Complete. Under Actions, click the Complete NVDIMM Replacement option.
  3. Click Complete to finish the replacement process.

  After replacing the NVDIMM, you can create virtual hardware for the NVDIMM device, remove the SDS from maintenance or service mode, and turn on the SVM.

  4. After the new NVDIMM memory module is added, the message "The NVDIMM device replacement is complete" is displayed on the Resource Groups page. Under Actions, click Dismiss to dismiss the task.

 

  • Bring the resource into compliance and return the node to operation

After replacing the hardware component, update the system resources to bring the resources into compliance with the firmware and drivers in the compliance file. When the resource is compliant, return the node to operation.

Affected Products

PowerFlex rack, ScaleIO
Article Properties
Article Number: 000321223
Article Type: How To
Last Modified: 14 May 2025
Version:  1