Data Domain: OS Upgrade Guide for Highly Available (HA) systems

Summary: This article provides a process overview for Data Domain Operating System (DDOS) upgrades on Data Domain Highly Available (HA) pairs.


Instructions

Table of Contents:

  1. HA system planned maintenance
  2. Before starting
  3. Rolling upgrade using the GUI
  4. Rolling upgrade using the command line
  5. Local upgrade
 

HA system planned maintenance

To reduce planned maintenance downtime, the HA architecture includes a rolling upgrade function. It upgrades the standby node first and then uses an HA failover to move services from the active node to the standby node. Finally, the previously active node is upgraded and rejoins the HA cluster as the standby node. All of these steps run from a single command.
 
An alternative, manual approach is the "local upgrade." Manually upgrade the standby node first, and then manually upgrade the active node. The standby node then rejoins the HA cluster. Local upgrades can be used either for regular upgrades or to repair issues.
 
Some system upgrade operations require data conversion or index rebuilds on the active node. Do not start these operations until both nodes are upgraded to the same level and HA state is fully restored.
 

Data Domain File System (DDFS) availability during a rolling upgrade:

  1. The standby node is upgraded first, then reboots to the new version. This takes roughly 20-30 minutes, depending on various factors. The DDFS stays up on the active node during this period without any performance degradation.
  2. After the new DDOS is applied, the system fails over to the upgraded standby node, making it the new active node. This takes roughly 10 minutes (again, depending on various factors).
    • One significant factor is disk enclosure (DAE) firmware upgrades. These can add around 20 minutes of downtime, depending on how many DAEs are configured. Refer to this article to determine if a DAE firmware upgrade is required. Starting with DDOS 7.5, online DAE firmware upgrades are supported, which eliminates this concern.
    • Engage your account team to discuss factors that could impact upgrade times. Depending on the client OS, applications, and protocols used, client workloads may need to be manually resumed right after failover. For example, if a system is backing up DD Boost clients and the failover takes more than 10 minutes, the clients time out and the user must manually resume workloads. However, clients usually provide tunable settings for timeout values and retry times.
  3. The DDFS is down during the failover period. Monitor the output of the "filesys status" command on the upgraded node to confirm whether the DDFS service has resumed.
  4. After the failover, the previously active node starts upgrading. After the upgrade is complete, the node reboots to the new version and then rejoins the HA cluster as the standby node. The DDFS service is not impacted during this process as it was already resumed above.
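
The timing figures in the list above can be turned into a rough planning estimate. The following Python sketch is illustrative only: the function names are hypothetical, and the constants are the approximate values quoted in this article, not measured or guaranteed timings.

```python
# Rough figures quoted in this article; actual timings vary by system.
FAILOVER_MINUTES = 10
DAE_FIRMWARE_EXTRA_MINUTES = 20


def estimate_ddfs_downtime(dae_firmware_upgrade: bool) -> int:
    """Return the approximate DDFS outage, in minutes, for a rolling upgrade."""
    downtime = FAILOVER_MINUTES
    if dae_firmware_upgrade:
        # Offline DAE firmware upgrades lengthen the failover window.
        downtime += DAE_FIRMWARE_EXTRA_MINUTES
    return downtime


def clients_need_manual_resume(downtime_minutes: int, client_timeout_minutes: int) -> bool:
    """True when the expected outage outlasts the client's timeout."""
    return downtime_minutes > client_timeout_minutes


if __name__ == "__main__":
    outage = estimate_ddfs_downtime(dae_firmware_upgrade=True)
    print(f"Estimated DDFS outage: about {outage} minutes")
    print("Manual resume likely:", clients_need_manual_resume(outage, client_timeout_minutes=10))
```

Comparing the estimate with the timeout configured on backup clients gives an early indication of whether workloads may have to be resumed by hand after the failover.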

Before starting:

  • Review this article: DDHA Upgrade Pre-Check
  • Stop all cleaning, data movement, replication, and backup activity.
  • Perform a manual failover twice (optional). The expected file system downtime for each failover is around 10 minutes.
  • Confirm that HA status is "highly available." Rolling upgrades cannot be run if HA is degraded.
  • Upload the upgrade RPM file:
    • Using SCP
    • Using CIFS
    • For a rolling upgrade, upload the RPM to the active node only.
    • For a local upgrade, upload the RPM to both nodes.
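
The upload rule above (active node only for a rolling upgrade, both nodes for a local upgrade) can be expressed as a small helper for use in a runbook script. This is a minimal sketch; the function name and the host names in the example are hypothetical.

```python
def rpm_upload_targets(upgrade_type: str, active_node: str, standby_node: str) -> list[str]:
    """Return the nodes that need the upgrade RPM for the chosen upgrade type."""
    if upgrade_type == "rolling":
        # A rolling upgrade is driven from the active node only.
        return [active_node]
    if upgrade_type == "local":
        # A local upgrade is started separately on each node.
        return [active_node, standby_node]
    raise ValueError(f"unknown upgrade type: {upgrade_type!r}")


if __name__ == "__main__":
    # Host names are placeholders for illustration.
    print(rpm_upload_targets("rolling", "node0.example.com", "node1.example.com"))
    print(rpm_upload_targets("local", "node0.example.com", "node1.example.com"))
```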

Rolling upgrade using the GUI:

Prepare the system for upgrade:
  1. Start the precheck on the active node. Do not proceed with the upgrade if the precheck reports any errors.
    1. Go to Maintenance > System.
    2. Check the box for the required RPM file.
    3. Click the "UPGRADE PRECHECK" button.
  2. If the precheck finishes without any issues, proceed with the rolling upgrade on the active node.
    1. Go to Maintenance > System.
    2. Check the box for the required RPM file.
    3. Click the "PERFORM SYSTEM UPGRADE" button.
  3. Wait for the rolling upgrade to finish. Do not start any HA failover operations before it finishes.
  4. After the rolling upgrade finishes, go to the IP address of the former standby node (Node1 in these examples) in a web browser and log in.
  5. Go to Health > Alerts and check for any unexpected alerts.
  6. At this point, the rolling upgrade has finished successfully.

Rolling upgrade using the command line:

  1. Run the precheck on the active node. Do not proceed with the upgrade if the precheck reports any errors.
    Active-node # system upgrade precheck <rpm file>
    Upgrade precheck in progress:
    Node 0: phase 1/1 (Precheck 100%) , Node 1: phase 1/1 (Precheck 100%)
    Upgrade precheck found no issues.
  2. If the precheck finishes without any issues, start the rolling upgrade on the active node.
    Active-node # system upgrade start <rpm file>
    
    The 'system upgrade' command upgrades the Data Domain OS.  File access
    is interrupted during the upgrade.  The system reboots automatically
    after the upgrade.
            Are you sure? (yes|no) [no]: yes
    ok, proceeding.
    Upgrade in progress:
    Node   Severity   Issue                           Solution
    ----   --------   ------------------------------  --------
    0      WARNING    1 component precheck
                      script(s) failed to complete
    0      INFO       Upgrade time est: 60 mins
    1      WARNING    1 component precheck
                      script(s) failed to complete
    1      INFO       Upgrade time est: 80 mins
    ----   --------   ------------------------------  --------
    Node 0: phase 2/4 (Install    0%) , Node 1: phase 1/4 (Precheck 100%)
    Upgrade phase status legend:
    DU : Data Upgrade
    FO : Failover
    ..               
    PC : Peer Confirmation
    VA : Volume Assembly
    
    Node 0: phase 3/4 (Reboot     0%) , Node 1: phase 4/4 (Finalize   5%) FO
    Upgrade has started.  System will reboot.
  3. After the standby node (node 1) reboots and becomes accessible, log back in to it to monitor the upgrade progress.
    Node1 # system upgrade status
    Current Upgrade Status: DD OS upgrade In Progress
    Node 0: phase 3/4 (Reboot     0%)
    Node 1: phase 4/4 (Finalize 100%) waiting for peer confirmation
  4. Wait for the rolling upgrade to finish. Until it does, do not start any HA failover operations.
    Node1 # system upgrade status
    Current Upgrade Status: DD OS upgrade Succeeded
    End time: 20xx.xx.xx:xx:xx
  5. Check the HA status. Confirm that both nodes are online and HA status is "highly available."
    Node1 # ha status detailed
    HA System name:               HA-system
    HA System Status:             highly available
    Interconnect Status:          ok
    Primary Heartbeat Status:      ok
    External LAN Heartbeat Status: ok
    Hardware compatibility check: ok
    Software Version Check:       ok
    Node  Node1:
          Role:          active
          HA State:      online
          Node Health: ok
    Node Node0:
          Role:          standby
          HA State:      online
          Node Health: ok
    Mirroring Status:
    Component Name   Status
    --------------   ------
    nvram            ok
    registry         ok
    sms              ok
    ddboost          ok
    cifs             ok
    --------------   ------
  6. Confirm that both nodes have the same DDOS version.
    Node1 # system show version
    Data Domain OS x.x.x.x-12345
    
    Node0 # system show version                  
    Data Domain OS x.x.x.x-12345
  7. Check for any unexpected alerts.
    Node1 # alert show current
    
    Node0 # alert show current
  8. At this point, the rolling upgrade has finished successfully.
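
The verification in steps 5 and 6 can be automated once the command output has been captured as text. The Python sketch below assumes the output layout matches the "ha status detailed" and "system show version" examples in this article; the function names are hypothetical, and the parser is not an official tool.

```python
import re

# Abbreviated sample modeled on the "ha status detailed" output shown above.
SAMPLE_HA_STATUS = """\
HA System name:               HA-system
HA System Status:             highly available
Interconnect Status:          ok
Node  Node1:
      Role:          active
      HA State:      online
      Node Health: ok
Node Node0:
      Role:          standby
      HA State:      online
      Node Health: ok
"""


def parse_ha_status_detailed(text: str) -> dict:
    """Extract the overall HA status and each node's role and HA state."""
    result = {"status": None, "nodes": {}}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"HA System Status:\s*(.+)$", line, re.IGNORECASE)
        if m:
            result["status"] = m.group(1).strip()
            continue
        m = re.match(r"Node\s+(\S+):$", line)  # node header, e.g. "Node Node0:"
        if m:
            current = m.group(1)
            result["nodes"][current] = {}
            continue
        m = re.match(r"(Role|HA State):\s*(\S+)$", line)
        if m and current is not None:
            result["nodes"][current][m.group(1)] = m.group(2)
    return result


def upgrade_verified(ha: dict, versions: list[str]) -> bool:
    """True when HA is restored and both nodes report the same DDOS version."""
    nodes = list(ha["nodes"].values())
    return (
        ha["status"] == "highly available"
        and sorted(n.get("Role", "") for n in nodes) == ["active", "standby"]
        and all(n.get("HA State") == "online" for n in nodes)
        and len(versions) == 2
        and len(set(versions)) == 1
    )


if __name__ == "__main__":
    ha = parse_ha_status_detailed(SAMPLE_HA_STATUS)
    same = ["Data Domain OS x.x.x.x-12345", "Data Domain OS x.x.x.x-12345"]
    print("Upgrade verified:", upgrade_verified(ha, same))
```

The check only covers HA state and version parity; reviewing current alerts on both nodes (step 7) remains a manual step.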
 

Local upgrade:

  1. Check the HA status. A local upgrade can still work even if the status is "degraded."
    # ha status
    HA System name:       HA-system
    HA System status:     highly available   <-
    Node Name   Node id   Role      HA State
    ---------   -------   -------   --------
    Node0       0         active    online
    Node1       1         standby   online
    ---------   -------   -------   --------
  2. If the HA status is "highly available," run the precheck on the active node only (first example below). If it is degraded, run the precheck with the "local" option on both nodes (second and third examples below).
    Active-node # system upgrade precheck <rpm file>
    Upgrade precheck in progress:
    Node 0: phase 1/1 (Precheck 100%) , Node 1: phase 1/1 (Precheck 100%)
    Upgrade precheck found no issues.
    Active-node # system upgrade precheck <rpm file> local
    Upgrade precheck in progress:
    Node 0: phase 1/1 (Precheck 100%)
    Upgrade precheck found no issues.
    
    Standby-node # system upgrade precheck <rpm file> local
    Upgrade precheck in progress:
    Node 1: phase 1/1 (Precheck 100%)
    Upgrade precheck found no issues.
  3. Take the standby node offline.
    Standby-node # ha offline
    This operation will cause the ha system to no longer be highly available.
    Do you want to proceed? (yes|no) [no]: yes
    Standby node is now offline.
    • If this command fails or HA status is already degraded, continue with the local upgrade. Later steps may address these failures.
  4. Confirm that the standby node's status is offline.
    Standby-node # ha status
    HA System name:       HA-system
    HA System status:     degraded
    Node Name   Node id   Role      HA State
    ---------   -------   -------   --------
    Node0       0         active    degraded
    Node1       1         standby   offline
    ---------   -------   -------   --------
  5. Start the upgrade on the standby node. The node reboots during this process.
    Standby-node # system upgrade start <rpm file> local
    The 'system upgrade' command upgrades the Data Domain OS.  File access
    is interrupted during the upgrade.  The system reboots automatically
    after the upgrade.
            Are you sure? (yes|no) [no]: yes
    ok, proceeding.
    The 'local' flag is highly disruptive to HA systems and should be used only as a repair operation.
           Are you sure? (yes|no) [no]: yes
    ok, proceeding.
    Upgrade in progress:
    Node 1: phase 3/4 (Reboot     0%)
    Upgrade has started.  System will reboot.
  6. The standby node reboots into the new version of DDOS, but stays offline.
  7. Check the system upgrade status. It can take 30 minutes or more to complete the upgrade.
    Standby-node # system upgrade status
    Current Upgrade Status: DD OS upgrade Succeeded
    End time: 20xx.xx.xx:xx:xx
  8. Check the HA status. Confirm that the standby node (Node1 in the example below) is offline and that HA status is "degraded."
    Standby-node # ha status
    HA System name:       HA-system
    HA System status:     degraded
    Node Name   Node id   Role      HA State
    ---------   -------   -------   --------
    Node0       0         active    degraded
    Node1       1         standby   offline
    ---------   -------   -------   --------
  9. Start the local upgrade on the active node. The node reboots during this process.
    Active-node # system upgrade start <rpm file> local
    The 'system upgrade' command upgrades the Data Domain OS.  File access
    is interrupted during the upgrade.  The system reboots automatically
    after the upgrade.
               Are you sure? (yes|no) [no]: yes
    ok, proceeding.
    The 'local' flag is highly disruptive to HA systems and should be used only as a repair operation.
               Are you sure? (yes|no) [no]: yes
    ok, proceeding.
    Upgrade in progress:
    Node   Severity   Issue                           Solution
    ----   --------   ------------------------------  --------
    0      WARNING    1 component precheck
             script(s) failed to complete
    0      INFO       Upgrade time est: 60 mins
    ----   --------   ------------------------------  --------
    Node 0: phase 3/4 (Reboot     0%)
    Upgrade has started.  System will reboot.
  10. Check the system upgrade status. It can take 30 minutes or more to complete the upgrade.
    Active-node # system upgrade status
    Current Upgrade Status: DD OS upgrade Succeeded
    End time: 20xx.xx.xx:xx:xx
  11. After the active node upgrade finishes, the HA system status still shows "degraded." Bring the standby node back online; this reboots the standby node. Skip this step if "ha offline" was not run previously.
    Standby-node # ha online
    The operation will reboot this node.
        Do you want to proceed? (yes|no) [no]: yes
    Broadcast message from root (Wed Oct 14 22:38:53 2020):
    The system is going down for reboot NOW!
    **** Error communicating with management service.
  12. The standby node reboots and rejoins the cluster. Afterwards, HA status should become "highly available" again.
    Active-node # ha status detailed
    HA System name:               Ha-system
    HA System Status:             highly available
    Interconnect Status:          ok
    Primary Heartbeat Status:      ok
    External LAN Heartbeat Status: ok
    Hardware compatibility check: ok
    Software Version Check:       ok
    Node node0:
              Role:          active
              HA State:      online
              Node Health: ok
    Node node1:
              Role:          standby
              HA State:      online
              Node Health: ok
    Mirroring Status:
    Component Name   Status
    --------------   ------
    nvram            ok
    registry         ok
    sms              ok
    ddboost          ok
    cifs             ok
    --------------   ------
  13. Confirm that both nodes have the same DDOS version.
    Node1 # system show version
    Data Domain OS x.x.x.x-12345
    
    Node0 # system show version                  
    Data Domain OS x.x.x.x-12345
  14. Check for any unexpected alerts.
    Node1 # alert show current
    
    Node0 # alert show current
  15. At this point, the upgrade has finished successfully.
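
The decision in step 2 (where to run the precheck, and whether to add "local") depends on the "ha status" output. The following Python sketch parses that table and returns the precheck commands described in this procedure. It assumes the table layout shown in the examples above; the function names and the RPM file name are hypothetical.

```python
# Sample modeled on the "ha status" output shown in this procedure.
SAMPLE_HA_TABLE = """\
HA System name:       HA-system
HA System status:     degraded
Node Name   Node id   Role      HA State
---------   -------   -------   --------
Node0       0         active    degraded
Node1       1         standby   offline
---------   -------   -------   --------
"""


def parse_ha_status(text: str):
    """Return (overall status, list of node rows) from "ha status" output."""
    status = None
    rows = []
    in_table = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.lower().startswith("ha system status:"):
            status = line.split(":", 1)[1].strip()
        elif line.startswith("---"):
            # Dashed rules open and close the node table.
            in_table = not in_table
        elif in_table and line:
            name, node_id, role, state = line.split()
            rows.append({"name": name, "id": int(node_id), "role": role, "state": state})
    return status, rows


def precheck_commands(status: str, rpm: str) -> list[tuple[str, str]]:
    """Return (node role, command) pairs for the precheck in step 2."""
    if status == "highly available":
        return [("active", f"system upgrade precheck {rpm}")]
    # Degraded HA: run a local precheck on each node.
    return [
        ("active", f"system upgrade precheck {rpm} local"),
        ("standby", f"system upgrade precheck {rpm} local"),
    ]


if __name__ == "__main__":
    status, rows = parse_ha_status(SAMPLE_HA_TABLE)
    for role, command in precheck_commands(status, "example.rpm"):  # placeholder file name
        print(f"{role}: {command}")
```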
 
Note: If there are any issues with the upgrade, contact your contracted support provider for further instructions and assistance.

Additional Information

Rolling upgrade:
  • A single failover is performed during the upgrade, so the nodes swap roles.
  • Upgrade information is logged in /ddr/var/log/debug/platform/infra.log; additional information may be available in /ddr/var/log/debug/ha.log.
  • Upgrade progress can be monitored with the command 'system upgrade watch'.
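
When upgrade progress is captured to a log or terminal transcript, the per-node progress lines can be parsed for reporting. The Python sketch below assumes the line format shown in the command-line examples in this article ("Node 0: phase 3/4 (Reboot 0%)"); the function name is hypothetical, and trailing legend codes such as FO are ignored.

```python
import re

# Matches one per-node entry, e.g. "Node 0: phase 3/4 (Reboot     0%)".
PROGRESS_RE = re.compile(r"Node (\d+): phase (\d+)/(\d+) \((\w+)\s+(\d+)%\)")


def parse_upgrade_progress(line: str) -> list[dict]:
    """Return one dict per node found in a progress line."""
    entries = []
    for node, phase, total, name, percent in PROGRESS_RE.findall(line):
        entries.append({
            "node": int(node),
            "phase": int(phase),
            "total_phases": int(total),
            "phase_name": name,
            "percent": int(percent),
        })
    return entries


if __name__ == "__main__":
    sample = "Node 0: phase 3/4 (Reboot     0%) , Node 1: phase 4/4 (Finalize   5%) FO"
    for entry in parse_upgrade_progress(sample):
        print(entry)
```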
Local node upgrade:
  • A local node upgrade does not perform an HA failover.
  • As a result, there is an extended period of downtime while the active node upgrades, reboots, and performs post-reboot upgrade activities. This is likely to cause backups and restores to time out and fail. A maintenance window must be allocated for local upgrades.
  • Local upgrades can proceed even if HA status is "degraded."
  • If a rolling upgrade fails unexpectedly, a local upgrade can be used instead.

Affected Products

Data Domain, DD OS

Article Properties

Article Number: 000009653
Article Type: How To
Last Modified: 30 Mar 2026
Version:  11