ECS: xDoctor RAP145: rackServiceMgr is using memory above configured threshold

Summary: xDoctor has detected that the rackServiceMgr service is using more than 1 GB of memory.

This article is not tied to any specific product. Not all product versions are identified in this article.

Symptoms

-----------------------------------------------------------------
ERROR - rackServiceMgr is using memory above configured threshold
-----------------------------------------------------------------
Node      = Nodes
Extra     = {'Nodes': {'169.254.1.1': {'pid': '72996', 'memory': '15145728'}}}
RAP       = RAP145
Solution  = KB 203562
Timestamp = 2022-09-22_163710
PSNT      = CKMXXXXXXXXXXX @ 4.8-86.0
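The alert's Extra field reports the PID and the memory figure xDoctor measured. As a quick manual check, the resident memory of that PID can be compared against the 1 GB threshold from the Summary. This is a hedged sketch using standard Linux ps; PID_FROM_ALERT is a placeholder for the 'pid' value in the alert, not part of the alert itself:

```shell
# Hedged sketch: compare the resident memory (RSS, reported by ps in KiB) of
# the PID from the alert against the 1 GB threshold mentioned in the Summary.
# PID_FROM_ALERT is a placeholder; it defaults to the current shell's PID here
# purely so the snippet is runnable.
pid=${PID_FROM_ALERT:-$$}
threshold_kib=$((1024 * 1024))          # 1 GB expressed in KiB
rss_kib=$(ps -o rss= -p "$pid" | tr -d ' ')
if [ "$rss_kib" -gt "$threshold_kib" ]; then
    echo "rackServiceMgr over threshold: ${rss_kib} KiB"
else
    echo "rackServiceMgr within threshold: ${rss_kib} KiB"
fi
```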

Cause

This is a known issue resolved in ECS code releases 3.7.0.5 and 3.8.0.1.

Resolution

IMPORTANT! A new feature has been released in xDoctor 4-8.104 and above. This knowledge base (KB) article is now automated with xDoctor, addressing most of the issues listed below without the need for L2 or coach involvement. If the script is unable to resolve the issue, it provides a detailed summary of its findings. For more information, see ECS: ObjectScale: How to run KB Automation Scripts (Auto Pilot)
 

Automated Solution:
To find the master node of the rack:

Command:

# ssh master.rack

To find the NAN IP, use the IP identified in the alert or run the getrackinfo command:
admin@ecsnode1:~> getrackinfo
Node private      Node              Public                                BMC
Ip Address        Id       Status   Mac                 Ip Address        Mac                 Ip Address        Private.4(NAN)    Node Name
===============   ======   ======   =================   ===============   =================   ===============   ===============   =========
192.168.219.1     1        MA       00:00:00:00:00      0.0.0.0           00:00:00:00:00      192.168.219.101   169.254.1.1       provo-red
192.168.219.2     2        SA       00:00:00:00:00      0.0.0.0           00:00:00:00:00      192.168.219.102   169.254.1.2       sandy-red
192.168.219.3     3        SA       00:00:00:00:00      0.0.0.0           00:00:00:00:00      192.168.219.103   169.254.1.3       orem-red
192.168.219.4     4        SA       00:00:00:00:00      0.0.0.0           00:00:00:00:00      192.168.219.104   169.254.1.4       ogden-red
192.168.219.5     5        SA       00:00:00:00:00      0.0.0.0           00:00:00:00:00      192.168.219.105   169.254.1.5       layton-red
192.168.219.6     6        SA       00:00:00:00:00      0.0.0.0           00:00:00:00:00      192.168.219.106   169.254.1.6       logan-red
192.168.219.7     7        SA       00:00:00:00:00      0.0.0.0           00:00:00:00:00      192.168.219.107   169.254.1.7       lehi-red
192.168.219.8     8        SA       00:00:00:00:00      0.0.0.0           00:00:00:00:00      192.168.219.108   169.254.1.8       murray-red
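In the getrackinfo output above, the master node is the row whose Status column reads MA, and its NAN IP is the Private.4(NAN) column. The extraction can be sketched with awk; the field positions are assumptions based on the layout shown, and rackinfo.txt is a hypothetical file capturing the command output:

```shell
# Hedged sketch: extract the NAN (Private.4) IP of the master node, i.e. the
# row whose Status column (field 3) is "MA". Field 8 is the Private.4(NAN)
# column in the layout shown above; rackinfo.txt is a hypothetical capture:
#   getrackinfo > rackinfo.txt
nan_ip=$(awk '$3 == "MA" { print $8 }' rackinfo.txt)
echo "$nan_ip"
```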
 
  1.  Run the automation command from the master node with xDoctor 4-8.104.0 and above.
Command:
Note:
  --target-rack is supported for this action.
sudo xdoctor autopilot --kb 203562 --target-rack <rack colour>
Example:
admin@ecsnode1:~> sudo xdoctor autopilot --kb 203562 --target-rack red
Checking for existing screen sessions...
Starting screen session 'autopilot_kb_203562_20250626_130631'...
Screen session 'autopilot_kb_203562_20250626_130631' started successfully.
Attaching to screen session 'autopilot_kb_203562_20250626_130631'...
  2.  Review the summary for rackServiceMgr detection and restart:
TASK [Debug node IP and status] ****************************************************************************************************************************************
ok: [169.254.1.6 -> localhost] => {
    "msg": "Host 169.254.1.6 has IP 169.254.1.6 and status unknown"
}
ok: [169.254.1.1 -> localhost] => {
    "msg": "Host 169.254.1.1 has IP 169.254.1.1 and status FIXED"
}
ok: [169.254.1.2 -> localhost] => {
    "msg": "Host 169.254.1.2 has IP 169.254.1.2 and status unknown"
}
ok: [169.254.1.3 -> localhost] => {
    "msg": "Host 169.254.1.3 has IP 169.254.1.3 and status unknown"
}
ok: [169.254.1.4 -> localhost] => {
    "msg": "Host 169.254.1.4 has IP 169.254.1.4 and status unknown"
}
ok: [169.254.1.5 -> localhost] => {
    "msg": "Host 169.254.1.5 has IP 169.254.1.5 and status unknown"
}
ok: [169.254.1.7 -> localhost] => {
    "msg": "Host 169.254.1.7 has IP 169.254.1.7 and status unknown"
}
ok: [169.254.1.8 -> localhost] => {
    "msg": "Host 169.254.1.8 has IP 169.254.1.8 and status unknown"
}

TASK [Aggregate node statuses] *****************************************************************************************************************************************
ok: [169.254.1.6 -> localhost] => {"ansible_facts": {"failed_nodes": "                  []", "fixed_nodes": "                        ['169.254.1.1']", "inactive_nodes": "                                                            ['169.254.1.6', '169.254.1.2', '169.254.1.3', '169.254.1.4', '169.254.1.5', '169.254.1.7', '169.254.1.8']", "pass_nodes": "                  []"}, "changed": false}

TASK [emc-rackservicemgr service status] *******************************************************************************************************************************
ok: [169.254.1.6 -> localhost] => {"ansible_facts": {"final_summary": ["********************************************************************************", "Summary of Rack Service Manager (emc-rackservicemgr) actions:", "IMPORTANT: Only one node per rack is expected to run the service. Inactive status on other nodes is normal.", "********************************************************************************", "FIXED: Restart was successful on 169.254.1.1 (Memory usage: Before=30788 MB, After=26072 MB)", "PASS: ecs-rackservicemgr is inactive on the following nodes: 169.254.1.6 169.254.1.2 169.254.1.3 169.254.1.4 169.254.1.5 169.254.1.7 169.254.1.8", "********************************************************************************"]}, "changed": false}

TASK [Summary of emc-rackservicemgr service status] ********************************************************************************************************************
ok: [169.254.1.6 -> localhost] => {
    "msg": [
        "********************************************************************************",
        "Summary of Rack Service Manager (emc-rackservicemgr) actions:",
        "IMPORTANT: Only one node per rack is expected to run the service. Inactive status on other nodes is normal.",
        "********************************************************************************",
        "FIXED: Restart was successful on 169.254.1.1 (Memory usage: Before=30788 MB, After=26072 MB)",
        "PASS: ecs-rackservicemgr is inactive on the following nodes: 169.254.1.6 169.254.1.2 169.254.1.3 169.254.1.4 169.254.1.5 169.254.1.7 169.254.1.8",
        "********************************************************************************"
    ]
}

TASK [Set fact for context] ********************************************************************************************************************************************
ok: [169.254.1.6 -> localhost] => {"ansible_facts": {"context": "  FIXED: Restart was successful on 169.254.1.1 (Memory usage: Before=30788 MB, After=26072 MB)"}, "changed": false}

TASK [Fail if any service restart failed] ******************************************************************************************************************************
skipping: [169.254.1.6] => {"changed": false, "false_condition": "final_summary | select('search', 'FAILED') | list | length > 0", "skip_reason": "Conditional result was False"}

PLAY RECAP *************************************************************************************************************************************************************
169.254.1.1                : ok=8    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
169.254.1.2                : ok=3    changed=0    unreachable=0    failed=0    skipped=5    rescued=0    ignored=1
169.254.1.3                : ok=3    changed=0    unreachable=0    failed=0    skipped=5    rescued=0    ignored=1
169.254.1.4                : ok=3    changed=0    unreachable=0    failed=0    skipped=5    rescued=0    ignored=1
169.254.1.5                : ok=3    changed=0    unreachable=0    failed=0    skipped=5    rescued=0    ignored=1
169.254.1.6                : ok=12   changed=0    unreachable=0    failed=0    skipped=24   rescued=0    ignored=1
169.254.1.7                : ok=3    changed=0    unreachable=0    failed=0    skipped=5    rescued=0    ignored=1
169.254.1.8                : ok=3    changed=0    unreachable=0    failed=0    skipped=5    rescued=0    ignored=1

========================================================================================================================================================================
Status: PASS
Time Elapsed: 0h 0m 8s
Debug log: /tmp/autopilot/log/autopilot_203562_Not provided.log
Message:   FIXED: Restart was successful on 169.254.1.1 (Memory usage: Before=30788 MB, After=26072 MB)
========================================================================================================================================================================
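Once the run completes, the outcome can be double-checked by scanning the debug log printed in the recap for the FIXED, FAILED, and PASS markers the summary uses. A hedged sketch; the log path is the one shown above for this example run:

```shell
# Hedged sketch: pull the outcome lines out of the autopilot debug log.
# The path is the one printed in the recap above for this example run.
log='/tmp/autopilot/log/autopilot_203562_Not provided.log'
grep -E 'FIXED|FAILED|PASS' "$log"
```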

Affected Products

ECS, ECS Appliance, Elastic Cloud Storage
Article Properties
Article Number: 000203562
Article Type: Solution
Last Modified: 22 Jul 2025
Version:  7