PowerFlex: MDM Switchover Fails Due to Memory Allocation Failure (mos_MemMalloc)

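Summary: When MDM ownership is switched (manually or otherwise), the receiving MDM fails to start because of a memory allocation failure, leaving the cluster without a master MDM.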


Symptoms

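The event output from /opt/emc/scaleio/mdm/bin/showevents.py on the receiving MDM contains multiple entries for attempts to take over the master MDM role, all within a very short time of one another: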

2017-10-04 12:08:33.915 MDM_CLUSTER_BECOMING_MASTER WARNING This MDM, ID 394760fd6714xxxx, took control of the cluster and is now the Master MDM.
2017-10-04 12:08:33.915 MDM_BECOMING_MASTER WARNING This MDM is switching to Master mode. MDM will start running.
..
2017-10-04 12:08:34.309 MDM_CLUSTER_BECOMING_MASTER WARNING This MDM, ID 394760fd6714xxxx, took control of the cluster and is now the Master MDM.
2017-10-04 12:08:34.309 MDM_BECOMING_MASTER WARNING This MDM is switching to Master mode. MDM will start running.
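
As a quick way to confirm this pattern, the repeated takeover attempts can be counted from saved event output. The following is a minimal sketch that assumes the output of showevents.py has been saved to a text file; the function name count_master_takeovers is illustrative and not part of the product.

```shell
#!/bin/sh
# count_master_takeovers FILE
# Prints how many MDM_BECOMING_MASTER events appear in saved event output.
# grep -o emits one line per match, so events that share a physical line
# are still counted individually. MDM_CLUSTER_BECOMING_MASTER is a
# different string and is not counted.
count_master_takeovers() {
    grep -o 'MDM_BECOMING_MASTER' "$1" | wc -l | tr -d ' '
}
```

A count greater than 1 within the same second or two is consistent with the MDM service starting, failing, and being restarted.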

The exp.0 file from the receiving MDM contains entries such as the following:

04/10 12:08:34.079823 Panic in file /data/build/workspace/ScaleIO-SLES12-2/src/mos/usr/mos_utils.c, line 73, function mos_MemMalloc, PID 9978.Panic Expression bCanFail .
/opt/emc/scaleio/mdm/bin/mdm-2.0.13000.211(mosDbg_PanicPrepare+0x115) [0x6a86f5]
/opt/emc/scaleio/mdm/bin/mdm-2.0.13000.211(mos_MemMalloc+0x81) [0x6ac0d1]
/opt/emc/scaleio/mdm/bin/mdm-2.0.13000.211(multiHeadMgr_GetUpdateMultiHeadsMsg+0x66) [0x57123c]
/opt/emc/scaleio/mdm/bin/mdm-2.0.13000.211(tgtMgr_ConfigureTgt+0x9c1) [0x4d579e]
/opt/emc/scaleio/mdm/bin/mdm-2.0.13000.211(tgtMgr_HandleWorkReq+0x41b) [0x4d6206]
/opt/emc/scaleio/mdm/bin/mdm-2.0.13000.211() [0x6c57d8]
/opt/emc/scaleio/mdm/bin/mdm-2.0.13000.211(mosUmt_StartFunc+0xea) [0x6c51af]
/opt/emc/scaleio/mdm/bin/mdm-2.0.13000.211(mosUmt_SignalHandler+0x51) [0x6c65d1]
/lib64/libpthread.so.0(+0x10b00) [0x7f844e8a6b00]
/lib64/libc.so.6(sleep+0xd4) [0x7f844d8911a4]
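
To locate this signature in a collected exp.0 file, a plain text search is usually enough. The sketch below assumes the file has been copied somewhere readable; show_malloc_panics is an illustrative helper name, and the path to exp.0 is passed in by the caller because it is not stated in this article.

```shell
#!/bin/sh
# show_malloc_panics FILE
# Prints every line of FILE that mentions mos_MemMalloc, prefixed with its
# line number. Returns non-zero when the file has no such line (the exit
# status of grep).
show_malloc_panics() {
    grep -n 'mos_MemMalloc' "$1"
}
```

Both the "Panic in file" header and the mos_MemMalloc stack frame match, so one panic normally produces two hits.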

The /var/log/messages file shows multiple restarts of the MDM service, matching what events.txt shows:

systemd[1]: mdm.service: Main process exited, code=exited, status=255/n/a
systemd[1]: mdm.service: Unit entered failed state.
systemd[1]: mdm.service: Failed with result 'exit-code'.
systemd[1]: mdm.service: Service has no hold-off time, scheduling restart.
systemd[1]: Stopped scaleio mdm.
systemd[1]: mdm.service: Start request repeated too quickly.
systemd[1]: Failed to start scaleio mdm.
systemd[1]: mdm.service: Unit entered failed state.
systemd[1]: mdm.service: Failed with result 'start-limit'.
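
The point at which systemd gives up restarting the service is marked by the 'start-limit' result. As a rough sketch, those messages can be counted as shown here; count_start_limit_failures is an illustrative name, and the default path assumes the system log is /var/log/messages as in this article.

```shell
#!/bin/sh
# count_start_limit_failures [FILE]
# Prints the number of times systemd reported that mdm.service hit its
# start limit. FILE defaults to /var/log/messages.
count_start_limit_failures() {
    grep -o "mdm\.service: Failed with result 'start-limit'" \
        "${1:-/var/log/messages}" | wc -l | tr -d ' '
}
```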

Cause

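The root cause is that the Linux operating system has reached its memory commit limit and cannot grant the memory that the MDM service requests at initialization. This happens because the kernel parameter settings are not tuned correctly.
Note: If the operating system had actually allocated more memory than is physically available, oom-killer messages would appear in the messages file, and other services and processes would have been killed before these failures.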
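
To rule out the oom-killer case described in the note, the messages file can be searched for the usual kernel wording. This is a minimal sketch: oom_killer_seen is an illustrative name, and the two search strings ("oom-killer" and "Out of memory") are the common kernel phrases, which may vary between kernel versions.

```shell
#!/bin/sh
# oom_killer_seen [FILE]
# Returns 0 if FILE contains typical OOM-killer messages, non-zero
# otherwise. FILE defaults to /var/log/messages.
oom_killer_seen() {
    grep -Eqi 'oom-killer|out of memory' "${1:-/var/log/messages}"
}
```

For the failure described in this article, the function is expected to return non-zero, because the allocation is refused up front rather than resolved by killing processes.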

Resolution

This is not a ScaleIO issue. ScaleIO is working as designed.

To check and, if necessary, modify the vm.overcommit settings, perform the following steps:

1. Log in to the SDS as the root user using SSH.

2. On that node, run:

cat /etc/sysctl.conf | grep "vm.overcommit"
Example:
[root@sds-node logs]# cat /etc/sysctl.conf | grep "vm.overcommit"
vm.overcommit_memory = 2
vm.overcommit_ratio = 50

3. Run the following commands:

sed -i 's/vm\.overcommit_memory = .*/vm\.overcommit_memory = 2/g' /etc/sysctl.conf
sed -i 's/vm\.overcommit_ratio = .*/vm\.overcommit_ratio = 100/g' /etc/sysctl.conf
sysctl -p
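
Note that the sed commands above only rewrite lines that already exist; if a key is absent from /etc/sysctl.conf, the file is left unchanged. The following sketch handles both cases. The helper name set_sysctl_key is illustrative, and it relies on GNU sed's -i option, as the commands above do.

```shell
#!/bin/sh
# set_sysctl_key FILE KEY VALUE
# Rewrites an existing "KEY = ..." line in FILE, or appends one if the
# key is not present yet.
set_sysctl_key() {
    file=$1
    key=$2
    value=$3
    # Escape dots so the key is matched literally by grep and sed.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^[ \t]*${pattern}[ \t]*=" "$file"; then
        sed -i "s/^[ \t]*${pattern}[ \t]*=.*/${key} = ${value}/" "$file"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}
```

For example, set_sysctl_key /etc/sysctl.conf vm.overcommit_ratio 100 followed by sysctl -p would apply the same change as step 3.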

Verify:

[root@sds-node logs]# cat /etc/sysctl.conf | grep "vm.overcommit"
vm.overcommit_memory = 2
vm.overcommit_ratio = 100
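
The same verification can be scripted so that it returns a pass or fail status. This is a sketch only: overcommit_conf_ok is an illustrative name, and it checks the configuration file rather than the running kernel values.

```shell
#!/bin/sh
# overcommit_conf_ok [FILE]
# Returns 0 when FILE sets vm.overcommit_memory to 2 and
# vm.overcommit_ratio to 100, non-zero otherwise.
# FILE defaults to /etc/sysctl.conf.
overcommit_conf_ok() {
    awk -F= '
        # Strip blanks around the key and the value before comparing.
        { gsub(/[ \t]/, "", $1); gsub(/[ \t]/, "", $2) }
        $1 == "vm.overcommit_memory" { mem = $2 }
        $1 == "vm.overcommit_ratio"  { ratio = $2 }
        END { exit !(mem == "2" && ratio == "100") }
    ' "${1:-/etc/sysctl.conf}"
}
```

The values currently in effect can be read separately with sysctl -n vm.overcommit_memory and sysctl -n vm.overcommit_ratio.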


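Repeat these steps on every affected SDS in the environment so that all of them use the recommended best-practice settings. You do not need to place the SDS in maintenance mode to do this.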

To learn more about these settings, see the Linux kernel documentation on overcommit accounting.

Additional Information

Check the sysctl kernel parameters for memory overcommit:

# sysctl -a |grep commit
vm.overcommit_memory = 2 (default is 0)
vm.overcommit_ratio = 50 (default is 50)

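In this case, setting "vm.overcommit_memory" to 2 means that memory is never overcommitted. Any memory allocation that would exceed the overcommit limit fails. The total address space committed by the system may not exceed swap plus a configurable percentage of physical RAM (vm.overcommit_ratio, default 50%). When the setting is 0, the kernel refuses obvious overcommit requests but allows root processes to allocate beyond the overcommit limit.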
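
Expressed as arithmetic, the limit in mode 2 is approximately swap + RAM x overcommit_ratio / 100. The sketch below computes that figure in kB; commit_limit_kb is an illustrative helper name, and the calculation deliberately ignores huge pages and the kernel's page-granular rounding, so the CommitLimit reported by the kernel can differ by a few kB.

```shell
#!/bin/sh
# commit_limit_kb MEMTOTAL_KB SWAPTOTAL_KB RATIO
# Approximates the commit limit used when vm.overcommit_memory = 2:
#   limit = swap + RAM * ratio / 100
# Simplification: huge pages and page rounding are not modelled.
commit_limit_kb() {
    echo $(( $2 + $1 * $3 / 100 ))
}
```

With 8174572 kB of RAM and no swap, a ratio of 50 gives 4087286 kB, and a ratio of 100 gives 8174572 kB, which is why raising the ratio lets the MDM's allocation succeed.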

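To check the current overcommit limit and the amount currently committed, look at CommitLimit and Committed_AS, respectively, in the output of the following command: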

#cat /proc/meminfo 
MemTotal: 8174572 kB 
.. 
CommitLimit: 4087284 kB 
Committed_AS: 3879388 kB

This host has 8 GB of RAM, and CommitLimit is ~4 GB, which is 50% of physical RAM and consistent with vm.overcommit_ratio = 50.
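
The remaining headroom before allocations start to fail is the difference between the two values. A minimal sketch, with commit_headroom_kb as an illustrative name:

```shell
#!/bin/sh
# commit_headroom_kb [FILE]
# Prints CommitLimit minus Committed_AS in kB, read from FILE
# (default /proc/meminfo). A small or negative result means new
# allocations are likely to be refused when vm.overcommit_memory = 2.
commit_headroom_kb() {
    awk '/^CommitLimit:/  { limit = $2 }
         /^Committed_AS:/ { used = $2 }
         END { print limit - used }' "${1:-/proc/meminfo}"
}
```

For the sample output above, the result is 207896 kB (about 203 MB), which leaves little room for a process that requests a large block at startup.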

 

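To resolve this issue, add or edit the following in /etc/sysctl.conf:

 - Change "vm.overcommit_ratio" to 100 so that the operating system can commit the total available address space, then restart.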

To learn more about these settings, see the Linux kernel documentation on overcommit accounting.

Affected Products

PowerFlex rack, VxFlex Product Family
Article Properties
Article Number: 000030300
Article Type: Solution
Last Modified: 22 Sept 2025
Version:  7