PowerFlex: MDM switchover failing due to memory allocation failures - mos_MemMalloc
Summary: When MDM ownership is switched (manually or otherwise), the receiving MDM cannot come up properly due to memory allocation failures, leaving the cluster with no Primary MDM.
Symptoms
The events output from the receiving MDM (/opt/emc/scaleio/mdm/bin/showevents.py) contains multiple entries for attempts to take over Primary MDM responsibilities, all within a short time of each other:
2017-10-04 12:08:33.915 MDM_CLUSTER_BECOMING_MASTER WARNING This MDM, ID 394760fd6714xxxx, took control of the cluster and is now the Master MDM.
2017-10-04 12:08:33.915 MDM_BECOMING_MASTER WARNING This MDM is switching to Master mode. MDM will start running.
..
2017-10-04 12:08:34.309 MDM_CLUSTER_BECOMING_MASTER WARNING This MDM, ID 394760fd6714xxxx, took control of the cluster and is now the Master MDM.
2017-10-04 12:08:34.309 MDM_BECOMING_MASTER WARNING This MDM is switching to Master mode. MDM will start running.
The exp.0 file from the receiving MDM has entries like this:
04/10 12:08:34.079823 Panic in file /data/build/workspace/ScaleIO-SLES12-2/src/mos/usr/mos_utils.c, line 73, function mos_MemMalloc, PID 9978. Panic Expression bCanFail .
/opt/emc/scaleio/mdm/bin/mdm-2.0.13000.211(mosDbg_PanicPrepare+0x115) [0x6a86f5]
/opt/emc/scaleio/mdm/bin/mdm-2.0.13000.211(mos_MemMalloc+0x81) [0x6ac0d1]
/opt/emc/scaleio/mdm/bin/mdm-2.0.13000.211(multiHeadMgr_GetUpdateMultiHeadsMsg+0x66) [0x57123c]
/opt/emc/scaleio/mdm/bin/mdm-2.0.13000.211(tgtMgr_ConfigureTgt+0x9c1) [0x4d579e]
/opt/emc/scaleio/mdm/bin/mdm-2.0.13000.211(tgtMgr_HandleWorkReq+0x41b) [0x4d6206]
/opt/emc/scaleio/mdm/bin/mdm-2.0.13000.211() [0x6c57d8]
/opt/emc/scaleio/mdm/bin/mdm-2.0.13000.211(mosUmt_StartFunc+0xea) [0x6c51af]
/opt/emc/scaleio/mdm/bin/mdm-2.0.13000.211(mosUmt_SignalHandler+0x51) [0x6c65d1]
/lib64/libpthread.so.0(+0x10b00) [0x7f844e8a6b00]
/lib64/libc.so.6(sleep+0xd4) [0x7f844d8911a4]
The /var/log/messages file shows the same repeated restarts of the MDM service that appear in the events output:
systemd[1]: mdm.service: Main process exited, code=exited, status=255/n/a
systemd[1]: mdm.service: Unit entered failed state.
systemd[1]: mdm.service: Failed with result 'exit-code'.
systemd[1]: mdm.service: Service has no hold-off time, scheduling restart.
systemd[1]: Stopped scaleio mdm.
systemd[1]: mdm.service: Start request repeated too quickly.
systemd[1]: Failed to start scaleio mdm.
systemd[1]: mdm.service: Unit entered failed state.
systemd[1]: mdm.service: Failed with result 'start-limit'.
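To confirm the restart loop directly on the receiving MDM node, the systemd status and journal can also be checked. This is a minimal sketch; the service name mdm.service is taken from the messages above, and the time window is only an example:

# systemctl status mdm.service
# journalctl -u mdm.service --since "1 hour ago" | grep -E "Main process exited|start-limit"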
Cause
The host running the receiving MDM enforces strict overcommit accounting (vm.overcommit_memory = 2) with vm.overcommit_ratio = 50, which caps the total committed address space at swap plus 50% of physical RAM. When the MDM attempts to take over the Primary role, its memory allocations exceed this CommitLimit, mos_MemMalloc fails, and the MDM process panics and is restarted by systemd. See Additional Information for details.
Resolution
This is not a ScaleIO issue; ScaleIO is working as designed.
To check and/or modify the vm.overcommit settings, follow these steps:
1. Log in to the SDS using SSH as root
2. Run:
cat /etc/sysctl.conf | grep "vm.overcommit"
Example:
[root@sds-node logs]# cat /etc/sysctl.conf | grep "vm.overcommit"
vm.overcommit_memory = 2
vm.overcommit_ratio = 50
3. Run the following commands:
sed -i 's/vm\.overcommit_memory = .*/vm\.overcommit_memory = 2/g' /etc/sysctl.conf
sed -i 's/vm\.overcommit_ratio = .*/vm\.overcommit_ratio = 100/g' /etc/sysctl.conf
sysctl -p
Validation
[root@sds-node logs]# cat /etc/sysctl.conf | grep "vm.overcommit"
vm.overcommit_memory = 2
vm.overcommit_ratio = 100
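Note that grepping /etc/sysctl.conf only shows the values stored in the file. To confirm the values the running kernel is actually using after sysctl -p, the live parameters can also be queried; a minimal check (the output shown is an example):

# sysctl vm.overcommit_memory vm.overcommit_ratio
vm.overcommit_memory = 2
vm.overcommit_ratio = 100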
Repeat these steps on all impacted SDSs in the environment to ensure that they are set to the recommended best practice settings. You do not need to place the SDS into maintenance mode to perform this operation.
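If several nodes need the change, the same edit can be pushed from a single host. The loop below is a minimal sketch, assuming password-less SSH as root; the hostnames sds-node1, sds-node2, and sds-node3 are placeholders for the nodes in your environment:

for node in sds-node1 sds-node2 sds-node3; do
  ssh root@"$node" "sed -i 's/vm\.overcommit_memory = .*/vm\.overcommit_memory = 2/g; s/vm\.overcommit_ratio = .*/vm\.overcommit_ratio = 100/g' /etc/sysctl.conf && sysctl -p"
done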
To learn more about these settings, see the Linux Kernel documentation on overcommit-accounting
Additional Information
Check the sysctl kernel parameters for overcommit of memory:
# sysctl -a | grep commit
vm.overcommit_memory = 2   (default is 0)
vm.overcommit_ratio = 50   (default is 50)
In this case, having "vm.overcommit_memory" set to 2 means do not overcommit memory: any memory allocation that would exceed the overcommit limit fails. The total address space committed for the system is not permitted to exceed swap plus a configurable percentage (vm.overcommit_ratio, default 50%) of physical RAM. With the default setting of 0, the kernel denies only obvious overcommit requests, and root processes are allowed to allocate above the overcommit limit.
To check the current overcommit limit and amount committed, see CommitLimit and Committed_AS, respectively, from the following command:
# cat /proc/meminfo
MemTotal:        8174572 kB
..
CommitLimit:     4087284 kB
Committed_AS:    3879388 kB
There is 8 GB of RAM on this host, and the CommitLimit is ~4 GB, which matches 50% of physical RAM (this host appears to have no swap configured, so the limit is effectively 0 kB swap + 50% of MemTotal).
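This relationship can be verified with a short calculation. The following is a minimal sketch that recomputes the expected limit from MemTotal, SwapTotal, and vm.overcommit_ratio and prints it next to the values the kernel reports; the match is approximate because the kernel accounts in pages:

# Expected CommitLimit ~= SwapTotal + MemTotal * overcommit_ratio / 100
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
swap_kb=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
ratio=$(cat /proc/sys/vm/overcommit_ratio)
echo "Expected CommitLimit: $(( swap_kb + mem_kb * ratio / 100 )) kB"
grep -E 'CommitLimit|Committed_AS' /proc/meminfo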
To fix this issue, add or edit the following in /etc/sysctl.conf:
- Change "vm.overcommit_ratio" to 100, so the OS can commit the total address space available (swap + 100% of physical RAM), then run sysctl -p or reboot for the change to take effect.
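After the change is applied, the new limit can be confirmed from /proc/meminfo. On the 8 GB host above, CommitLimit should rise to roughly the MemTotal value (about 8174572 kB), since vm.overcommit_ratio is now 100 and there is no swap; a minimal check:

# sysctl -p
# grep CommitLimit /proc/meminfo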
To learn more about these settings, see the Linux Kernel documentation on overcommit-accounting