PowerFlex 3.x Add SDC command causing MDM panic and failover
Summary:
When the add SDC command (scli --add_sdc) is run, an MDM failover occurs.
Symptoms
Scenario
The MDM is running in "Restricted mode."
The user attempts to add an SDC with IPs that are not among the approved SDC IPs, which forces the MDM to create a new SDC object.
The SDC name supplied in the command is a name that is already in use by an existing SDC.
An MDM panic occurs on the primary MDM, and the secondary MDM takes over.
Symptoms
The user runs the add SDC command with IPs rather than a GUID and receives a communication error on stdout, for example:
# scli --add_sdc --sdc_ip 123.234.234.201 --sdc_name SDC40
Error: MDM failed command. Status: Communication error
The MDM exp.0 file contains the following panic:
21/12 02:43:26.897784 Panic in file /data/builds/workspace/ScaleIO-Common-Job@2/src/mdm/control/obj_container.c, line 2291, function objContainer_GetObjId, PID 30902.
Panic Expression (((void *)0) != (pObjHeader)) && ((pObjHeader)->magic == 0x68cab8db) .
/opt/emc/scaleio/mdm/bin/mdm-3.5.1000.175(mosDbg_PanicPrepare+0x13a) [0x9a51ca]
/opt/emc/scaleio/mdm/bin/mdm-3.5.1000.175(objContainer_GetObjId+0xe8) [0x981768]
/opt/emc/scaleio/mdm/bin/mdm-3.5.1000.175(iniMgr_ApproveInitiator+0x359) [0x8c7fe9]
/opt/emc/scaleio/mdm/bin/mdm-3.5.1000.175() [0x562a08]
/opt/emc/scaleio/mdm/bin/mdm-3.5.1000.175(netRecvGroup_WaitForWork+0x3dc) [0x7b2cfc]
/opt/emc/scaleio/mdm/bin/mdm-3.5.1000.175(netRecvGroup_WaitForWorkLoop+0x18) [0x7b3008]
/opt/emc/scaleio/mdm/bin/mdm-3.5.1000.175(mosUmt_StartFunc+0x7a) [0x7fb5da]
/lib64/libc.so.6(+0x48140) [0x7ff1b5a8d140]
[(nil)]
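To check whether a given exception file carries this panic signature, a small helper along the following lines may be useful. It is a minimal sketch: has_add_sdc_panic is a hypothetical function name, and the location of exp.0 is not stated in this article, so the path must be passed in by the operator. It only looks for the two function names that appear in the trace above.

```shell
#!/bin/sh
# Sketch only: has_add_sdc_panic is a hypothetical helper, not a PowerFlex tool.
# Usage: has_add_sdc_panic /path/to/exp.0
# Returns 0 when the file contains both the panicking function and the
# caller seen in the stack trace from this article, non-zero otherwise.
has_add_sdc_panic() {
    # The panic line itself names the function that asserted.
    grep -q 'function objContainer_GetObjId' "$1" &&
        # The calling frame ties the panic to the add SDC / approve path.
        grep -q 'iniMgr_ApproveInitiator' "$1"
}
```

For example, `has_add_sdc_panic ./exp.0 && echo "signature found"` prints a message only when both strings are present in the file.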
Impact
The add SDC command fails and the SDC is not added to the system.
An MDM failover occurs.
Cause
The "add SDC" command takes additional parameters such as "SDC IPs" or "SDC GUID," with "SDC name" as an optional parameter.
When the "add SDC" command is issued, the MDM looks up an existing SDC that matches those parameters. In this scenario, the user supplied unfamiliar "SDC IPs." Because of the change introduced in version 3.0, the MDM does not recognize the newly supplied IPs and therefore creates a new SDC object.
In addition, the "SDC name" supplied with the command belongs to an existing SDC. The MDM detects that the name is already in use and immediately deletes the SDC object it just created.
When the MDM then tries to access the deleted SDC object, an assertion fails and a failover occurs.
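Since the trigger described above depends on the supplied name already being in use, an operator could check the name before issuing the command. The following is a minimal sketch under stated assumptions: sdc_name_in_use is a hypothetical helper, and the names file is an operator-prepared list with one existing SDC name per line (how that list is produced is outside the scope of this article). It does not replace the workaround in the Resolution section.

```shell
#!/bin/sh
# Sketch only: sdc_name_in_use is a hypothetical helper, not part of scli.
# Usage: sdc_name_in_use NAME NAMES_FILE
# NAMES_FILE is assumed to hold one existing SDC name per line.
# Returns 0 when NAME matches a whole line exactly, non-zero otherwise.
sdc_name_in_use() {
    # -F: treat NAME as a fixed string, -x: match the whole line, -q: quiet.
    grep -Fxq -- "$1" "$2"
}
```

For example, `sdc_name_in_use SDC40 names.txt && echo "name already in use"` flags the name used in the failing command shown earlier, provided it is in the list.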
Resolution
The PowerFlex development team will address this issue in a future release.
To work around this issue and successfully add SDCs that were previously connected, use the "SDC GUID" parameter instead of "SDC IP."
For example:
scli --add_sdc --sdc_guid FA4BFBF6-546A-11E8-B40D-0050568D283E
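When the add-by-GUID command is scripted, it can help to validate the GUID format first so that an IP address or a mistyped value is not passed by accident. The sketch below assumes the 8-4-4-4-12 hexadecimal layout of the example GUID above. build_add_sdc_by_guid is a hypothetical helper that only prints the command for review; it uses no scli options beyond those shown in this article.

```shell
#!/bin/sh
# Sketch only: build_add_sdc_by_guid is a hypothetical helper, not part of scli.
# Usage: build_add_sdc_by_guid GUID
# Prints the add-by-GUID command when GUID has the 8-4-4-4-12 hex layout
# of the example in this article; otherwise prints an error and returns 1.
build_add_sdc_by_guid() {
    if ! printf '%s\n' "$1" | grep -Eq '^[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}$'; then
        echo "invalid SDC GUID: $1" >&2
        return 1
    fi
    # Print rather than execute, so the operator can review the command first.
    echo "scli --add_sdc --sdc_guid $1"
}
```

Printing the command instead of running it keeps the helper safe to use on a production MDM; the operator decides when to execute the output.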
Impacted Versions
All versions higher than 3.0
Fixed In Version
PowerFlex future release