I’m building a hyper-converged, 3-node Storage Spaces Direct cluster on brand-new Dell PowerEdge 730xd servers that come with Mellanox ConnectX-3 Pro NICs direct from Dell.
I have attached the PowerShell script used to build this. Now to the problem: when I enable Storage Spaces Direct in the cluster, RDMA traffic starts to flow, and I get the expected performance when running tests against the storage. So far so good, until I reboot any of the 3 nodes. The node blue-screens with MEMORY_MANAGEMENT and just keeps boot-looping, so I’m not able to access the server again. Before rebooting I checked that no storage jobs were running and also drained the roles from the node.
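For reference, the pre-reboot checks I run look roughly like this (a sketch; the node name "S2D-Node1" is a placeholder for the actual host):

```powershell
# Confirm no storage repair/rebuild jobs are still in flight
Get-StorageJob | Where-Object { $_.JobState -ne 'Completed' }

# Drain all roles from the node being rebooted ("S2D-Node1" is a placeholder)
Suspend-ClusterNode -Name "S2D-Node1" -Drain -Wait

# ...reboot here, then bring the node back into the cluster:
Resume-ClusterNode -Name "S2D-Node1" -Failback Immediate
```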
I have figured out that if I set the Cluster Service to Manual and stop the service before rebooting, I can reboot the node without problems, and then start the Cluster Service again once the node is back up.
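In case anyone wants to try the same workaround, this is roughly what I do (ClusSvc is the standard service name for the Cluster Service):

```powershell
# Workaround: stop the Cluster Service cleanly before rebooting
Set-Service -Name ClusSvc -StartupType Manual
Stop-Service -Name ClusSvc
Restart-Computer

# After the node is back up, start the service again
Start-Service -Name ClusSvc
Set-Service -Name ClusSvc -StartupType Automatic
```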
I have tested both the drivers that come with the Dell Driver OS Pack and the latest drivers from Mellanox (MLNX_VPI_WinOF-5_35_All_win2016_x64.exe), but with the same result.
I did some testing this weekend: I took the Mellanox cards out of the servers and used the two built-in 10 Gbit interfaces on the motherboard instead. I reinstalled everything and built an S2D solution without RDMA. I could reboot the hosts a couple of times without a BSOD, but then I suddenly got the MEMORY_MANAGEMENT BSOD again :( . So this means the problem is not related to the Mellanox cards.