PowerFlex Mark
40 Posts
January 30th, 2018 23:00
Hi,
Apologies for the late response. Could you please open a service request so that we can look into this further?
Please provide the get_info from the master MDM. We may need to engage Engineering, so a formal service request will be required.
Thanks for your help.
ezaytsev
4 Posts
February 14th, 2018 04:00
We have created SR 10142536 and uploaded the dump there.
ezaytsev
4 Posts
February 15th, 2018 15:00
Oh, and by the way - we have investigated further and were able to isolate the issue.
It only happens when a write I/O goes to unallocated space inside the VMFS datastore - that is, if we use a thin-provisioned VMDK, or do a restore or a replication with Veeam. By design, such operations create a snapshot of the target VMDK and write to it. The snapshot is essentially empty, so a write I/O to it triggers our issue.
We have also opened an SR with Veeam. They ran some tests on our system using VMware's vixlib as well as their own software and came to the conclusion that the issue is with the storage:
99 - Target Write Busy % AKA “Target”
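For anyone who wants to check the "first write to unallocated space" pattern without Veeam, here is a rough sketch of a probe that could be run inside a guest whose filesystem sits on a thin-provisioned VMDK. It is not what Veeam or vixlib does. It only times first writes into a freshly created sparse file against rewrites of the same blocks. Whether those first writes really land on unallocated VMDK blocks depends on the guest filesystem and on what was written to the disk before, so treat this as an assumption. The file name, block size and block count are made up for illustration.

```python
import os
import tempfile
import time


def timed_write_pass(fd, block_size, offsets):
    """Write one block at each offset, fsync, and return per-write latency in seconds."""
    payload = os.urandom(block_size)
    latencies = []
    for offset in offsets:
        start = time.perf_counter()
        os.pwrite(fd, payload, offset)
        os.fsync(fd)  # push the write out of the guest page cache to the virtual disk
        latencies.append(time.perf_counter() - start)
    return latencies


def compare_first_write_vs_rewrite(path, block_size=1024 * 1024, blocks=16):
    """Return (first_pass, second_pass) latency lists for a freshly created sparse file."""
    offsets = [i * block_size for i in range(blocks)]
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # Set the file size without writing data: no blocks are allocated yet.
        os.ftruncate(fd, block_size * blocks)
        first = timed_write_pass(fd, block_size, offsets)   # writes into holes
        second = timed_write_pass(fd, block_size, offsets)  # rewrites of the same blocks
    finally:
        os.close(fd)
    return first, second


if __name__ == "__main__":
    # Hypothetical target: point this at a filesystem on the thin-provisioned VMDK.
    target = os.path.join(tempfile.gettempdir(), "thin_write_probe.bin")
    first, second = compare_first_write_vs_rewrite(target)
    print(f"first writes: avg {sum(first) / len(first) * 1000:.2f} ms")
    print(f"rewrites:     avg {sum(second) / len(second) * 1000:.2f} ms")
    os.remove(target)
```

If the first pass is consistently much slower than the second on the affected datastore but not on other storage, that would be consistent with the allocation path being the trigger. It does not prove it on its own.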
ezaytsev
4 Posts
February 16th, 2018 00:00
Sure, I will report any updates on the case here.
As for the monitoring setup - we are using the ScaleIO exporter + Prometheus + Grafana.