VxRail: Nodes May Experience High LSOM Congestion

Summary: VxRail nodes running 4.7.511-526 and 7.0.130-132 may experience high LSOM memory congestion, leading to performance degradation and possible vSAN outages. A workaround exists to disable the services causing the issue; upgrading to 4.7.530/7.0.200 resolves the issue. Based on VMware KB 82619 ...

This article is not tied to any specific product. Not all product versions are identified in this article.

Symptoms

Note: The information provided is based on VMware KB 82619 (external link to a website outside of Dell Technologies). Review that article for any potential newer updates.

When running VxRail versions 4.7.511-526 and 7.0.130-132, you may experience the following issues:

  • "Number of elements in the commit tables" are more than 100k and do not decrease over a period of hours.
  • Loss of ability to see files and folders on the vSAN datastore
  • Severe performance degradation
  • One or more nodes presenting high Local Log Structured Object Management (LSOM) memory congestion (see command 1).
  • "Number of elements in the commit tables" are more than 100k (see command 2).
  • Memory Congestion that has propagated to all the nodes in the cluster.
  • Logs messages in vmkernel.log:
    • LSOM: LSOM_ThrowCongestionVOB:3429: Throttled: Virtual SAN node "HOSTNAME" maximum Memory congestion reached.
  • Logs messages in vobd.log and vmkernel.log
    • LSOM_ThrowAsyncCongestionVOB:1669: LSOM Memory Congestion State: Exceeded. Congestion Threshold: 200 Current Congestion: 204.

The following scripts can be used to determine whether a host may be experiencing this issue.
Script 1

while true; do
  echo "================================================"
  date
  # Print the congestion counters for each disk group (keyed by cache-disk UUID)
  for ssd in $(localcli vsan storage list | grep "Group UUID" | awk '{print $5}' | sort -u); do
    echo $ssd
    vsish -e get /vmkModules/lsom/disks/$ssd/info | grep Congestion
  done
  # Print LLOG/PLOG log space consumption in GiB for each disk group
  for ssd in $(localcli vsan storage list | grep "Group UUID" | awk '{print $5}' | sort -u); do
    llogTotal=$(vsish -e get /vmkModules/lsom/disks/$ssd/info | grep "Log space consumed by LLOG" | awk -F : '{print $2}')
    plogTotal=$(vsish -e get /vmkModules/lsom/disks/$ssd/info | grep "Log space consumed by PLOG" | awk -F : '{print $2}')
    llogGib=$(echo $llogTotal | awk '{print $1 / 1073741824}')
    plogGib=$(echo $plogTotal | awk '{print $1 / 1073741824}')
    allGibTotal=$(expr $llogTotal + $plogTotal | awk '{print $1 / 1073741824}')
    echo $ssd
    echo " LLOG consumption: $llogGib"
    echo " PLOG consumption: $plogGib"
    echo " Total log consumption: $allGibTotal"
  done
  sleep 30
done

Sample output

Fri Feb 12 06:40:51 UTC 2021  

529dd4dc-xxxx-xxxx-xxxx-xxxxxxxxxxxx
   memCongestion:0 >> On an affected host, this value is higher than 0 (range 0-250)
   slabCongestion:0
   ssdCongestion:0
   iopsCongestion:0
   logCongestion:0
   compCongestion:0
   memCongestionLocalMax:0
   slabCongestionLocalMax:0
   ssdCongestionLocalMax:0
   iopsCongestionLocalMax:0
   logCongestionLocalMax:0
   compCongestionLocalMax:0
529dd4dc-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    LLOG consumption: 0.270882
    PLOG consumption: 0.632553
    Total log consumption: 0.903435

Script 2

# For each disk, print the commit-table element count; suppress disks reporting 0
vsish -e ls /vmkModules/lsom/disks/ 2>/dev/null | while read d; do
  echo -n ${d/\//}
  vsish -e get /vmkModules/lsom/disks/${d}WBQStats | grep "Number of elements in commit tables"
done | grep -v ":0$"

Sample output
(Only cache-tier disks are relevant; you can ignore any results for capacity disks)

52f395f3-03fd-f005-bf02-40287362403b/   Number of elements in commit tables:300891
526709f4-8790-8a91-2151-a491e2d3aec5/   Number of elements in commit tables:289371
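
Because the diagnostic signal is a commit-table count that stays high over a period of hours, it can help to sample Script 2 repeatedly. A minimal sketch, wrapping Script 2 in a loop (the 10-minute interval is an arbitrary choice, not from the original KB):

while true; do
  date
  # Same as Script 2: print nonzero commit-table counts per disk
  vsish -e ls /vmkModules/lsom/disks/ 2>/dev/null | while read d; do
    echo -n ${d/\//}
    vsish -e get /vmkModules/lsom/disks/${d}WBQStats | grep "Number of elements in commit tables"
  done | grep -v ":0$"
  sleep 600  # sample every 10 minutes
done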

Cause

Scrubber configuration values were modified in the vSAN 6.7 P04 and vSAN 7.0 U1 P02 releases to scrub objects at a higher frequency. As a result, the scrubber persists its progress for each object more frequently than before. If there are idle objects in the cluster, the scrubber accumulates commit-table entries for these objects at the LSOM layer. Eventually, this accumulation leads to LSOM memory congestion.

Idle objects in this context include unassociated objects, powered-off VMs, replicated objects, and so forth.
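
To see the scrubber settings currently in effect on a host, you can read back the same vSAN advanced options that the workaround below changes. A minimal sketch (default values differ between the affected and fixed releases):

# Query the current scrubber frequency and persist-timer settings
esxcfg-advcfg -g /VSAN/ObjectScrubsPerYear
esxcfg-advcfg -g /VSAN/ObjectScrubPersistMin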

Resolution

Long-term resolution: upgrade to 4.7.530 or later, or to 7.0.200 or later.

If a host has a high number of elements in the commit tables, as determined by Script 2, one of the following two actions is recommended to clear the congestion:
  1. Put the problem host in maintenance mode with Ensure Accessibility, then reboot the host (see the sketch after this list).
  2. Unmount and remount each of the host's disk groups using Ensure Accessibility.
You may have to do this on multiple nodes in the cluster, one node at a time.
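
As an illustration of option 1, the equivalent commands from the ESXi shell might look like the following (a sketch only; these steps are normally performed from vCenter, and -m ensureObjectAccessibility corresponds to the Ensure Accessibility option):

# Enter maintenance mode, preserving vSAN object accessibility
esxcli system maintenanceMode set -e true -m ensureObjectAccessibility
# Reboot the host (a reason string is required)
esxcli system shutdown reboot -r "Clear LSOM commit-table congestion"
# After the host returns, take it out of maintenance mode
esxcli system maintenanceMode set -e false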

Workaround:
If you are unable to upgrade, implement the following advanced settings changes in the meantime to mitigate against this issue occurring (a verification sketch follows the list):
  1. Change the scrubber frequency to once per year:
esxcfg-advcfg -s 1 /VSAN/ObjectScrubsPerYear
  2. Disable the scrubber persist timer:
esxcfg-advcfg -s 0 /VSAN/ObjectScrubPersistMin
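
To confirm the workaround took effect on each host, read the values back (a sketch; the expected outputs assume the commands above succeeded):

esxcfg-advcfg -g /VSAN/ObjectScrubsPerYear    # should now report a value of 1
esxcfg-advcfg -g /VSAN/ObjectScrubPersistMin  # should now report a value of 0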

Affected Products

VxRail, VMware Cloud on Dell EMC VxRail E560F, VMware Cloud on Dell EMC VxRail E560N, VxRail 460 and 470 Nodes, VxRail Appliance Family, VxRail Appliance Series, VxRail D Series Nodes, VxRail D560, VxRail D560F, VxRail E Series Nodes

Products

VxRail G410, VxRail G Series Nodes, VxRail E460, VxRail E560, VxRail E560 VCF, VxRail E560F, VxRail E560F VCF, VxRail E560N, VxRail E560N VCF, VxRail E665F, VxRail E665N, VxRail G560, VxRail G560 VCF, VxRail G560F, VxRail G560F VCF, VxRail Gen2 Hardware, VxRail P Series Nodes, VxRail P470, VxRail P570, VxRail P570 VCF, VxRail P570F, VxRail P570F VCF, VxRail P580N, VxRail P580N VCF, VxRail P675F, VxRail P675N, VxRail S Series Nodes, VxRail S470, VxRail S570, VxRail S570 VCF, VxRail S670, VxRail Software, VxRail V Series Nodes, VxRail V470, VxRail V570, VxRail V570 VCF, VxRail V570F, VxRail V570F VCF ...
Article Properties
Article Number: 000196966
Article Type: Solution
Last Modified: 17 Dec 2025
Version:  3