VxRail: How to determine resync size on deduplication and compression cluster
Summary: How to determine the resync size on a cluster with deduplication and compression enabled.
Instructions
VMware generally recommends leaving 30% of your resources free, including CPU, memory, and storage. However, the impact that a single host has on the rest of the cluster depends on the size of the cluster. Dividing 100 by the number of hosts gives the percentage of the entire cluster that a single host represents. For example, losing a host in a four-host cluster removes 25% of your resources, while in a cluster of 10 hosts a single host represents only 10%. Because of this, no single 'safe' percentage or number applies universally. Putting a single host into maintenance mode with the Ensure accessibility option shows how heavily loaded the other hosts become in terms of CPU and memory utilization (but not storage).
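The per-host share described above can be sketched as a small shell function. The name host_share_percent is a hypothetical helper (not part of any VMware or VxRail tooling), and the sketch assumes every host contributes an equal share of resources.

```shell
#!/bin/sh
# Hypothetical helper: percentage of total cluster resources that one host
# represents, assuming all hosts contribute equally.
host_share_percent() {
    awk -v hosts="$1" 'BEGIN { printf "%.1f\n", 100 / hosts }'
}

host_share_percent 4    # four-host cluster: prints 25.0
host_share_percent 10   # ten-host cluster: prints 10.0
```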
Calculating storage is more complicated, especially with RAID 5/6 and with deduplication and compression enabled. The first step is to determine how much space the stored data would take up when ballooned, which differs slightly on each host. On a host, run:
localcli vsan storage list | grep 'Group UUID:' | awk '{print $5}' | sort | uniq | while read cache; do
  # Reset the running total for this disk group
  dgTotalInflate=''
  echo -en "\nCache Tier Disk: $cache\n"
  # Loop over the capacity disks in this disk group (the cache disk itself is filtered out)
  localcli vsan storage list | grep -B1 "Group UUID: $cache" | grep "VSAN UUID:" | awk '{print $3}' | grep -v $cache | while read cap; do
    diskLogUsed=$(cmmds-tool find -f json -t DISK_STATUS -u $cap | grep content | awk '{print $37}' | sed -e 's/[\t\n\r,]//g')
    diskPhysUsed=$(cmmds-tool find -fjson -t DISK_USAGE -u $cap | grep content | awk '{print $19}' | sed -e 's/[\t\n\r,]//g')
    dedupRatio=$(awk "BEGIN{ print "$diskLogUsed" / "$diskPhysUsed" }")
    dgTotalInflate=$(awk "BEGIN{ print "$dgTotalInflate" + "$diskLogUsed"}")
    echo -en " Capacity Disk: $cap : Dedup Ratio ${dedupRatio}x\n"
    # Divide by 1073741824 (bytes per GiB) and multiply by 1.25
    dgTotalInflateGB=$(awk "BEGIN{ print "$dgTotalInflate" / 1073741824 * 1.25}")
    # Written to a file because the inner loop runs in a subshell
    echo "$dgTotalInflateGB" > /tmp/dgtigb.txt
  done
  echo -en " Total Expected Space Inflation: $(cat /tmp/dgtigb.txt)GB\n"
done
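The command displays the deduplication ratio for each capacity disk and a "Total Expected Space Inflation" value, in GB, for each disk group. That total is the disk group's logical used space multiplied by 1.25. Add together the totals for all disk groups on the host to find how much space the data on that host would expand to during the resync.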
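Adding the per-disk-group totals by hand is error-prone on hosts with several disk groups. The following is a minimal sketch that sums them from saved output of the command above. The function name sum_inflation_gb is hypothetical, and the two sample lines are made-up values used only to illustrate the output format.

```shell
#!/bin/sh
# Hypothetical helper: reads the command's output on stdin and prints the sum
# of all "Total Expected Space Inflation" values in GB.
sum_inflation_gb() {
    awk -F': ' '/Total Expected Space Inflation:/ {
        sub(/GB$/, "", $2)   # strip the trailing unit
        total += $2
    }
    END { printf "%.2f\n", total }'
}

# Made-up sample values for two disk groups; prints 921.75
printf ' Total Expected Space Inflation: 512.5GB\n Total Expected Space Inflation: 409.25GB\n' | sum_inflation_gb
```

On a real host, the output of the command above could be saved to a file and passed in with sum_inflation_gb < file.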
As an example, consider a cluster of five hosts in which each host has 1 TB of total capacity and the cluster holds 3 TB of data in total. To find the resync size if host 1 were lost, you need to know that host's current used capacity. The example already states that the host has 1 TB of total capacity, so what remains is how much data is on the host and how much space that data takes up when ballooned, as reported by the command above. For this example, assume the host has 0.6 TB of data on its disks and that the command reported that the 0.6 TB would balloon to 0.9 TB. The math for finding the resync size for the host is shown below.
-From vSAN
Total capacity of vSAN (example: 5 TB)
Used capacity of vSAN (example: 3 TB)
-From the host in question
Total capacity of the node's capacity disks (example: 1 TB)
Used capacity of the node's capacity disks (example: 0.6 TB)
Result from the deduplication/compression check command above (example: 0.9 TB)
Find the total storage without counting the host in question: 5 TB - 1 TB = 4 TB
(total capacity of vSAN - total capacity of the node's capacity disks)
Find the total used storage during the resync: 3 TB - 0.6 TB + 0.9 TB = 3.3 TB
(used capacity of vSAN - used capacity of the node's capacity disks + result from the deduplication/compression check command above)
Subtract the second total from the first: 4 TB - 3.3 TB = 0.7 TB
('total storage without counting host in question' - 'total used storage during resync')
In this situation, about 0.7 TB of free space should remain on the cluster during the resync if host 1 is lost.
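The arithmetic above can be sketched as a shell function so that it can be repeated with real values. The name resync_free_tb and its argument order are hypothetical, and all inputs are assumed to be in TB.

```shell
#!/bin/sh
# Hypothetical helper. All arguments are in TB:
#   $1 total capacity of vSAN       $2 used capacity of vSAN
#   $3 host total capacity          $4 host used capacity
#   $5 host result from the deduplication/compression check command
resync_free_tb() {
    awk -v ct="$1" -v cu="$2" -v ht="$3" -v hu="$4" -v hi="$5" 'BEGIN {
        remaining = ct - ht      # total storage without the host in question
        used = cu - hu + hi      # total used storage during the resync
        printf "%.1f\n", remaining - used
    }'
}

# Example values from this article; prints 0.7
resync_free_tb 5 3 1 0.6 0.9
```

A negative result would indicate that the remaining hosts do not have enough capacity to hold the ballooned data.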
In this case, some free space remains, and the free space increases somewhat after the resync completes. Keep in mind that performance is reduced when more than 70% of vSAN capacity is used, and this performance degradation worsens as utilization increases.
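To compare the example against the 70% guideline, the utilization during the resync can be estimated from the two totals calculated earlier (3.3 TB used out of 4 TB remaining). This is a minimal sketch; the function name resync_utilization_percent is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: utilization of the remaining capacity during the resync.
#   $1 total used storage during the resync
#   $2 total storage without the host in question (same unit as $1)
resync_utilization_percent() {
    awk -v used="$1" -v cap="$2" 'BEGIN { printf "%.1f\n", used / cap * 100 }'
}

# Example values from this article; prints 82.5, which is above the 70% mark
resync_utilization_percent 3.3 4
```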