Dell VCF on VxRail: Task "Create NSX Transport Node Collection" fails while adding a cluster to a workload domain
Summary: The "Create NSX Transport Node Collection" task fails when adding a cluster to a workload domain.
Symptoms
In rare cases, when adding a cluster to a workload domain using Workflow Optimization, the "Create NSX Transport Node Collection" task in SDDC Manager may fail. This issue was observed with VCF 5.1.1 and VxRail 8.0.210.
SDDC Manager shows the task "Create NSX Transport Node Collection" with the status "Failed".
The following entries appear in /var/log/vmware/vcf/domainmanager/domainmanager.log on SDDC Manager:
2024-11-13T03:56:01.925+0000 DEBUG [vcf_dm,673421c828f462beeb2892e623346f66,359c] [c.v.v.c.n.s.c.c.NsxtManagerTransportNodeOperations,dm-exec-4] Resolving Fabric Node error [Failed to install software on host. Failed to download NSX components on host. Check the host connectivity and if "/tmp" folder has enough space.] for Host: dummyhost.dummy.com
2024-11-13T03:56:01.930+0000 INFO [vcf_dm,673421c828f462beeb2892e623346f66,359c] [c.v.v.c.f.p.n.p.a.TransportNodeCollectionResolver,dm-exec-4] Waiting for TransportNode status change to progress or all Transport Node status to success
2024-11-13T03:56:01.930+0000 DEBUG [vcf_dm,673421c828f462beeb2892e623346f66,359c] [c.v.v.c.f.p.n.h.NsxtCommonOperations,dm-exec-4] Waiting 600000 ms for TN status to change to IN-PROGRESS or SUCCESS
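To find which hosts are affected, the log can be searched for the "Resolving Fabric Node error" entries. The following Python sketch is one way to do that. It assumes the log lines have the same layout as the excerpt above, and the function name find_fabric_node_errors is made up for this illustration; it is not part of any VCF or NSX tooling.

```python
import re

# Matches lines like the "Resolving Fabric Node error [...] for Host: ..." entry
# in the excerpt above. The layout is assumed from that excerpt only.
FABRIC_ERROR = re.compile(
    r"^(?P<timestamp>\S+)\s+\w+\s+.*?"
    r"Resolving Fabric Node error \[(?P<error>.*)\] for Host: (?P<host>\S+)"
)


def find_fabric_node_errors(lines):
    """Return a list of (timestamp, host, error) tuples, one per matching line."""
    results = []
    for line in lines:
        match = FABRIC_ERROR.search(line)
        if match:
            results.append((match["timestamp"], match["host"], match["error"]))
    return results


if __name__ == "__main__":
    # Log path taken from this article; adjust if the log was copied elsewhere.
    log_path = "/var/log/vmware/vcf/domainmanager/domainmanager.log"
    with open(log_path, encoding="utf-8", errors="replace") as log_file:
        for timestamp, host, error in find_fabric_node_errors(log_file):
            print(f"{timestamp}  {host}  {error}")
```

Each printed line gives the time, the host name, and the error text, which makes it easier to see whether the same download failure is reported for several hosts.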

Cause
The file transfer between NSX Manager and the ESXi hosts is extremely slow, so the NSX installation fails. If you check the /tmp folder on each ESXi host, you will see nsx-lcp-<something>.zip files of various sizes. This is probably because the file transfers were interrupted by a timeout before they completed.
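The size check described above can be sketched in Python. This is an illustration only: the function names are hypothetical, the script is assumed to run somewhere the /tmp contents of a host are readable, and a copy that is smaller than the largest one seen is only a hint of an interrupted transfer, since the largest copy may itself be incomplete.

```python
from pathlib import Path


def list_nsx_lcp_bundles(directory="/tmp"):
    """Return (file name, size in bytes) for each nsx-lcp-*.zip file, sorted by name."""
    return sorted(
        (path.name, path.stat().st_size)
        for path in Path(directory).glob("nsx-lcp-*.zip")
        if path.is_file()
    )


def hosts_with_smaller_copy(size_by_host):
    """Given {host: size in bytes} for the same bundle, return the hosts
    whose copy is smaller than the largest copy, sorted by host name."""
    if not size_by_host:
        return []
    largest = max(size_by_host.values())
    return sorted(host for host, size in size_by_host.items() if size < largest)


if __name__ == "__main__":
    for name, size in list_nsx_lcp_bundles():
        print(f"{size:>12}  {name}")
```

Collecting the sizes from every host into a dictionary and passing it to hosts_with_smaller_copy gives a quick list of hosts where the download most likely stopped partway.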
Resolution
In NSX Manager, resolve the errors one ESXi host at a time. This is much more likely to succeed than retrying the installation on all hosts at once. For each host, right-click "Install Failed", click "VIEW ERRORS", and mark the errors as resolved so that the installation continues. Once the installation has succeeded on all ESXi hosts, select "RESTART TASK" in SDDC Manager; the failed task should then complete successfully.


