Hello Daniel - your point is something we are looking at, though more in the context of how to build an ideal 3-node cluster that lets various workloads share resources: the monster VM runs a nightly job, and during the day its resources would otherwise be hardly utilized.
However, "16 core VM needs 16 cores *at the same time* to execute"
This hasn’t actually been true since 4.0. ‘Relaxed coscheduling’, developed in 4.x and further refined in 5.x, makes this far rarer. *Sometimes* it’s needed in order to reduce the core clock skew in the VM and prevent excessive SMP slip, but not usually.
Everything you could want to know: http://www.vmware.com/files/pdf/techpaper/VMware-vSphere-CPU-Sched-Perf.pdf
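To make the difference concrete, here is a toy model in Python. It is only a sketch of the idea, not the ESXi scheduler: the function names, the tick-based notion of "progress", and the `skew_limit` value are all assumptions made up for illustration. Strict coscheduling runs the VM only when every vCPU can get a core in the same tick; the relaxed variant runs whichever vCPUs fit, lagging ones first, and holds back only a vCPU that has run too far ahead of the slowest one.

```python
def strict_coschedule(n_vcpus, free_cores_per_tick):
    """Strict gang scheduling: the VM advances only if all vCPUs get a core."""
    progress = [0] * n_vcpus
    for free in free_cores_per_tick:
        if free >= n_vcpus:
            progress = [p + 1 for p in progress]
    return progress


def relaxed_coschedule(n_vcpus, free_cores_per_tick, skew_limit=3):
    """Toy relaxed coscheduling (hypothetical model, not ESXi's algorithm).

    Each tick, vCPUs whose lead over the slowest vCPU is below skew_limit
    are eligible to run; the ones furthest behind get the free cores first.
    """
    progress = [0] * n_vcpus
    for free in free_cores_per_tick:
        slowest = min(progress)
        # A vCPU that is too far ahead is held back (co-stopped) this tick.
        eligible = [i for i in range(n_vcpus)
                    if progress[i] - slowest < skew_limit]
        # Laggards first, so skew shrinks instead of growing.
        eligible.sort(key=lambda i: progress[i])
        for i in eligible[:free]:
            progress[i] += 1
    return progress


if __name__ == "__main__":
    # A 4-vCPU VM on a host that only ever has 2 cores free per tick.
    ticks = [2, 2, 2, 2]
    print("strict: ", strict_coschedule(4, ticks))
    print("relaxed:", relaxed_coschedule(4, ticks))
```

In this toy run the strict version never makes progress because 4 cores are never free at once, while the relaxed version keeps all four vCPUs moving and within one tick of each other.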
We are eager to test the vSphere 5.5 low-latency feature for bypassing the virtualization layer, which should be available once the bits are GA.
See: "Deploying Extremely Latency-Sensitive Applications in VMware vSphere 5.5."