Even though pooled LUNs may cost you some performance, you will stripe the data wider, thus using more spindles for a given workload. As the desktop pools are persistent, FAST-VP will work out nicely here. Any "gaps" left by FAST-VP will be covered by the FAST-cache floating on top. This looks like a nice setup.
As for the RAID write penalty: This is where cache and FAST-cache come in. Pending writes in FAST-cache will be held and committed to disk in full stripe writes if possible. This will boost the RAID5's write performance (theoretically even beyond the write performance of RAID10).
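To put rough numbers on that "theoretically" remark, here is a minimal back-of-the-envelope sketch. It assumes the textbook penalties (4 back-end I/Os per small RAID5 write, 2 per RAID10 write) and a 4+1 RAID5 group; the function name and the default group size are just illustrative, and real arrays will behave less cleanly than this.

```python
def backend_ios_per_host_write(raid: str, data_disks: int = 4,
                               full_stripe: bool = False) -> float:
    """Back-end disk I/Os needed per block written by the host.

    Textbook model only: ignores cache hits, coalescing limits and
    anything array-specific.
    """
    if raid == "raid10":
        # Every write lands on both sides of the mirror.
        return 2.0
    if raid == "raid5":
        if full_stripe:
            # N data blocks + 1 parity block are written in one go,
            # with no reads needed to recompute parity.
            return (data_disks + 1) / data_disks
        # Small write: read data, read parity, write data, write parity.
        return 4.0
    raise ValueError(f"unsupported RAID level: {raid}")


print(backend_ios_per_host_write("raid5"))                    # 4.0
print(backend_ios_per_host_write("raid10"))                   # 2.0
print(backend_ios_per_host_write("raid5", full_stripe=True))  # 1.25
```

So in this model a 4+1 RAID5 group doing full stripe writes needs 1.25 back-end I/Os per host write, versus 2 for RAID10. How often the cache actually manages to build full stripes depends on the workload.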
I firmly believe either setup will work, but lean towards the FAST-VP approach too. There is one differentiator between the two: Data on the EFDs in the FAST-VP pool will always have snappy access the first time round. FAST-cache will not kick in until the 3rd hit on a block. Only then does the block get promoted to FAST-cache. So FAST-cache will help on anything repetitive, but less so on single-block access. Basically I'd go for the "FAST-cache" approach when you have desktops recomposing on a daily basis, and go the "FAST-VP" route for anything more "stable".
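The "3rd hit" behaviour can be illustrated with a toy model. This is only a sketch of the rule as described above, not how the array implements it: the class and function names are made up, and it assumes that the third hit itself is still served from disk, that promotion is instant, and that the cache never fills up or evicts anything.

```python
from collections import Counter

PROMOTE_AFTER_HITS = 3  # from the text: promotion happens on the 3rd hit


class ToyFastCache:
    """Toy promote-on-Nth-hit cache (hypothetical; no capacity or eviction)."""

    def __init__(self, promote_after: int = PROMOTE_AFTER_HITS):
        self.promote_after = promote_after
        self.hits = Counter()   # accesses seen per not-yet-promoted block
        self.promoted = set()   # blocks currently living in the cache

    def read(self, block: int) -> str:
        """Return where this read was served from: 'cache' or 'disk'."""
        if block in self.promoted:
            return "cache"
        self.hits[block] += 1
        if self.hits[block] >= self.promote_after:
            # Assumption: the promoting hit is itself still served from disk.
            self.promoted.add(block)
        return "disk"


def served_from(cache: ToyFastCache, blocks) -> list:
    """Run a sequence of block reads and record where each was served from."""
    return [cache.read(b) for b in blocks]


# Repetitive access to one block: the cache helps from the 4th read on.
print(served_from(ToyFastCache(), [7, 7, 7, 7, 7]))
# ['disk', 'disk', 'disk', 'cache', 'cache']

# Every block touched only once: the cache never gets a chance.
print(served_from(ToyFastCache(), range(5)))
# ['disk', 'disk', 'disk', 'disk', 'disk']
```

That second case is exactly where data already sitting on EFD in a FAST-VP pool has the edge: it is fast on the first touch.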
Where to put the replicas in these setups is an interesting one: You might consider putting them on the "highest tier" in FAST-VP, effectively placing them directly on EFD in this case. But on the other hand, they'll be sitting there after boot just taking up space (which is expensive on EFDs). After boot the replicas are hardly touched (most reads come from the linked clones).
I think (though never tested) you could easily put them on the lowest available tier as well. As soon as some VMs start booting (and hit the same blocks), the active part of the replicas will be shooting up into FAST-cache anyway. All the boots after that will be served from FAST-cache and will be speedy. That would be something interesting to test!
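As a rough feel for what such a test might show, here is a small estimate under some generous assumptions of mine: VMs boot one after another, each boot reads the same replica blocks exactly once, the active part of the replica fits in FAST-cache, and a block is promoted on its 3rd hit as mentioned earlier. The function name is purely illustrative, and a real boot storm with concurrent boots will look messier.

```python
def replica_reads_from_cache(vms_booting: int, promote_after: int = 3) -> float:
    """Estimated fraction of replica boot reads served from FAST-cache.

    Assumes sequential boots that each read the same blocks once, so the
    first `promote_after` boots hit the slow tier and every later boot
    is served from cache.
    """
    if vms_booting <= 0:
        raise ValueError("vms_booting must be positive")
    boots_from_disk = min(vms_booting, promote_after)
    return (vms_booting - boots_from_disk) / vms_booting


print(replica_reads_from_cache(100))  # 0.97
print(replica_reads_from_cache(3))    # 0.0
```

With a hundred desktops on one replica, only the first three boots would pay the slow-tier price in this model, which is why the lowest tier might be good enough.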