As a best practice, a common deployment uses a minimum of three ESXi hosts in the management cluster so that DRS anti-affinity rules can keep each NSX Controller "living" on a separate ESXi host. There is no equivalent minimum for the NSX Edge cluster, nor for the payload clusters (the ESXi hosts that will run the production applications); in the end it depends on the design. In my experience, two or three ESXi hosts with a huge amount of RAM and CPU can be a waste for this purpose alone, even counting the other tools that run in the management cluster, so it is possible to have the NSX Controllers living in a single cluster, but that is a risk we have to understand and justify.
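To make the anti-affinity idea concrete, here is a minimal sketch that checks whether each NSX Controller sits on a distinct ESXi host, which is the invariant a DRS anti-affinity rule enforces. The placement data and names are assumptions for illustration, not pulled from a real vCenter inventory.

```python
# Hypothetical controller-to-host placement; in a real environment this
# would come from the vCenter inventory, not hard-coded values.
placement = {
    "nsx-controller-1": "esxi-01",
    "nsx-controller-2": "esxi-02",
    "nsx-controller-3": "esxi-03",
}

hosts = list(placement.values())
if len(set(hosts)) < len(hosts):
    # Two or more controllers share a host: losing that host could
    # cost the controller cluster its majority.
    print("Anti-affinity violated: two controllers share a host")
else:
    print("OK: every controller lives on its own ESXi host")
```

With three hosts, DRS can satisfy the rule even during maintenance of one host; with fewer hosts the rule simply cannot be honored, which is the risk mentioned above.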
In some sense it can, but strictly speaking the DFW cannot see the traffic between physical instances. Let me explain my idea with an example.
Let's assume we have a cluster prepared with the NSX Distributed Firewall, with tiers such as a DB tier and an App tier, but we also need to interact with legacy systems that, for some reason (perhaps a cost or a licensing issue), we cannot virtualize. Let's assume we have to back up the App tier and the DB tier. This means the VMs in the DB tier and the VMs in the App tier need to interact with a backup application running on a physical server, and since it is a network-based backup application, we have installed agents in each tier.

So how do we protect the network designated for backups? How do we make sure that some other VM that could be attached to this backup network cannot "see" the traffic between the backup agents and the physical backup server, without compromising security? The DFW lets us set a group of rules for traffic in and out of the virtual tiers (DB and App): by grouping the IP addresses of the physical servers into an object inside the firewall, we can make that object part of a rule and avoid this problem. For instance, the rule could look something like this:
source               destination  service               action  applied to
physical_servers_ip  DB_tier      any_except_permitted  block   backup_network (or dvPortgroup)
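As a rough illustration of how such a rule behaves, here is a minimal sketch in Python. The IP ranges, group names, and match logic are assumptions made up for the example; they are not the actual DFW implementation or API.

```python
import ipaddress

# Hypothetical address groups; a real deployment would define these as
# NSX grouping objects, not hard-coded literals.
physical_servers_ip = {ipaddress.ip_address("10.0.50.10"),
                       ipaddress.ip_address("10.0.50.11")}
db_tier = ipaddress.ip_network("10.0.20.0/24")

def rule_action(src, dst):
    """Toy match for the rule above: block traffic between the physical
    backup servers group and the DB tier unless a more specific rule
    permitted it earlier in the rule table."""
    src = ipaddress.ip_address(src)
    dst = ipaddress.ip_address(dst)
    if src in physical_servers_ip and dst in db_tier:
        return "block"
    return "no-match"  # evaluation would continue to the next rule

print(rule_action("10.0.50.10", "10.0.20.5"))  # from backup server to DB tier
print(rule_action("10.0.60.99", "10.0.20.5"))  # unrelated source, falls through
```

The point of the sketch is the grouping: because the physical servers are referenced as one object, the rule covers all of them at once, and any VM attached to the backup network that is not in the permitted set simply never matches an allow rule.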