July 2nd, 2021 12:00
PowerEdge R820 quad-Xeon only using 50% CPU
I have an R820 that I recently upgraded to four Xeon E5-4657L v2 CPUs, running Windows Server 2016. I am trying to run a multi-threaded, batch-style application that is very CPU-, RAM- and IO-intensive. When I run it on a dual-Xeon Precision, it uses 100% CPU and the jobs finish relatively quickly. When I run an individual job on this PowerEdge, total CPU usage goes to 50% and never rises higher. The logical processor view shows half the logical processors at 100% and the other half at 0-5%. The NUMA view shows two nodes at 0% and two at 100%. The job runs heavy IO against temp drives. I have tried a PCIe NVMe card, some RAID 0 volumes on SAS disks, and a 128GB RAM drive; the results are the same. If I run a second job in parallel, all CPUs, NUMA nodes and logical processors go to 100%. However, running the two jobs together is not much faster than running them individually.
I took a look at the technical documentation here and here and noticed that CPUs 3 and 4 don't have direct access to any devices; all of their traffic appears to go through "High speed Xcede connectors", then another CPU, then the hardware. Am I stuck at 50% CPU because of this hardware layout? It looks as though the OS is smart enough to send the jobs to the two fastest CPUs first and ignore the other two, because their data would have to pass through the first two CPUs on the way to and from the devices. Does this sound like a fair assumption, and can anything be done about it? I could throw more memory at the system and run two RAM disks, one per channel, but would that matter, or would it still bottleneck against the "host" CPUs? Alternatively, I could try different PCIe slots for my NVMe and/or RAID cards if that would make any difference, but I don't know enough to work out the best path forward. Does anyone have recommendations on how to get more performance from this system?
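One more thing that may be worth ruling out alongside the hardware layout is the logical processor count itself. The sketch below is only arithmetic, not a diagnosis. It assumes 12 cores per E5-4657L v2 with Hyper-Threading enabled, and it assumes Windows' limit of 64 logical processors per processor group, with an even split across groups. The function names are made up for illustration, and the dual-socket figures are a hypothetical comparison because the Precision's CPU model is not given above.

```python
import math

# Windows places at most 64 logical processors in one processor group.
MAX_LP_PER_GROUP = 64


def logical_processors(sockets, cores_per_socket, threads_per_core=2):
    """Total logical processors the OS would see."""
    return sockets * cores_per_socket * threads_per_core


def min_processor_groups(total_lp, max_per_group=MAX_LP_PER_GROUP):
    """Smallest number of processor groups that can hold total_lp."""
    return math.ceil(total_lp / max_per_group)


def single_group_ceiling(sockets, cores_per_socket, threads_per_core=2):
    """Fraction of total CPU available to a process confined to one group.

    Assumption: the logical processors are split evenly across groups.
    """
    total = logical_processors(sockets, cores_per_socket, threads_per_core)
    return 1 / min_processor_groups(total)


if __name__ == "__main__":
    # R820 as described: 4 sockets, assumed 12 cores each, Hyper-Threading on.
    print(logical_processors(4, 12))    # 96
    print(min_processor_groups(96))     # 2
    print(single_group_ceiling(4, 12))  # 0.5
    # Hypothetical dual-socket box with 12 cores per socket.
    print(single_group_ceiling(2, 12))  # 1.0
```

If those assumptions hold, a process that is not processor-group aware would top out at half the machine. That would match the 50% ceiling and the two idle NUMA nodes, but it is not confirmed for this system or this application.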

