PowerEdge: How to Optimize Small I/O Performance for NVMe Drives Behind PERC Controllers in Linux
Summary: In Linux, NVMe drives connected behind a PowerEdge RAID Controller (PERC) may exhibit lower performance than expected during small I/O workload testing. This occurs because the OS identifies these drives as standard SCSI block devices (/dev/sdX) rather than native NVMe devices (/dev/nvmeXnX), leading to the application of a suboptimal default I/O scheduler.
Instructions
1. The Root Cause
- Native NVMe: Directly connected NVMe drives use deep, hardware-managed command queues. Linux defaults their scheduler to none to bypass OS-level bottlenecking.
- NVMe behind PERC: When managed by a PERC controller, the drive is presented to the OS as a SCSI device. Most Linux distributions default the scheduler for SCSI devices to mq-deadline.
- The Conflict: The mq-deadline scheduler was designed for legacy mechanical drives, where reordering requests optimizes seek time and minimizes head movement. For high-speed NVMe drives, this scheduler adds unnecessary latency and CPU overhead, throttling total IOPS.
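The presentation difference described above can be seen directly in sysfs. The following is a minimal sketch that lists each block device and its active scheduler; the SYSFS_ROOT variable is an illustrative parameter (not part of any standard tool) so the loop can be pointed at a scratch directory, and device names will vary per system:

```shell
# list_schedulers: print each block device and its active I/O scheduler.
# Native NVMe devices (nvmeXnY) typically show [none], while drives
# behind a PERC appear as sdX with [mq-deadline] active.
list_schedulers() {
    sysfs="${SYSFS_ROOT:-/sys/block}"
    for dev in "$sysfs"/*; do
        # Skip entries that do not expose a scheduler attribute.
        [ -f "$dev/queue/scheduler" ] || continue
        printf '%s: %s\n' "${dev##*/}" "$(cat "$dev/queue/scheduler")"
    done
}

list_schedulers
```

The active scheduler is the one shown in brackets on each line.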
2. Verification and Immediate Adjustment
To achieve maximum IOPS for small I/O workloads, the scheduler should be set to none.
- Check the current scheduler:
- Run the following command as a superuser (replace {sdX} with your device name, such as sda):
cat /sys/block/{sdX}/queue/scheduler
Example output: [mq-deadline] kyber bfq none (the brackets indicate the active scheduler; the list of available schedulers varies by kernel).
- Change the scheduler to 'none' (runtime): Run the following command to apply the change immediately:
echo "none" > /sys/block/{sdX}/queue/scheduler
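For scripting this step across several drives, the change can be wrapped with basic safety checks. This is a sketch, not a Dell-provided tool; the SYSFS_ROOT variable is an assumption added only so the function can be exercised against a scratch directory instead of the live sysfs:

```shell
# set_scheduler_none DEV: switch a block device's I/O scheduler to "none".
# DEV is a device name such as sda. Returns non-zero if the scheduler
# file is not writable or "none" is not offered by the kernel.
set_scheduler_none() {
    sched_file="${SYSFS_ROOT:-/sys/block}/$1/queue/scheduler"
    if [ ! -w "$sched_file" ]; then
        echo "cannot write $sched_file (run as root?)" >&2
        return 1
    fi
    # Refuse if the kernel does not list "none" for this device.
    if ! grep -qw none "$sched_file"; then
        echo "scheduler 'none' not available for $1" >&2
        return 1
    fi
    echo none > "$sched_file"
}
```

Usage: run set_scheduler_none sda as root, then re-read /sys/block/sda/queue/scheduler to confirm that [none] is active.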
3. Ensuring Persistence
The manual change above is not persistent and will revert after a reboot. Also, /dev/sdX identifiers may change if drives are added or removed.
To make this change permanent, it is recommended to create a udev rule keyed to the device's WWID, which remains stable across reboots and device renumbering.
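A udev rule along these lines can pin the scheduler for a specific drive. This is a sketch: the filename and the WWID below are placeholders, and the real WWID should be read from /sys/block/sdX/device/wwid before writing the rule:

```
# /etc/udev/rules.d/99-nvme-behind-perc.rules
# Replace the WWID below with the value from /sys/block/sdX/device/wwid.
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd*", ATTRS{wwid}=="naa.600508b1001c79ade5a1b2c3d4e5f600", ATTR{queue/scheduler}="none"
```

To apply the rule without a reboot, run udevadm control --reload-rules followed by udevadm trigger as root.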
Note: The scheduler choice can also affect large, sequential I/O performance, so benchmark your specific workload after making the change.
For NVMe drives behind a PERC controller, it is recommended to set the Linux queue scheduler to none.