Mellanox ConnectX-6 Dx: High Corrected Bits on PAM4 NICs

Summary: The Dell Mellanox ConnectX-6 Dx 100 Gb Network Interface Card (NIC) reports a high rx_corrected_bits_phy count because of the PAM4 data transmission technique. This is normal and expected.

Symptoms

No functional issues are being experienced, but a review of the interface statistics in the environment shows that all hosts report bit errors:

  • High number of rx_corrected_bits_phy
  • High number of rx_err_lane_0_phy
  • High number of rx_err_lane_1_phy
  • High number of rx_err_lane_2_phy
  • High number of rx_err_lane_3_phy
user@hostname:~$ sudo ethtool -S enp139s0f1np1 | grep -E "correc|rx_err"
    rx_corrected_bits_phy: 153303800
    rx_err_lane_0_phy: 74171021
    rx_err_lane_1_phy: 79132779
    rx_err_lane_2_phy: 0
    rx_err_lane_3_phy: 0

user@hostname:~$ sudo ethtool -S lan0
    rx_corrected_bits_phy: 191025837
    rx_err_lane_0_phy: 759699
    rx_err_lane_1_phy: 190266147
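
The counters above can also be compared programmatically. The following is a minimal Python sketch, not a Dell or NVIDIA tool: it parses `name: value` lines as printed by `ethtool -S` and sums the per-lane counters. The helper names `parse_ethtool_stats` and `lane_error_total` are hypothetical, and the sample text is copied from the first output above.

```python
import re

# Sample copied from the first ethtool output in this article.
SAMPLE = """\
    rx_corrected_bits_phy: 153303800
    rx_err_lane_0_phy: 74171021
    rx_err_lane_1_phy: 79132779
    rx_err_lane_2_phy: 0
    rx_err_lane_3_phy: 0
"""


def parse_ethtool_stats(text):
    """Turn 'name: value' lines into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\S+):\s*(\d+)\s*$", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def lane_error_total(stats):
    """Sum every rx_err_lane_<N>_phy counter found in the dict."""
    return sum(
        value
        for name, value in stats.items()
        if re.fullmatch(r"rx_err_lane_\d+_phy", name)
    )


stats = parse_ethtool_stats(SAMPLE)
print("lane total:    ", lane_error_total(stats))
print("corrected bits:", stats["rx_corrected_bits_phy"])
```

In the first sample the per-lane counters add up exactly to rx_corrected_bits_phy (74171021 + 79132779 = 153303800). In the second sample the two values differ by 9, which may simply reflect counters that were still incrementing while they were read.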

Cause

The issue is related to the PAM4 data transmission technique used by the Mellanox ConnectX-6 Dx NIC.

  • The PAM4 technique uses four signal levels, each representing two bits (00, 01, 10, 11), so it can transmit twice the data in the same bandwidth as the previously used technology, non-return-to-zero (NRZ).
  • However, PAM4 is more complex and more susceptible to noise and errors, so it requires stronger error correction.
  • Links that use PAM4 electrical modulated signals must run the RS544 forward error correction (FEC) technique to detect and correct errors in the data transmission.
  • The IEEE standards require all links involving 50G/100G PAM4 to achieve a pre-FEC Bit Error Rate (BER) of 2.4E-04 or better.
  • With RS544 FEC enabled and running, a link is expected to achieve a BER of 1E-12 or better.
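
To put the two BER figures in perspective, the arithmetic below converts them into error rates over time. This is a rough sketch under one stated assumption: it uses a nominal 100 Gb/s bit rate, whereas the actual line rate including FEC overhead is somewhat higher. The function names are illustrative only.

```python
PRE_FEC_BER_LIMIT = 2.4e-4    # pre-FEC threshold quoted above
POST_FEC_BER_TARGET = 1e-12   # expected BER with RS544 FEC running


def errors_per_second(bit_rate_bps, ber):
    """Expected number of errored bits per second at a given BER."""
    return bit_rate_bps * ber


def seconds_per_error(bit_rate_bps, ber):
    """Average time between errored bits at a given BER."""
    return 1.0 / (bit_rate_bps * ber)


NOMINAL_RATE_BPS = 100e9  # assumption: nominal 100 Gb/s, FEC overhead ignored

pre_fec = errors_per_second(NOMINAL_RATE_BPS, PRE_FEC_BER_LIMIT)
post_fec = seconds_per_error(NOMINAL_RATE_BPS, POST_FEC_BER_TARGET)
print(f"pre-FEC limit:   {pre_fec:.3g} errored bits per second")
print(f"post-FEC target: one errored bit every {post_fec:.3g} seconds")
```

Under that assumption, a link sitting right at the pre-FEC limit could correct about 24 million bits every second and still be within specification, while the post-FEC target corresponds to roughly one residual bit error every 10 seconds. Counter values in the hundreds of millions are therefore unremarkable on a link that has been up for any length of time.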

Error Correction Mechanism

The RS544 FEC technique introduces 16 bins for error counting. In this system, bin-0 counts received codewords with zero symbol errors, bin-1 counts received codewords with one symbol error, and so on.

Symbol Errors Per Codeword   Codewords    Changes   Last Change
Bin0                         5540265380   11        0:00:04 ago
Bin1                         4420085      11        0:00:04 ago
Bin2                         578175       11        0:00:04 ago
Bin3                         11808        11        0:00:04 ago
Bin4                         1071         11        0:00:04 ago
Bin5                         63           11        0:00:04 ago
Bin6                         6            6         0:00:04 ago
Bin7                         3            2         0:01:02 ago
Bin8                         1            1         0:00:04 ago
Bin9                         0            0         never
Bin10                        0            0         never
Bin11                        0            0         never
Bin12                        0            0         never
Bin13                        0            0         never
Bin14                        0            0         never
Bin15                        0            0         never
Bin16+                       0            0         never
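
The bin table can be summarized with a few lines of Python. The sketch below is illustrative and makes two assumptions: the rows follow the `BinN  <count> ...` layout shown above, and the correction limit is 15 symbols per codeword, which follows from the RS(544,514) code parameters ((544 - 514) / 2). The names `parse_bins` and `summarize_bins` are hypothetical.

```python
import re

# RS(544,514) carries 30 parity symbols, so it can correct up to 15 symbols.
MAX_CORRECTABLE_SYMBOLS = (544 - 514) // 2

# Codeword counts copied from the bin table in this article.
BIN_TABLE = """\
Bin0 5540265380
Bin1 4420085
Bin2 578175
Bin3 11808
Bin4 1071
Bin5 63
Bin6 6
Bin7 3
Bin8 1
Bin9 0
Bin10 0
Bin11 0
Bin12 0
Bin13 0
Bin14 0
Bin15 0
Bin16+ 0
"""


def parse_bins(text):
    """Map bin index -> codeword count; the 'Bin16+' row is stored under 16."""
    bins = {}
    for line in text.splitlines():
        match = re.match(r"\s*Bin(\d+)\+?\s+(\d+)", line)
        if match:
            bins[int(match.group(1))] = int(match.group(2))
    return bins


def summarize_bins(bins):
    """Return totals, the highest populated bin, and the remaining margin."""
    total = sum(bins.values())
    corrected = sum(
        count for index, count in bins.items()
        if 1 <= index <= MAX_CORRECTABLE_SYMBOLS
    )
    uncorrectable = sum(
        count for index, count in bins.items()
        if index > MAX_CORRECTABLE_SYMBOLS
    )
    highest = max((i for i, count in bins.items() if count > 0), default=0)
    return {
        "total_codewords": total,
        "corrected_codewords": corrected,
        "uncorrectable_codewords": uncorrectable,
        "corrected_fraction": corrected / total if total else 0.0,
        "highest_bin": highest,
        "margin_symbols": MAX_CORRECTABLE_SYMBOLS - highest,
    }


summary = summarize_bins(parse_bins(BIN_TABLE))
for key, value in summary.items():
    print(f"{key}: {value}")
```

For the table above, the highest populated bin is bin-8 and no codewords fall in bin-16+, so every errored codeword was within the correction capability of the code.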

BER Requirements

The effective physical BER shows how well the FEC is working to correct errors and ensure reliable data transmission.

The link is expected to achieve a BER of 1E-12 or better with RS544 FEC enabled and running.

Resolution

The observed rx_corrected_bits_phy counts are normal and expected on a link that uses the PAM4 data transmission technique. The FEC in use on the link corrects the errored bits, resulting in a reliable link.

Verification Steps

To confirm that the counters reflect normal FEC operation rather than a faulty link, follow these steps:

  • Check the rx_corrected_bits_phy counter value using the command sudo ethtool -S enp139s0f1np1 | grep -E "correc|rx_err" or sudo ethtool -S lan0.
  • Verify that the counter value is within the expected range for a reliable link, that is, consistent with the pre-FEC BER of 2.4E-04 or better described under Cause.
  • Check the bin count display (columns: Symbol Errors Per Codeword, Codewords, Changes, Last Change) to ensure that the bin count does not reach beyond bin-8.
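
One way to make "within the expected range" concrete is to read the counter twice and convert the difference into an approximate pre-FEC BER. The sketch below is an illustration, not a vendor procedure. It assumes a nominal 100 Gb/s bit rate, treats corrected bits as a stand-in for pre-FEC bit errors (reasonable only while no codewords are uncorrectable), and uses a hypothetical second reading; `estimate_pre_fec_ber` and `within_limit` are made-up names.

```python
PRE_FEC_BER_LIMIT = 2.4e-4  # threshold quoted in this article


def estimate_pre_fec_ber(corrected_before, corrected_after, elapsed_s, bit_rate_bps):
    """Approximate pre-FEC BER from two rx_corrected_bits_phy readings."""
    if elapsed_s <= 0 or bit_rate_bps <= 0:
        raise ValueError("elapsed_s and bit_rate_bps must be positive")
    delta = corrected_after - corrected_before
    if delta < 0:
        raise ValueError("counter decreased (reset or wrap); take new readings")
    return delta / (elapsed_s * bit_rate_bps)


def within_limit(ber, limit=PRE_FEC_BER_LIMIT):
    """True when the estimated BER meets the pre-FEC requirement."""
    return ber <= limit


first_reading = 153303800                  # value from the sample output
second_reading = first_reading + 1_000_000  # hypothetical reading 60 s later

ber = estimate_pre_fec_ber(
    first_reading,
    second_reading,
    elapsed_s=60,
    bit_rate_bps=100e9,  # assumption: nominal 100 Gb/s
)
print(f"estimated pre-FEC BER: {ber:.2e}, within limit: {within_limit(ber)}")
```

In practice the two readings would come from two runs of the ethtool command a known interval apart. A result far below 2.4E-04 indicates that the FEC has ample margin.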

Tools and Resources

The following tools and resources can aid in resolving the issue:

  • ethtool command-line utility
  • sudo command for running commands with elevated privileges

Precautions and Warnings

Caution: Ensure that the rx_corrected_bits_phy counter value is within the expected range for a reliable link to avoid potential issues.
Note: The FEC technique used on the link corrects the errored bits resulting in a reliable link.

Affected Products

OEMR R640, OEMR R6515, OEMR R6525, OEMR R660, OEMR R6615, OEMR R6625, OEMR R740, OEMR R740xd, OEMR R740xd2, OEMR R750, OEMR R750xa, OEMR R7525, OEMR R760, OEMR R760xa, OEMR R760XD2, OEMR R7615, OEMR R7625, OEMR R840, OEMR R860, OEMR R940, OEMR R940xa, OEMR R960, OEMR XR12, OEMR XR5610, OEMR XR7620, OEMR XR8610t, OEMR XR8620t, PowerEdge C6520, PowerEdge C6525, PowerEdge MX740C, PowerEdge R640, PowerEdge R6515, PowerEdge R6525, PowerEdge R660, PowerEdge R6615, PowerEdge R6625, PowerEdge R670, PowerEdge R740, PowerEdge R740XD, PowerEdge R740XD2, PowerEdge R750, PowerEdge R750XA, PowerEdge R7525, PowerEdge R760, PowerEdge R760XA, PowerEdge R760xd2, PowerEdge R7615, PowerEdge R7625, PowerEdge R770, PowerEdge R840, PowerEdge R860, PowerEdge R940, PowerEdge R940xa, PowerEdge R960, PowerEdge XE8640, PowerEdge XE9640, PowerEdge XE9680, PowerEdge XR12, PowerEdge XR5610, PowerEdge XR7620, PowerEdge XR8610t, PowerEdge XR8620t ...
Article Properties
Article Number: 000286734
Article Type: Solution
Last Modified: 01 Jul 2025
Version:  3