May 29th, 2012 10:00

R210II network problem

Hi,

We have been struggling for days with an intermittent issue.

Hardware & Software status:

We are currently installing 8 PowerEdge R210 II servers.

All run CentOS 6.2 (Linux 2.6.32-220.17.1.el6.x86_64, bnx2 driver version 2.1.11).

All firmware is up to date (BIOS 1.2.3, except on 2 machines with 2.0.5 / BMC 1.85 / BCM5716 6.2.16).

Before the update: BIOS 1.2.3 / BMC 1.80 / BCM5716 6.2.15.

ISSUE:

When the machine powers up, the bnx2 driver fails to initialize and we get the following messages:

May 21 22:24:26 srv4 kernel: bnx2: fw sync timeout, reset code = 5020002

May 21 22:24:26 srv4 kernel: bnx2 0000:01:00.0: eth0: <--- start MCP states dump --->

May 21 22:24:26 srv4 kernel: bnx2 0000:01:00.0: eth0: DEBUG: MCP_STATE_P0[0003650e] MCP_STATE_P1[0003600e]

May 21 22:24:26 srv4 kernel: bnx2 0000:01:00.0: eth0: DEBUG: MCP mode[0000b880] state[80004000] evt_mask[00000500]

May 21 22:24:26 srv4 kernel: bnx2 0000:01:00.0: eth0: DEBUG: pc[08003588] pc[08003588] instr[3c030020]

May 21 22:24:26 srv4 kernel: bnx2 0000:01:00.0: eth0: DEBUG: shmem states:

May 21 22:24:26 srv4 kernel: bnx2 0000:01:00.0: eth0: DEBUG: drv_mb[05020002] fw_mb[00000003] link_status[0000000f] drv_pulse_mb[00008004]

May 21 22:24:26 srv4 kernel: bnx2 0000:01:00.0: eth0: DEBUG: dev_info_signature[44564947] reset_type[01005254] condition[0003650e]

May 21 22:24:26 srv4 kernel: bnx2 0000:01:00.0: eth0: DEBUG: 000003cc: 00000000 00000000 00000000 00000000

May 21 22:24:26 srv4 kernel: bnx2 0000:01:00.0: eth0: DEBUG: 000003dc: 00000000 00000000 00000000 00000000

May 21 22:24:26 srv4 kernel: bnx2 0000:01:00.0: eth0: DEBUG: 000003ec: 00000000 00000000 00000000 00000000

May 21 22:24:26 srv4 kernel: bnx2 0000:01:00.0: eth0: DEBUG: 0x3fc[00000000]

May 21 22:24:26 srv4 kernel: bnx2 0000:01:00.0: eth0: <--- end MCP states dump --->

May 21 22:24:26 srv4 kernel: bnx2: fw sync timeout, reset code = 5020002

May 21 22:24:26 srv4 kernel: bnx2 0000:01:00.1: eth1: <--- start MCP states dump --->

May 21 22:24:26 srv4 kernel: bnx2 0000:01:00.1: eth1: DEBUG: MCP_STATE_P0[0003650e] MCP_STATE_P1[0003600e]

May 21 22:24:26 srv4 kernel: bnx2 0000:01:00.1: eth1: DEBUG: MCP mode[0000b880] state[80004000] evt_mask[00000500]

May 21 22:24:26 srv4 kernel: bnx2 0000:01:00.1: eth1: DEBUG: pc[08003588] pc[08003588] instr[3c030020]

May 21 22:24:26 srv4 kernel: bnx2 0000:01:00.1: eth1: DEBUG: shmem states:

May 21 22:24:26 srv4 kernel: bnx2 0000:01:00.1: eth1: DEBUG: drv_mb[05020002] fw_mb[00000006] link_status[0000000f] drv_pulse_mb[00008004]

May 21 22:24:26 srv4 kernel: bnx2 0000:01:00.1: eth1: DEBUG: dev_info_signature[44564907] reset_type[00025254] condition[0003600e]

May 21 22:24:26 srv4 kernel: bnx2 0000:01:00.1: eth1: DEBUG: 000003cc: 00000000 00000000 00000000 00000000

May 21 22:24:26 srv4 kernel: bnx2 0000:01:00.1: eth1: DEBUG: 000003dc: 00000000 00000000 00000000 00000000

May 21 22:24:26 srv4 kernel: bnx2 0000:01:00.1: eth1: DEBUG: 000003ec: 00000000 00000000 00000000 00000000

May 21 22:24:26 srv4 kernel: bnx2 0000:01:00.1: eth1: DEBUG: 0x3fc[00000000]

May 21 22:24:26 srv4 kernel: bnx2 0000:01:00.1: eth1: <--- end MCP states dump --->
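The reset code in the first line of each dump packs several fields into one 32-bit word. Here is a minimal sketch of how it could be split, assuming the field masks and code values are as I recall them from the upstream bnx2.h header. That is an assumption to check against your driver source, and `decode_drv_mb` is a hypothetical helper name.

```python
# Field masks and values as recalled from the upstream bnx2.h header.
# Assumption: verify them against the source of your driver version.
DRV_MSG_CODE_MASK = 0xFF000000
DRV_MSG_DATA_MASK = 0x00FF0000
DRV_MSG_SEQ_MASK = 0x0000FFFF

# Partial lookup tables, limited to a few values (also assumptions).
MSG_CODES = {
    0x01000000: "RESET",
    0x02000000: "UNLOAD",
    0x05000000: "FW_TIMEOUT",
}
MSG_DATA = {
    0x00010000: "WAIT0",
    0x00020000: "WAIT1",
    0x00030000: "WAIT2",
}


def decode_drv_mb(word):
    """Split a 32-bit driver mailbox word into code, data and sequence."""
    code = word & DRV_MSG_CODE_MASK
    data = word & DRV_MSG_DATA_MASK
    return {
        "code": MSG_CODES.get(code, hex(code)),
        "data": MSG_DATA.get(data, hex(data)),
        "seq": word & DRV_MSG_SEQ_MASK,
    }


# The kernel prints the value with %x, so "5020002" is 0x05020002.
print(decode_drv_mb(int("5020002", 16)))
```

If those masks hold, 5020002 reads as a firmware-timeout code with WAIT1 data and sequence number 2, which is the same word reported as drv_mb[05020002] in both dumps.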

This happens:

- 80% of the time on reboot

- 30% of the time on cold power-up

Upgrading the firmware changed nothing.
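With an intermittent failure across several machines, comparing dumps field by field can show what actually differs between ports or between boots. The sketch below assumes the syslog lines look exactly like the ones quoted in this post. `parse_mcp_dump` and `diff_ports` are hypothetical helper names, not part of any existing tool.

```python
import re
from collections import defaultdict

# Matches "name[hexvalue]" pairs such as drv_mb[05020002].
FIELD_RE = re.compile(r"(\w+)\[([0-9a-fA-F]+)\]")
# Captures the interface name and the payload of a bnx2 DEBUG line.
IFACE_RE = re.compile(r"bnx2 [0-9a-f:.]+: (\w+): DEBUG: (.*)")


def parse_mcp_dump(lines):
    """Return {interface: {field: int_value}} for bnx2 DEBUG lines."""
    result = defaultdict(dict)
    for line in lines:
        match = IFACE_RE.search(line)
        if not match:
            continue  # not a per-interface DEBUG line
        iface, payload = match.groups()
        for name, value in FIELD_RE.findall(payload):
            # Keep the first value if a name repeats (e.g. the two pc[...]).
            result[iface].setdefault(name, int(value, 16))
    return dict(result)


def diff_ports(dump, a, b):
    """Return fields present on both interfaces whose values differ."""
    shared = dump[a].keys() & dump[b].keys()
    return {
        key: (dump[a][key], dump[b][key])
        for key in sorted(shared)
        if dump[a][key] != dump[b][key]
    }
```

Feeding it the two drv_mb lines from the dumps above should report only fw_mb as different between eth0 and eth1.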

After reviewing the source code of the bnx2 driver, we think it could come from a hardware issue:

/* This value (in milliseconds) determines how long the driver should
 * wait for an acknowledgement from the firmware before timing out.  Once
 * the firmware has timed out, the driver will assume there is no firmware
 * running and there won't be any firmware-driver synchronization during a
 * driver reset. */
#define BNX2_FW_ACK_TIME_OUT_MS 1000
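To make the role of that timeout concrete, here is a rough model of a bounded wait for an acknowledgement. It is only an illustration of the idea in the comment, not the driver's actual code. `wait_for_ack`, `read_ack` and the 10 ms poll interval are hypothetical.

```python
import time


def wait_for_ack(read_ack, expected_seq, timeout_ms=1000,
                 poll_interval_ms=10, sleep=time.sleep):
    """Poll read_ack() until it returns expected_seq or the budget runs out.

    Returns True if the acknowledgement arrived in time, False on timeout.
    """
    polls = max(1, timeout_ms // poll_interval_ms)
    for _ in range(polls):
        if read_ack() == expected_seq:
            return True
        sleep(poll_interval_ms / 1000.0)
    return False


if __name__ == "__main__":
    # A stand-in for firmware that never acknowledges: the call times out
    # however large the budget is, it just takes longer to do so.
    print(wait_for_ack(lambda: 0, expected_seq=2, timeout_ms=50))
```

In this model a larger timeout only helps when the acknowledgement is late. If it never arrives, a larger value just delays the failure.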

Thanks for any help!

May 29th, 2012 16:00

I have not seen any issues like this elsewhere. Are you able to compile the module with a larger value for BNX2_FW_ACK_TIME_OUT_MS (for example 5000) as a test? Also, because of this: kerneltrap.org/.../6273676, I am curious whether this also happens on a slightly older kernel (2.6.30).

June 1st, 2012 05:00

We tried to increase the timeout with no success. The strange thing is that the BMC is not accessible in this situation...

June 2nd, 2012 16:00

If you download this ISO image, the OpenManage Server Administrator Live CD (linux.dell.com/.../OMSA64-CentOS55-x86_64-LiveCD.iso), and boot from it, do you see the same issue? It is a CentOS 5 live environment.
