14 Posts

June 30th, 2008 17:00

PowerConnect 6248 Noob

We just completed the installation of our new 6248. The intention was to give our two main servers 10GbE, and everything was configured as follows:

 

Server : HP NC510C 10GbE Adapters

::

1.5m CX Cables

::

PowerConnect 6248 (with two CX4 modules)

::

20-50m CAT6 Cables

::

Workstations : Each 1GbE LOM

 

Originally, with the 5324, running netperf between servers or from server to desktop, we usually got close to the full 1Gbps (typically a score of about 970). However, with the above configuration, we're only getting 370 Mbps from server (~4%) to desktop (~40%) and 810 Mbps from server to server (~8%). This seems really slow, and not being familiar with Layer 3 networking or the 6248, I'm wondering if I'm missing something blindingly obvious. Thanks for any help.
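As a rough check on the percentages quoted above, throughput can be expressed as a share of each link's nominal rate. This is only a sketch: the nominal rates (1,000 and 10,000 Mbps) and the helper name `utilization` are assumptions for illustration, and netperf reports payload throughput, so 100% is not reachable in practice.

```python
ONE_GBE_MBPS = 1_000    # assumed nominal rate of a 1GbE link
TEN_GBE_MBPS = 10_000   # assumed nominal rate of a 10GbE (CX4) link


def utilization(measured_mbps: float, link_mbps: float) -> float:
    """Return measured throughput as a percentage of the link's nominal rate."""
    return 100.0 * measured_mbps / link_mbps


# Figures taken from the post above: (measured Mbps, nominal link Mbps).
measurements = {
    "5324, gigabit to gigabit": (970, ONE_GBE_MBPS),
    "6248, server to desktop (server's 10GbE side)": (370, TEN_GBE_MBPS),
    "6248, server to desktop (desktop's 1GbE side)": (370, ONE_GBE_MBPS),
    "6248, server to server (10GbE)": (810, TEN_GBE_MBPS),
}

for label, (mbps, link) in measurements.items():
    print(f"{label}: {utilization(mbps, link):.1f}% of nominal")
```

The 370 Mbps figure works out to roughly 4% of the 10GbE port and roughly 37% of the desktop's gigabit port, which matches the rounded numbers in the post.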

 

Also, is it correct that each CX4 module can only do 12Gbps between both ports? So if both servers are connected to one module, the theoretical best I could get is 6Gbps between the two? Not that that's slow or anything.

14 Posts

June 30th, 2008 19:00

Very strange.... not sure if this is related, but the ARP table contains only one entry - my Windows 2008 machine (used for development). Seems when I ping a down computer on the switch, I get the following error:

 

Reply from 192.168.100.41: Destination host unreachable.

 

Looks like the switch is routing everything through my machine ?!?! What gives? If it is doing this, then that would certainly explain the performance problems.

July 1st, 2008 06:00

Can you share / post your switch configuration and explain whether the 10G attached servers are in the same VLAN or not ?

 

What IP address ranges are you using - and how are you assigning them - DHCP or static?

 

Do you have any other routing devices active or attached?

 

Do you have jumbo frames (10G) enabled or disabled currently?

 

What MAC statistics are showing on the server/host ports - any errors or port discards ?

14 Posts

July 2nd, 2008 13:00

The switch is pretty much stock configured - we didn't buy it for the Layer 3 functionality, but to have a cheap 10GbE switch (and maybe that's the problem - bringing L2 pre-conceived notions into it???). Anyway, everything is on the same subnet (192.168.100.0/24) and everything is on VLAN1 (I assume), the IPs are assigned through DHCP by our Windows domain except for the servers (obviously) which are all in the 192.168.100.192 to 192.168.100.255 range.

 

I did try to enable jumbo frames, however, that seemed to cause more problems than it solved. For example, with jumbo frames enabled (on the switch and both servers), a ping -l 5000 would just hang and return that it's unroutable. So I disabled that for now - I can try and get it working later, though I had assumed jumbo frames would only help with server-to-server traffic as they're the only machines on the network which support them.

 

To temporarily circumvent the problem, I've reconnected the two LOM's on the switches and performance has increased significantly, though it's still nowhere near what the 5324 was able to get before, just gigabit-to-gigabit.

 

I honestly cannot see any errors or dropped packets on the switch. The counter summary shows the following:

 

Port        Recv UP   Xmit UP   Recv Non-UP   Xmit Non-UP   Recv Errors   Xmit Errors
49: 1/XG1   2226059   1428759   274           271792        0             0
50: 1/XG2   6357193   4178150   784           260267        0             0

 

There are no drop events, etc, which would indicate any problems. Netstat on the servers shows zero discards and only one error out of receiving 569,157,991 and sending 118,300,665 bytes.

 

For reference - just to make sure I'm not out of my mind - the best performance I'm getting from a gigabit machine to the 10GbE adapters is 600Mbps running netperf (and I had to move one server to a different CX4 module, so each server is on a different one). Between them, however, I get 300-400Mbps, and SOMETIMES that drops to about 10Kbps!

 

Here's the whole config:

 

console#show running-config
!Current Configuration:
!System Description "Dell 48 Port Gigabit Ethernet, 2.1.0.13, VxWorks5.5.1"
!System Software Version 2.1.0.13
!
configure
stack
member 1 2
exit
ip address 192.168.100.248 255.255.255.0
ip domain-name tera1.com
ip name-server 192.168.100.4
ip name-server 192.168.100.202
ip name-server 192.168.100.204
logging buffered size 20
username "admin" password b10d7d7f05d46eb7f03e7c2653e11895 level 15 encrypted
flowcontrol
no spanning-tree
!
interface ethernet 1/g1
mtu 9216
exit
!
interface ethernet 1/g2
mtu 9216
exit
!
interface ethernet 1/g3
mtu 9216
exit
!
interface ethernet 1/g4
mtu 9216
exit
!
interface ethernet 1/g5
mtu 9216
exit
!
interface ethernet 1/g6
mtu 9216
exit
!
interface ethernet 1/g7
mtu 9216
exit
!
interface ethernet 1/g8
mtu 9216
exit
!
interface ethernet 1/g9
mtu 9216
exit
!
interface ethernet 1/g10
mtu 9216
exit
!
interface ethernet 1/g11
mtu 9216
exit
!
interface ethernet 1/g12
mtu 9216
exit
!
interface ethernet 1/g13
mtu 9216
exit
!
interface ethernet 1/g14
mtu 9216
exit
!
interface ethernet 1/g15
mtu 9216
exit
!
interface ethernet 1/g16
mtu 9216
exit
!
interface ethernet 1/g17
mtu 9216
exit
!
interface ethernet 1/g18
mtu 9216
exit
!
interface ethernet 1/g19
mtu 9216
exit
!
interface ethernet 1/g20
mtu 9216
exit
!
interface ethernet 1/g21
mtu 9216
exit
!
interface ethernet 1/g22
mtu 9216
exit
!
interface ethernet 1/g23
mtu 9216
exit
!
interface ethernet 1/g24
mtu 9216
exit
!
interface ethernet 1/g25
mtu 9216
exit
!
interface ethernet 1/g26
mtu 9216
exit
!
interface ethernet 1/g27
mtu 9216
exit
!
interface ethernet 1/g28
mtu 9216
exit
!
interface ethernet 1/g29
mtu 9216
exit
!
interface ethernet 1/g30
mtu 9216
exit
!
interface ethernet 1/g31
channel-group 1 mode auto
mtu 9216
exit
!
interface ethernet 1/g32
channel-group 1 mode auto
mtu 9216
exit
!
interface ethernet 1/g33
channel-group 2 mode auto
mtu 9216
exit
!
interface ethernet 1/g34
channel-group 2 mode auto
mtu 9216
exit
!
interface ethernet 1/g35
mtu 9216
exit
!
interface ethernet 1/g36
mtu 9216
exit
!
interface ethernet 1/g37
mtu 9216
exit
!
interface ethernet 1/g38
mtu 9216
exit
!
interface ethernet 1/g39
mtu 9216
exit
!
interface ethernet 1/g40
mtu 9216
exit
!
interface ethernet 1/g41
mtu 9216
exit
!
interface ethernet 1/g42
mtu 9216
exit
!
interface ethernet 1/g43
mtu 9216
exit
!
interface ethernet 1/g44
mtu 9216
exit
!
interface ethernet 1/g45
mtu 9216
exit
!
interface ethernet 1/g46
mtu 9216
exit
!
interface port-channel 1
hashing-mode 6
exit
!
interface port-channel 2
hashing-mode 6
exit
snmp-server user admin READ_noAuthNoPriv
snmp-server community public rw
exit

14 Posts

July 2nd, 2008 15:00

There was another forum post from someone asking the same thing - I think the supposition comes from the fact that using the stacking modules precludes the use of the second uplink port, implying that 24Gbps (12 per port) is the maximum available. According to Broadcom's site, however, the main chips in the 6248 support 10/12Gbps per channel, and each of those backends has two channels. I guess the stacking limitation is a software one, not a hardware one.

 

Yes, it does seem as though the ports are all still set to jumbo frames - I didn't notice that (I thought I had switched it back - I guess in my haste I only did it on the servers). Some of the client machines do support jumbo frames (with Intel and Broadcom LOMs), but some we've had to swap out with cheap Realtek cards which I believe do not (at least the software doesn't allow it). I'll try re-enabling jumbo frames to see if it helps any.

 

Regarding routing performance, is the 6248 faster with IP routing enabled? With a single VLAN and subnet, can I enable IP routing, and if so, how?

 

Also, you said "but not with the HP NICs" - does that mean the HP NICs can't pull that off (iow, they're no good), or that you didn't test/use them?

 

Thanks for all the input so far!

184 Posts

July 2nd, 2008 15:00

Sorry, I am confused. You said you enabled jumbo frames but then disabled them. I still see the 9216 MTU set on all the ports. Do your client machines support jumbo frames? IMO you won't ever see over 300-400Mbps on a Windows machine without jumbo frames. I have seen close to 10Gbps speeds on the 6248s with routing enabled, but it wasn't with HP CX4 NICs, it was with Intel CX4 NICs.

 

Can I ask where you heard about the 2 CX4 ports sharing 12Gbps? I know each port is limited to 12Gbps but I didn't think they shared that bandwidth between them.

 

14 Posts

July 2nd, 2008 16:00

Leaving jumbo frames enabled on the switch, I also enabled them on my machine and the server. Absolutely no change to the throughput - still "capped" at about 400Mbps (40% IO) and about 800Mbps from server-to-server (about 24% for a single CX4 channel - not sure how it works exactly to get "10Gbps"?). As I said before, even though we weren't using jumbo frames, we were getting about 80% gigabit bandwidth on the 5324 before.

 

This just feels like such a consistent yet arbitrary cap, that I can't shake the feeling that there's something configured that's deliberately restricting performance.

184 Posts

July 2nd, 2008 16:00

Layer 2 should be just as fast as or faster than layer 3. What I meant was I have seen these switches route at near 10GbE speeds, which is why your server <-> server speeds seem off to me. Not to say the HP NICs are bad, but I have only seen/used Intel, Dell, Cisco and Force10 10GbE gear personally and it has all handled line rate with no problems.

 

What brand/type of CX4 cables are you using? Have you tried/or do you have any other cables?

14 Posts

July 2nd, 2008 16:00

TurboTwin? I'm not exactly sure - they were bought from an online retailer. Let's see - they say "MADISON CABLE TYPE CL2 75C 28 AWG (UL)  TurboTwin(tm) RoHS COMPLIANT" along them, and a sticker has "P/N: 29-1300 4X ML PLATCH-PLATCH ,5M EXT CABLE: 03/08 MADE IN TAIWAN". Not sure if any of that means anything.

 

I do have some spare 0.5m cables here, but I'm not sure I can make them reach - at least I know I can't in normal use, but I might if I pull the server half-way out of the rack...

14 Posts

July 2nd, 2008 17:00

Okay, the HP NICs only go as high as 8000 maximum frame size (the NIC properties say 8000, the HP Config Utility says 8014), and with that, 7980 bytes seems to be my ping cap - what's strange is that one byte above it and I just get an error (Request timed out.). Shouldn't the packet just fragment above the MTU size?

 

I tried using netsh, and it accepted an MTU of 8014, but the NIC properties still say 9014, so I'm guessing it's hard coded into the ASIC. That's annoying... The switch supports 9216 bytes and my NIC seems to support only 4088, 9014 or 16128 bytes. Is there any issue with all three of these being different? Or should I find the lowest common denominator and stick with that?
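The "lowest common denominator" question can be worked through with some header arithmetic. The sketch below rests on assumptions that are not confirmed anywhere in this thread: that driver values such as 8014 and 9014 count the 14-byte Ethernet header, that the switch's 9216 is a maximum frame size, and that a ping carries a 20-byte IPv4 header plus an 8-byte ICMP header. All names are illustrative only.

```python
ETHERNET_HEADER = 14  # destination MAC + source MAC + EtherType (assumed included in driver values)
IPV4_HEADER = 20      # IPv4 header without options
ICMP_HEADER = 8       # ICMP echo request header


def ip_mtu(frame_size: int) -> int:
    """IP MTU implied by a frame-size setting that includes the Ethernet header."""
    return frame_size - ETHERNET_HEADER


def max_ping_payload(mtu: int) -> int:
    """Largest 'ping -l' payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


# Frame-size limits quoted in the thread.
server_nic_max = 8014                       # HP config utility value
switch_port_max = 9216                      # 'mtu 9216' in the switch config
workstation_options = [4088, 9014, 16128]   # choices offered by the workstation NIC

# The path can only carry what its most restrictive device accepts.
path_limit = min(server_nic_max, switch_port_max)

# Largest workstation setting that does not exceed that limit.
workstation_choice = max(s for s in workstation_options if s <= path_limit)

print("path frame limit:", path_limit)
print("implied IP MTU:", ip_mtu(path_limit))
print("max single-packet ping payload:", max_ping_payload(ip_mtu(path_limit)))
print("largest workstation setting that fits:", workstation_choice)
```

Under these assumptions the single-packet ping limit comes out at 7972 bytes rather than the 7980 observed above, so the driver's accounting probably differs by a few bytes from what is assumed here. The general point still holds: a frame larger than what the receiving NIC accepts is typically dropped rather than fragmented by the switch, so matching on the smallest supported size is the cautious choice.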

 

I think it's worth mentioning that regardless of the MTU (even back at 1514), the CPUs involved never reach epic proportions - there's maybe a 5-10% hit when trying to push my "maximum" of 400Mbps.

184 Posts

July 2nd, 2008 18:00

Just to eliminate the switch, how about direct connecting the 2 servers to each other and seeing what sort of speeds those NICs can push?

 

 

14 Posts

July 2nd, 2008 20:00

I'll have to get back to you on that - the servers are in use right now.

14 Posts

July 4th, 2008 13:00

Okay, finally had the opportunity.

 

I direct connected both servers and used 10.10.10.9 and 10.10.10.10 on a /30 subnet (Windows wouldn't let me do a /31 P2P subnet?!?!) Anyway, one way (one acting as the server, one as the client), I was able to get 810Mbps. Running both ways, I was able to get 715Mbps each way for a combined 1.4Gbps. Running four separate server instances on one machine and four clients on the other I was able to get a whopping 2.3Gbps - or about 575Mbps per stream (ouch).
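For reference, the /30 used for the direct link can be inspected with Python's standard `ipaddress` module. This is just a sketch to show why 10.10.10.9 and 10.10.10.10 are the only two usable addresses in that subnet. The variable names are illustrative.

```python
import ipaddress

# One end of the direct link, as configured in the test above.
server_a = ipaddress.ip_interface("10.10.10.9/30")
link = server_a.network  # the enclosing /30 network

usable_hosts = [str(host) for host in link.hosts()]

print("network:  ", link)
print("hosts:    ", usable_hosts)
print("broadcast:", link.broadcast_address)

# The other end of the link must fall inside the same /30.
server_b = ipaddress.ip_address("10.10.10.10")
print("peer in same subnet:", server_b in link)
```

A /30 spends two of its four addresses on the network and broadcast addresses, leaving exactly one address for each end of a point-to-point link.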

 

I decided to see where this scales out to and ran a 32-thread, dual CPU, bidirectional test (each server is running two servers and one client test - or 32 send threads and 32 receive threads per machine). Each thread was able to reach somewhere between just under 80 and 100Mbps, for a total throughput of 5793.5 Mbps at about 23.4% CPU utilization.

 

In all these tests, the performance indicator was showing that the CPU utilization was entirely kernel time. With all the CPU power still in reserve, and knowing that it is possible to get the performance fairly high (5.7Gbps isn't bad - but having to run 64 threads to get it is dumb), there still seems to be something holding everything back.

 

It is evident from this, however, that the problem is not the switch - I didn't get any better numbers with the switch in place than with a direct connection.
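The direct-connect figures above can be tabulated to show how per-stream throughput falls as streams are added. This is a sketch only: the stream counts are my reading of the test descriptions (for example, 64 streams in total for the 32-send/32-receive run), and "efficiency" here simply means aggregate throughput relative to perfect scaling of the 810Mbps single-stream result.

```python
SINGLE_STREAM_MBPS = 810.0  # one-way, single-stream result from the direct-connect test

# (description, assumed number of concurrent streams, aggregate Mbps)
runs = [
    ("one stream, one direction", 1, 810.0),
    ("one stream in each direction", 2, 2 * 715.0),
    ("four streams, one direction", 4, 2300.0),
    ("32 send + 32 receive threads", 64, 5793.5),
]


def per_stream(aggregate_mbps: float, streams: int) -> float:
    """Average throughput per stream in Mbps."""
    return aggregate_mbps / streams


def scaling_efficiency(aggregate_mbps: float, streams: int) -> float:
    """Aggregate throughput as a percentage of perfect linear scaling."""
    return 100.0 * aggregate_mbps / (streams * SINGLE_STREAM_MBPS)


for label, streams, aggregate in runs:
    print(
        f"{label}: {aggregate:.1f} Mbps total, "
        f"{per_stream(aggregate, streams):.1f} Mbps per stream, "
        f"{scaling_efficiency(aggregate, streams):.1f}% of linear scaling"
    )
```

The 64-stream run averages about 90Mbps per stream, consistent with the 80-100Mbps range reported above.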

14 Posts

July 4th, 2008 17:00

Using a 0.5m cable, the single client-server test jumped to about 900Mbps. However, multithreaded streaming didn't improve at all. So I'm suspecting the problem isn't the cable, and it's not the switch. That leaves Windows, the drivers or the NICs.

184 Posts

July 4th, 2008 17:00

Well, I am glad you have made some progress with it. I hope you can find a solution. You may also want to look into a different NIC if that's plausible for you.

 

14 Posts

July 19th, 2008 21:00

I worked with NetXen on this. We diagnosed that there was a firmware issue and had the card updated.

 

Directly connected to one another over a 0.5m cable, I was able to get 2.7Gbps with one thread (ntttcp), over 5Gbps with two threads, and close to 10Gbps with eight threads. Yay - so it's NOW neither the cable, the drivers nor the NICs.

 

So I connected it back to the switch and, playing it safe, reset the switch back to defaults. When connecting via remote desktop, it's the strangest experience. The screen refresh is excellent, but when I click or move a window, there's a good 5-10 second lag.

 

So, with a single thread, I can just about reach 5Gbps (this seems really odd, but continuing). With two threads, performance drops way down to about 2-3Gbps (with 500 and 1300Mbps per thread!). At eight threads I was hovering around 2.6Gbps with 200-500Mbps per thread (really wide range!).

 

While throughput is okay - not stellar by any stretch of the imagination - there's still a really horrible lag every time I connect. For example, when browsing the file shares it can take, literally, a good ten seconds to show anything - then "pop", it's there. It's frustrating as you-know-what!

 

This is so bizarre, I thought adding a picture might help show the VERY strange networking throughput during the test.

 

 

Anyway, if ANYONE has ANY ideas what the heck is going on, I'd be really appreciative. Thanks.
