7 Posts
0
September 7th, 2006 21:00
Poor network performance on 2950
I'm implementing a D2D2T (disk-to-disk-to-tape) solution for my company. It involves a 2950, an MD1000 with 13x 500GB SATA-II drives, and a PowerVault 136T with LTO2 drives. I just installed and configured our equipment (the network and tape library were pre-existing). While running some preliminary tests against our Exchange server, I noticed that my speed to disk was much lower than my speed to tape. This was unreal to me because I benchmarked my disk speed at about 400 MB/s, and when Backup Exec was doing the pre-allocation of the disk files, I was averaging about 300 MB/s. So I knew something wasn't right. There is also the fact that going straight to tape, I halfway saturate my GigE link.
So I fired up iperf and did some raw network testing involving 3 servers on the same Gigabit Ethernet network (Cisco 6509) with the standard 1500 MTU: a 2950, a 2850 and a 2650. All servers have 4 NICs: the two integrated NICs and a PCI-X Intel dual-port MT1000 card. The 2850 and the 2650 are connected to the backup network via a single port on the PCI-X Intel card. On the 2950, I decided to connect one of the Broadcom NICs, with TOE enabled, to the backup network. (This setup is pre-existing; I didn't need to reconfigure anything for my iperf testing.)
So, testing from the 2650 to the 2850 on the Intel NICs with the default 8KB window, this is what I get:
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 8.00 KByte (default)
------------------------------------------------------------
[1884] local 129.5.6.2 port 5001 connected with 129.5.6.1 port 8593
[ ID] Interval Transfer Bandwidth
[1884] 0.0-10.0 sec 237 MBytes 199 Mbits/sec
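For anyone wondering why the default window caps the result so hard: a single TCP stream can have at most one window of data in flight per round trip, so throughput is bounded by roughly window / RTT. Here is a rough Python sketch of that arithmetic. The round-trip time was not measured in this thread, so the code backs an implied RTT out of the 8 KByte result; that figure, and the assumption that the 8 KByte run is purely window-limited, are mine and only a ballpark. The function names are made up for illustration.

```python
def window_limited_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Rough upper bound on single-stream TCP throughput: one window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


def implied_rtt_ms(window_bytes: int, observed_mbps: float) -> float:
    """RTT (in ms) at which the observed rate would be exactly window-limited."""
    return window_bytes * 8 / (observed_mbps * 1e6) * 1e3


# Observed in the run above: 8 KByte window, 199 Mbits/sec.
# ASSUMPTION: the transfer is limited only by the window, not by CPU or NIC.
rtt_ms = implied_rtt_ms(8 * 1024, 199.0)
print(f"implied RTT: {rtt_ms:.3f} ms")

# With the same RTT and a 100 KByte window, the bound rises well above
# Gigabit line rate, so the window should stop being the bottleneck.
bound_mbps = window_limited_mbps(100 * 1024, rtt_ms / 1e3)
print(f"100 KByte window bound: {bound_mbps:.0f} Mbits/sec")
```

If the estimate is anywhere near right, a larger window moves the bottleneck elsewhere (NIC, driver, CPU or the wire itself), which is why differences between adapters only show up once the window is raised.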
Tuning that window up to 100K (my sweet spot), I get this:
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 100 KByte
------------------------------------------------------------
[1884] local 129.5.6.2 port 5001 connected with 129.5.6.1 port 10648
[ ID] Interval Transfer Bandwidth
[1884] 0.0-10.0 sec 766 MBytes 643 Mbits/sec
Now, that is where I expect to be for my network with 1500 byte frames. But the 2950 gives much different results:
No difference when using the default window size.
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 8.00 KByte (default)
------------------------------------------------------------
[1896] local 129.5.6.156 port 5001 connected with 129.5.6.1 port 10774
[ ID] Interval Transfer Bandwidth
[1896] 0.0-10.0 sec 237 MBytes 199 Mbits/sec
But at what is probably the best window size for my network, the Intel NICs outperform my 2950's Broadcom NICs by almost 50% (about 44%: 643 vs. 446 Mbits/sec).
C:\>iperf -s -w 100k
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 100 KByte
------------------------------------------------------------
[1896] local 129.5.6.156 port 5001 connected with 129.5.6.1 port 10838
[ ID] Interval Transfer Bandwidth
[1896] 0.0-10.0 sec 532 MBytes 446 Mbits/sec
This is with TOE turned on and the latest drivers. All machines are running Windows 2003 SP1.
So what could be the deal here? Any takers?
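For reference, the gap between the two 100 KByte runs can be recomputed straight from the iperf summary lines. This is just a small Python sketch; the regular expression and the helper name are mine, not part of iperf. The percentage depends on which NIC is taken as the baseline, so the code prints both.

```python
import re

# Matches the bandwidth figure at the end of an iperf summary line.
SUMMARY = re.compile(r"([\d.]+)\s+Mbits/sec")


def mbits_per_sec(line: str) -> float:
    """Extract the Mbits/sec figure from one iperf summary line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError(f"no Mbits/sec figure in: {line!r}")
    return float(match.group(1))


# Summary lines copied from the 100 KByte runs in this post.
intel = mbits_per_sec("[1884] 0.0-10.0 sec 766 MBytes 643 Mbits/sec")
broadcom = mbits_per_sec("[1896] 0.0-10.0 sec 532 MBytes 446 Mbits/sec")

# Same gap, two baselines.
intel_faster_pct = (intel - broadcom) / broadcom * 100
broadcom_slower_pct = (intel - broadcom) / intel * 100

print(f"Intel is {intel_faster_pct:.1f}% faster than Broadcom")
print(f"Broadcom is {broadcom_slower_pct:.1f}% slower than Intel")
```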
johnpickett
19 Posts
0
September 22nd, 2006 21:00
After calling Dell support this morning, they suggested removing the TOE from the system and giving it a try. I did so and my test box appears to be running stably now. It doesn't sound like you're having the exact same issue, but you might consider removing TOE to rule that out as a possibility.
In case you (or anyone else with similar issues) don't know how or where the TOE component is located, I've created a photoset on my Flickr account to describe the process. Sorry, it just sounded like a fun afternoon activity and I was kinda bored. Note of course to make sure the server is disconnected from any power source, ground yourself, etc. All the normal safety precautions. If you mess it up, I take no responsibility :-)
http://www.flickr.com/photos/blurredvisionstudios/sets/72157594295518240/
Message Edited by johnpickett on 09-22-2006 04:45 PM
rhenriksen1
12 Posts
0
November 4th, 2006 00:00
cbellsisd
4 Posts
0
September 7th, 2007 16:00
The 7th server was just installed 2 weeks ago and is the only one running Windows (Server 2003 R2 Enterprise). It appears to be the only one having the random network/lockup problems listed above by "johnpickett". The other servers are all running Linux (CentOS 4.5 64-bit and Ubuntu 6.06.1 LTS 64-bit).
It would appear that this is a Windows-only problem for me (although I will not be taking any chances with the other systems and will be pulling these clips out very soon if this solves my problems). I installed the latest drivers from the Dell site yesterday and have still had these same problems today.
The TOE module has the following printed on it:
TOE 2
FG027
It's also strange because TOE was showing Disabled in the BIOS, and our systems are all ordered without TOE (it's useless when using LACP channeling anyway). So why is this network-card-crippling device even installed? What exactly does it do when set to Disabled in the BIOS?
I removed it and the BIOS still says Disabled (although the option is now unchangeable).
In addition to the problems listed by "johnpickett", every time the system became unresponsive, the local console was affected as well. If the system was not logged in, there was no login dialog box or Ctrl-Alt-Del screen (just a blank gray screen and a movable mouse cursor, even in an RDP session). If the system was logged in, everything you tried to do would lock up (clicking the Start menu, opening Task Manager, etc.).
There were absolutely zero errors in the event logs except the ones informing me afterward that the system had experienced an unexpected shutdown (all I could do when it locks up that hard is pull the power). In addition, the initial startup of Windows takes almost 3-4 minutes (the black loading screen, not the GUI). I'm not sure whether this is related to the NIC's TOE, but it seems to be 1-2 minutes faster now (still slower than it should be).
For now I will wait a few days and see if these problems return after removing the device (since they happen at least 1-2 times per day, I should not have to wait long).
Colin Bell
SISD Network Admin