190 Posts

January 26th, 2011 06:00

Just a thought - or perhaps a shot in the dark...

From your output it looks like you are using jumbo frames (MTU of 9000) - is this enabled on your network infrastructure? Are you doing iSCSI (typically where you see jumbo frames)? If your CGEs aren't in use except for testing, you might want to set the MTU to 1500 and see if that changes anything. I'm not a jumbo frame guru, but I have always been under the assumption that to use jumbo frames correctly they need to be enabled end-to-end (this would include your CIFS clients).
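One quick way to sanity-check the end-to-end part from a Linux client is a don't-fragment ping at jumbo size (the hostname below is a placeholder, and flag spellings vary by OS - this is the Linux `ping` syntax):

```shell
# 8972 = 9000-byte MTU minus 28 bytes of IP + ICMP headers.
# -M do sets the don't-fragment bit so no hop can silently fragment.
# "filer.example.com" is a placeholder for your CIFS server.
ping -M do -s 8972 -c 3 filer.example.com
# Failures here indicate a hop that is not passing 9000-byte frames.
```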

Dan

40 Posts

January 26th, 2011 08:00

I did, for kicks, set up an IP address interface that used an MTU of 9000. I had already set this up a while ago to test backups over jumbo frames. We are only using CIFS for production.

Anyway, I set up the test so the test CIFS server was using this jumbo frame connection. It is on a separate VLAN on my ProCurve switch, which is configured for jumbo frames. The network card is also configured for jumbo frames. There is no difference in the test.

40 Posts

January 26th, 2011 08:00

Interesting point. We don't use jumbo frames and none of our interfaces are set to 9000, so I don't know why this command shows them at 9000. But it is something I may test.

46 Posts

January 28th, 2011 15:00

You can calculate your retransmission percentage as retrans / #sent packets * 100 = retrans%.

EMC wants that to be < 0.01%; between 0.01% and 0.1% is on the verge; > 0.1% is a problem.

You are well within a reasonable value: 66962 / 2022297234 * 100 = 0.0033%, which is < 0.01%.
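Plugging in the numbers from the netstat output earlier in the thread, the same arithmetic as a one-liner (awk is just a convenient calculator here):

```shell
# Retransmission percentage: retransmits / packets sent * 100
awk -v retrans=66962 -v sent=2022297234 \
    'BEGIN { printf "%.4f%%\n", retrans / sent * 100 }'
# prints 0.0033%
```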

I'd suggest opening a case (or escalating the one you still have open?), but if you want to poke around yourself, try running a number of the server_stats commands on the Celerra.

server_stats server_ -i -table

"-table cifs"  Look at the uSec/call column; it's in microseconds, so divide by 1000 to get milliseconds. This should tell you how long it takes the Celerra to perform particular CIFS operations.

"-table dvol"  These are disk stats; look at the number of read/write ops going to particular drives and see if you are hammering any LUN heavily.

"-table fsvol"  You can use this to see the filesystem I/O; there might be a lot of I/O going to a filesystem that should be quiet, competing for resources.

I tend to start with an interval of 1 second first to look for spikes or bursts, and then increase it to 10, 30 or 60 seconds.
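Put together, the invocations above look roughly like this - a sketch only, using the flags quoted in this thread; I'm assuming a Data Mover named server_2, and exact option spellings can vary by DART version:

```shell
# Run from the Control Station. server_2 is an assumed Data Mover name.
# CIFS operation latency (uSec/call; divide by 1000 for milliseconds):
server_stats server_2 -table cifs -i 1
# Per-disk-volume read/write ops, 10-second samples:
server_stats server_2 -table dvol -i 10
# Per-filesystem I/O, 30-second samples:
server_stats server_2 -table fsvol -i 30
```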

Another one: in Unisphere, go into Celerra Monitor and get the Clariion stats. Look for queueing, cache flushes, etc. (writes should go straight to cache on the Clariion; unless the write cache is filling up, they "should" be faster than reads).

Something I've just run into on mine is ufslog issues (hopefully it is fixed; the jury is out as to whether it is or not). Run "server_log server_ " to get a log. I saw that I was hitting the soft threshold multiple times a second; if you are seeing lots of these (not just a handful) you might want to contact support. I had to do a number of run-arounds with them, network traces, etc. I run millions of little files over NFS; with your big files I wouldn't expect this.
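A quick way to gauge how often those messages are showing up - the grep pattern here is an assumption, so match it to whatever the actual log line says on your system, and server_2 is an assumed Data Mover name:

```shell
# Count ufslog soft-threshold messages in the Data Mover log.
# A handful is fine; thousands means talk to support.
server_log server_2 | grep -ci "threshold"
```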

If you want to pull/turn on collection of the backend disk perf stats, check Primus emc177167 on how to do it. If you don't have an Analyzer license, you'll get encrypted .naz files that can only be opened by support. Having that in hand with at least 3 hours already collected avoided some of the latency of "let's turn on stats to collect at least 3 hours of data... it's late in the day so I'll send it to engineering tomorrow and you'll get a detailed analysis the day after".

40 Posts

January 29th, 2011 16:00

Thanks for this information. I have opened a ticket but it does not seem to be going anywhere. In fact, we have opened tickets in more than one office because we are each independently run. Each Celerra was configured independently and they all have the same problem.

I looked at most of the stats you provided but will go through them again. It seems that the backend is not even being hit hard; it seems like the NAS is what is slowing things down. Don't know for sure.

40 Posts

February 3rd, 2011 11:00

We are homing in on the problem of write performance. I can't believe other people have not reported this problem. We almost exclusively use our Celerra servers for CIFS. We use lots of checkpoints on our file systems, setting up jobs to run every 2-3 hours during the day and also weekly. We use most of the maximum number of checkpoints (90 max? or whatever it is).

We are fairly certain that these checkpoints, along with the checkpoints for our Replicator V2 (we also replicate each of our file systems offsite), are the culprit behind the slow writes.

We have set up a file system on the performance pool created from the vault drives for testing. No replication or checkpoints are on this pool. For this pool we get about 115 MB/s read and about 60 MB/s write. Writes are still much slower, but that is more acceptable performance.

Oh, and the other office purchased FAST Cache on their Celerra and it does nothing to help their write performance. A bunch of wasted money if you ask me. I don't have FAST Cache and see equal performance for the type of work we do.

4 Operator

 • 

8.6K Posts

February 4th, 2011 09:00

Given the way FAST Cache works, it's no surprise that it won't make a difference when writing new files.

In order for FAST Cache to make a difference, you have to have a number of I/Os to the same 64 KB cluster of blocks so that it gets promoted into the FAST Cache.

From then on you are working with SSD speed and latency for these blocks for both reads and writes.

That does make a tremendous difference for applications that reuse the same blocks, like Exchange, databases, ....

For these apps we have seen reduced latency and 3+ times the I/O performance.

The FAST Cache white paper explains that in more detail.

Rainer

40 Posts

February 25th, 2011 11:00

Just wanted to provide an update. We have configured our Celerras every which way based on EMC support. It appears that Celerra Replicator V2 causes major slowdowns when writing files to CIFS shares. Either way, we do not get the write performance of our Windows servers (with Clariion or local SAS storage). The best write performance on the Celerra for our large files is around 60-80 MB/s. With Replicator running, speed drops to about 20 MB/s.

I will post updates if we ever make any more progress on this. 

2 Intern

 • 

157 Posts

February 17th, 2012 08:00

I have been doing benchmarking on our new VNX5300, exclusively with NFS, and am seeing similar results. So I went to our NS960, which does a lot of CIFS and NFS sharing but generally not much in the way of serious load. What I have concluded is that reads out of the Celerra are as fast as the backend disk (or network) will allow, but writes to it suck regardless of the number or type of spindles at the destination. This has to be occurring in more environments, but unless people are pushing beyond 60 MB/sec, they would never care that it can't go any faster than this. Rainer or anyone, have any suggestions? I am at a loss as to how it is possible that the Celerra with its cache, combined with all the cache and performance of a 960, is not able to munch data to any FS without choking at around 60-70 MB/sec.

dart 6.0.41-4

We do not have replicator running but there are some checkpoints for some FS. Also, this is not a CIFS vs NFS issue, the performance is exactly the same regardless of the protocol.

thanks

Dave

2 Intern

 • 

157 Posts

February 17th, 2012 08:00

Well, as I said, I'm currently testing against a VNX5300 which has no checkpoints at all, so that is not related to my problem. But I think there is a bug in there somewhere.

40 Posts

February 17th, 2012 08:00

We only get about 10 MB/s write performance, but we use a lot of checkpoints. It has something to do with the copy-on-write method. Supposedly there is a fix in the latest 6.0.51.6 NAS code to help write performance. Would love to hear from someone whether it helps.


More info here  https://community.emc.com/thread/124864

1 Rookie

 • 

121 Posts

July 12th, 2012 07:00

Which command did you use to get the output below?

Name     Mtu   Ibytes        Ierror  Obytes        Oerror   PhysAddr
****************************************************************************
fxg0     9000  3016360536    11      2709827918    0        0:60:16:32:56:46
fxg1     9000  1237764640    0       0             0        0:60:16:32:56:47
mge0     9000  2762780894    0       4205729750    0        0:60:16:40:ee:1
mge1     9000  331775952     0       52356319      0        0:60:16:40:ed:ed
cge0     9000  1964674952    0       1408110665    0        0:60:16:2b:5c:96
cge1     9000  610079448     0       2747930285    0        0:60:16:2b:5c:97

40 Posts

July 12th, 2012 08:00

What kind of performance issues? Do you have checkpoints on the file systems in question? If so, expect SSSSSSlow write performance.

Paul Shane | Systems Administrator | paul.shane@milliman.com

Milliman | 1550 Liberty Ridge Drive, Suite 200 | Wayne, PA 19087-5572 | USA

Tel +1 610 975 8012 | Fax +1 610 687 4236 | Mobile +1 610 389 5088 | milliman.com


1 Rookie

 • 

121 Posts

July 12th, 2012 08:00

I am sorry, I am really new to Celerra stuff, as I am managing only normal operations on my Celerra.

But we are facing a lot of performance issues on our Celerra and I would like to know how I can get the info to troubleshoot.

Can you please let me know how I can find out what the performance issues are?

Can I use Celerra Monitor to learn something about performance? If yes, can you please let me know the performance parameters to watch.

4 Operator

 • 

8.6K Posts

July 12th, 2012 08:00

You’re kidding – right ? Never seen a netstat output ?
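For the record, the interface table above is the Data Mover's netstat-style interface statistics; on a Celerra that would come from something like this (server_2 is an example Data Mover name - substitute your own):

```shell
# Per-interface MTU, byte counters, errors and MAC addresses:
server_netstat server_2 -i
```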
