July 10th, 2017 14:00

Large number of SMB connections from a single client

I've been seeing some odd behavior on our primary cluster that previously didn't seem to cause any problems but started causing timeouts and high CPU today.  At random times I'll log on to the cluster web UI and find that one node has around 10k-15k more SMB client connections than any other node.  Checking the client list, I see that nearly all of the connections come from a single IP.  In this morning's instance, two nodes exhibited the problem, each with a different client.

[Attached screenshot: client.JPG.jpg]

Checking the session list in the CLI shows only one session, but netstat shows the actual number of connections:

# netstat -an | grep 10.158.0.242 | wc -l

   15356
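When the offending IP isn't known up front, the same netstat output can be tallied per remote address instead of grepping for one client. This is only a rough sketch: the function name `count_conns_by_client` is made up for illustration, and it assumes the remote address is the fifth column of `netstat -an`, written either as `ip.port` (FreeBSD-style, which is what OneFS should print) or `ip:port`. It has not been tried on a cluster.

```shell
# Count TCP connections per remote IP from `netstat -an` output on stdin.
# Assumes column 5 is the remote address as "ip.port" or "ip:port".
count_conns_by_client() {
  awk '$1 ~ /^tcp/ && NF >= 6 {
         peer = $5
         sub(/[.:][0-9*]+$/, "", peer)    # strip the trailing port
         if (peer != "*") count[peer]++   # skip listening sockets (*.*)
       }
       END { for (p in count) print count[p], p }' | sort -rn
}

# Usage on a node (busiest clients first):
#   netstat -an | count_conns_by_client | head
```

Unlike the plain grep, this counts only exact address matches, so 10.158.0.242 is not confused with an address that merely contains those digits.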

I wasn't able to reach the users this time, but in previous instances the client machine showed only a single connection.

The client on node 3 didn't have any files open, and on node 6 the client had only 5 directories open for read, no files though.  I killed the sessions and the connections all dropped off within a minute, and the nodes started responding again.

Has anyone seen an issue like this?  I've got an SR open to take a look and I started a log gather prior to disconnecting the sessions, but I haven't heard from Support yet.  Any advice on where to look for clues in the meantime would be much appreciated.

Isilon OneFS v7.2.1.3 B_MR_7_2_1_3_069(RELEASE): 0x702015000300045:Tue Jun  7 11:03:04 GMT 2016    root@sea-build7-03:/b/mnt/obj/b/mnt/src/sys/IQ.amd64.release   clang version 3.3 (tags/RELEASE_33/final)

300 Posts

July 11th, 2017 04:00

I had a high number of connections from single clients when they were running "portable apps" like browsers from their home drive (which is stupid).

Multiple sessions most likely have their cause on the client, since it's hard for the server side to tell whether a session request is necessary or not.

So the question is: WHAT application / workflow causes this on the client side?

Another approach could be to reduce the session lifetime / tcp.keepidle values, but that may affect everyone else, so I would focus on the client.

Rgds

--sluetze

4 Posts

July 11th, 2017 07:00

It's entirely possible the users were running an app from a share, though they didn't seem aware of it at the time.  Luckily one of the users from yesterday is on my team, so I can work with him and possibly narrow down the behavior.  The odd thing is that it started over the weekend while he was out on vacation, so unless he was working on something remotely, I'd be very curious what kicked off the connections with his credentials.

Setting a limit on the idle time might not be too bad if set fairly high.  People don't like restarting their desktops around here it seems, and I'll see sessions which have been running for over a month.  Granted, not idle, but if I can end sessions which have been idle for 10 hours, that might be fine so long as the user doesn't notice the reconnect in the morning.  I'll take a look at the effects on my test environment, thanks!

4 Posts

July 11th, 2017 07:00

It's funny you mention Citrix & Terminal servers.  The first few instances of this issue were coming from a subnet we've dedicated to VDI, and the first time I saw it was right after we started our POC with a small group.  So my immediate thought was a problem relating to how VDI was connecting to the homes we have set up for VDI clients.  Unfortunately I've now had three instances of the issue coming from client desktops where VDI was not being utilized.  One user stated she was simply running a copy job, though it's possible she was also running an app stored on a share as Sluetze suggested and wasn't thinking about it.  Many users have shares mapped to drives by desktop admin groups and aren't fully aware of how everything works behind the scenes.

450 Posts

July 11th, 2017 07:00

Are those IPs perhaps Terminal Servers, or Citrix Servers?  A total shot in the dark, but perhaps it's something along those lines.  Either that or application servers, with really terribly written applications.

~Chris

19 Posts

July 22nd, 2020 01:00

Sorry for bumping this thread, but I think we have a similar issue. The client is a backup media agent writing to an Isilon share. I suspect each backup session uses the same IP address (i.e. the same node), which causes slow backup speed. The share is configured on the client as a UNC path using the SmartConnect name. Theoretically this should rotate IP addresses, but in fact nothing of the kind happens: it uses a single IP every time. I checked the queries to the DNS server and there are none. Does SmartConnect only work when there are many clients? I mean, if one client already has open connections, is there no IP rotation for its subsequent connections?

567 Posts

July 22nd, 2020 08:00

@Rivendell_86,

Have you tried disabling the DNS cache on Isilon?

isi network groupnets modify groupnet0 --dns-cache-enabled=false

36 Posts

July 23rd, 2020 08:00

@PhilLam isi_cbind_d acts as a cache for the cluster operating as a DNS client. It doesn't have anything to do with SmartConnect.

Tim

36 Posts

July 23rd, 2020 08:00

@Rivendell_86, sadly, that's not how the Windows stack works. It's impossible to get multiple connections to different nodes from the same client if you use the same DNS name as part of the UNC path, because the client code is extremely aggressive about caching and reusing connections. To establish multiple connections from the same client, you either need to be using SMB3 Multichannel (which we support), or you need to connect to different names so that Windows won't "piggyback" on top of the existing connection.

In answer to the other part of your question, SmartConnect works fine, even for a single client, but it can only work if the client actually makes additional DNS lookups. If you trace the requests on the client, you will find that it does not, and so there is nothing that the cluster can do. It's entirely a client-side issue.
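To separate the DNS side from the SMB client behavior, the SmartConnect name can be resolved repeatedly from any machine with a POSIX shell and the answers tallied. This is a rough sketch rather than a supported procedure: `tally_lookups`, the `RESOLVER` override, and the zone name `sc.example.com` are all made up for illustration, and it assumes `dig` is installed. If the tally shows several addresses while the Windows client still lands on one node, the rotation works and the client simply isn't asking again.

```shell
# Resolve a name N times (default 10) and tally the addresses returned.
# RESOLVER can be overridden to use a different lookup command.
tally_lookups() {
  name=$1
  n=${2:-10}
  i=0
  while [ "$i" -lt "$n" ]; do
    ${RESOLVER:-dig +short} "$name"
    i=$((i + 1))
  done | sort | uniq -c | sort -rn
}

# Usage (hypothetical SmartConnect zone name):
#   tally_lookups sc.example.com 20
```

Note that an intermediate resolver may cache answers for the record's TTL, so a single address here does not by itself prove SmartConnect is misbehaving.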

19 Posts

July 23rd, 2020 10:00

@isi_tim On the client we have 2 interfaces aggregated and SMB3 Multichannel enabled. On the cluster we have also enabled SMB Multichannel. But still, no DNS queries when new connections are made.

567 Posts

July 23rd, 2020 10:00

@isi_tim,

Understood, Tim. I had to disable it on some customer sites to make SmartConnect function correctly.

Phil

36 Posts

July 23rd, 2020 11:00

That is correct. You won't see any additional SmartConnect activity.

SMB3 Multichannel isn't DNS-based; it's part of the SMB protocol, and the client and server negotiate to enable it. If you use link aggregation on both ends, I believe you're going to be limited to 4 connections under the covers, because from the client's perspective the OneFS side will "look like" a single NIC with RSS enabled, so that is what the client should attempt. If you use separate interfaces on the server side, then it could potentially use 8 connections instead. In either case, if it's enabled, you should see more than one connection at each end.
