2 Bronze

Re: Ask the Expert: SMB Protocol on an Isilon Cluster

Hello Mark,

Collecting packet traces is an art; you have to know enough about the problem to work out how to filter. We used to be able to get away with no capture filters, but as interfaces have gotten faster, that is just not reasonable any more. Two things tend to happen on a 10G interface when you don't use a capture filter:

1.) The trace becomes massive very quickly because of all the traffic

2.) tcpdump cannot flush to disk fast enough, so you end up with dropped frames, which makes the trace unreliable

Even when you filter on just a single client, that client can push enough load that the trace ends up with dropped frames. At that point the question becomes what you are trying to accomplish with the trace.
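As a rough illustration of what such a capture filter looks like, here is a small Python sketch that builds a tcpdump command line restricted to one client's SMB traffic. The function name, the interface name, the client address, and the output path are placeholders made up for the example; -i, -s, and -w are standard tcpdump options, and TCP port 445 is the usual SMB port.

```python
import shlex


def build_capture_command(interface, client_ip, out_file, smb_port=445):
    """Return a tcpdump argument list limited to one client's SMB traffic."""
    # BPF capture filter: only frames to or from the client on the SMB port.
    capture_filter = f"host {client_ip} and tcp port {smb_port}"
    return [
        "tcpdump",
        "-i", interface,  # interface to capture on
        "-s", "0",        # capture full frames rather than truncating them
        "-w", out_file,   # write raw packets to a file
        capture_filter,
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(shlex.join(build_capture_command("em0", "192.0.2.10", "/tmp/trace.pcap")))
```

The tighter the filter, the less tcpdump has to write, which is what keeps the dropped-frame count down on a fast interface.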

When troubleshooting a failure via packet trace, I usually do the following:

-- Connect to \\cluster <do not add a share>

     -- If this works, you can almost always get away with filtering on just the client IP in a cluster-side trace, because the problem is outside of authentication.

     -- If this fails, connect to a node IP without a share: \\x.x.x.x

          -- If this works, you are troubleshooting a Kerberos-type problem. The trace you need is from the client, so you can see the traffic between client -> DC and client -> cluster.

          -- If this fails, both NTLM and Kerberos are failing. The trace you need is cluster side, and you can filter on the client and all of the DCs that are in the same AD site as the cluster.

**It should be noted that all of the above assumes your client has a direct connection to the cluster, i.e. it is not going through a firewall or WAN accelerator. If it does go through one of those devices, you will probably need port mirrors of various interfaces to get a full understanding of where the problem is.
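The decision tree above can be written down as a small Python sketch. The function name and the wording of the returned strings are mine, chosen for illustration; the logic only restates the steps listed, and it assumes the client has a direct connection to the cluster.

```python
def choose_trace(cluster_name_ok, node_ip_ok=None):
    """Map the two connection tests onto the trace worth collecting.

    cluster_name_ok: did \\\\cluster (no share) connect?
    node_ip_ok:      did \\\\x.x.x.x (no share) connect? Only needed when
                     the connection by name failed.
    """
    if cluster_name_ok:
        # Authentication works, so the problem lies elsewhere.
        return "cluster-side trace, filtered on the client IP"
    if node_ip_ok is None:
        raise ValueError("the name test failed, so the node IP test is required")
    if node_ip_ok:
        # Name fails but IP works: a Kerberos-type problem.
        return "client-side trace (client -> DC and client -> cluster)"
    # Both tests fail: NTLM and Kerberos are both failing.
    return "cluster-side trace, filtered on the client and the DCs in the cluster's AD site"
```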

Hopefully, after reading this, it makes a little more sense why we may ask you to take multiple traces when you are working with support. The reality is that we are often troubleshooting while collecting packet traces, and we use them to narrow in on where the problem is.

3 Cadmium

Re: Ask the Expert: SMB Protocol on an Isilon Cluster

The problem I am working on now is an odd one. We have a drive mapping set via Group Policy to a DFS server, with Isilon as the share that DFS is encapsulating (i.e. client -> \\DFSServer\Share -> \\IsilonCluster\Share). When the user first logs in, they get a generic "Access Denied". If they try to UNC to either the DFS path or the Isilon path, it works completely fine. If we map a second drive via CLI or GUI to the same path that Group Policy maps to, it works fine, but the drive mapped via Group Policy still fails. Log out and log back in (not a reboot) and the problem goes away. Disable SMB2 and the problem goes away. Map a drive directly to Isilon via Group Policy and it works.

What I'm struggling to understand is when authentication happens and when Isilon interacts with a Domain Controller for both SMB2 and SMB1.

2 Bronze

Re: Ask the Expert: SMB Protocol on an Isilon Cluster

Hello Mark,

While SMB1 and SMB2 use two different code paths, technically there should not be much difference between them, as DFS works over IOCTL. On the positive side, you have a working and a failing example that we can compare, and it seems you have narrowed the issue down to something that should be reproducible in a lab. The downside is that the first place I would look is the client-side trace, to figure out whether it is failing against the DFS server or the Isilon cluster. Since you are using GPO, that means you would need to port mirror the client port as you reboot the box. If you want to PM the case number, I can take a look at the data that we have to see if I can identify where the failure is.

You asked: "What I'm struggling to understand is when authentication happens and when Isilon interacts with a Domain Controller for both SMB2 and SMB1."

For both SMB1 and SMB2, authentication and communication with the DC always occurs as follows:

Step 1.) Figure out what version of SMB to use (SMB1 or SMB2)

Client -> SMB Negotiate Protocol Request -> Server

Client <- SMB Negotiate Protocol Response <- Server

Step 2.) Perform Authentication

Client -> Session Setup Request -> Server

  -- For NTLM, the Server talks to the DC at this point

  -- For Kerberos, it's the client's job to get the Kerberos ticket, so the Server does not have to talk to the DC at all

Client <- Session Setup Response <- Server

Step 3.) Access the shares and do all other operations (e.g. findfirst, reads, writes, etc.)

Client -> Tree Connect Request -> Server

Client <- Tree Connect Response <- Server

Once Step 2 is complete, authentication is done: the Windows token has been established and is kept in memory for the life of the SMB session. When the client accesses files and permission checking is required in Step 3 and beyond, there is no need to talk to the DC to look up group memberships. Once the client tears down the SMB session (for example with a Session Logoff or a TCP RST), the client has to go back through Step 2 before it can move on to Step 3 and beyond.
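To make the ordering concrete, here is a toy Python model of the three steps. It is only a sketch of the sequence described above and not how OneFS or any real SMB server is implemented; the class, its attribute names, and the dc_lookups counter are invented for illustration.

```python
class SmbSession:
    """Toy model of negotiate -> session setup -> tree connect."""

    def __init__(self):
        self.dialect = None
        self.token = None    # stands in for the Windows token built in Step 2
        self.dc_lookups = 0  # how many times the server had to contact a DC

    def negotiate(self, dialect):
        # Step 1: agree on the SMB version.
        self.dialect = dialect

    def session_setup(self, user, mechanism):
        # Step 2: authenticate and build the token.
        if self.dialect is None:
            raise RuntimeError("negotiate must come first")
        if mechanism == "NTLM":
            self.dc_lookups += 1  # the server talks to the DC
        elif mechanism != "Kerberos":
            raise ValueError(f"unknown mechanism: {mechanism}")
        # With Kerberos the client already holds a ticket, so no DC call here.
        self.token = {"user": user, "mechanism": mechanism}

    def tree_connect(self, share):
        # Step 3: uses the cached token and never goes back to the DC.
        if self.token is None:
            raise RuntimeError("session setup must come first")
        return f"{self.token['user']} connected to {share}"

    def teardown(self):
        # Session Logoff or TCP RST: the token is gone, so Step 2 must be redone.
        self.token = None
```

In this model, any number of tree_connect calls after one NTLM session_setup leaves dc_lookups at 1, and a tree_connect after teardown fails until session_setup runs again.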

When you add DFS in the mix, the client has to:

-- Perform Step 1-3 against the DFS Server

-- Get redirected to the cluster via a dfs referral

-- Go through Steps 1-3 against the cluster

-- Finally connect to the path on the cluster

Since you have GPO in play as well, that initial connection against the cluster may be made under the client's machine context rather than the user's context. That means it may be coming in as an anonymous user, which could be what causes the Access Denied.

The best course of action would be:

-- Start a port mirror of the client

-- Reboot the client and generate the error

-- Look at the trace for the following:

     -- Apply the Wireshark filter smb2.nt_status != 0 and figure out which frame the Access Denied comes in on

     -- Determine whether it is the DFS server or the Isilon cluster throwing the error

     -- Follow the TCP stream (right-click option on the problem frame) and go to the beginning to locate the Session Setup, to determine which user account was actually being used
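If you would rather do the first pass from the command line, the same display filter can be handed to tshark. The Python sketch below is one way to do that under a few assumptions: tshark is installed, the helper names are mine, and tshark prints the status field in a form int() can parse (for example 0xc0000022). The -r, -Y, -T fields, and -e options are standard tshark options, and 0xC0000022 is the NT status code for STATUS_ACCESS_DENIED.

```python
import subprocess

ACCESS_DENIED = 0xC0000022  # NT status code for STATUS_ACCESS_DENIED
FIELDS = ["frame.number", "tcp.stream", "ip.src", "smb2.nt_status"]


def tshark_command(pcap_path):
    """Build a tshark call listing every SMB2 frame with a non-zero status."""
    cmd = ["tshark", "-r", pcap_path, "-Y", "smb2.nt_status != 0", "-T", "fields"]
    for field in FIELDS:
        cmd += ["-e", field]
    return cmd


def parse_rows(output):
    """Turn tshark's tab-separated output into a list of dictionaries."""
    rows = []
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) != len(FIELDS):
            continue  # skip lines that do not have one value per field
        frame, stream, src, status = parts
        try:
            rows.append({"frame": int(frame), "stream": int(stream),
                         "src": src, "status": int(status, 0)})
        except ValueError:
            continue  # skip rows whose values are not single numbers
    return rows


def access_denied(rows):
    """Keep only the frames that carry STATUS_ACCESS_DENIED."""
    return [row for row in rows if row["status"] == ACCESS_DENIED]


def report(pcap_path):
    """Run tshark on a capture file and return the Access Denied frames."""
    result = subprocess.run(tshark_command(pcap_path),
                            capture_output=True, text=True, check=True)
    return access_denied(parse_rows(result.stdout))
```

Each returned row gives the frame number, the TCP stream index to follow back to the Session Setup, and the source address, which tells you whether the DFS server or the cluster sent the error.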

3 Cadmium

Re: Ask the Expert: SMB Protocol on an Isilon Cluster

One issue I've run into with isi_netlogger (this may be caused by tcpdump; I haven't investigated) is that it dies after very long captures (hours).

We have an issue that happens very sporadically, and by the time we identify it the problem is gone and not reproducible. We were trying to run a tcpdump (via isi_netlogger) for a very long duration (overnight), and when we came in the next working day the isi_netlogger command had errored and no archive had been created.

What are your thoughts on the best way to packet capture an event when you don't know when it will happen?

2 Bronze

Re: Ask the Expert: SMB Protocol on an Isilon Cluster

Yeah, that would make sense as to why you are having problems with isi_netlogger.  The beauty of OneFS running on FreeBSD is you can script just about anything.  If your issue causes a message to be logged, you can write a script that:

1.) Starts a trace

2.) Checks for a failure to be logged

3.) If after 5 minutes no failure has been seen, stop the trace and start the process over again

I can send you an example script if you would like.
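In the meantime, here is a minimal Python sketch of that loop. It is not the script I would send and makes no assumptions about OneFS tooling: start_trace, stop_trace, and failure_logged are callables you would supply yourself (for example, starting and terminating a tcpdump process, and searching a log file for your error string), and every name in it is made up for the example.

```python
import time


def capture_until_failure(start_trace, stop_trace, failure_logged,
                          window_seconds=300, poll_seconds=5,
                          max_windows=None,
                          sleep=time.sleep, clock=time.monotonic):
    """Run back-to-back capture windows until one of them sees a failure.

    Returns whatever start_trace returned for the window in which the
    failure was seen, or None if max_windows ran out first.
    """
    window = 0
    while max_windows is None or window < max_windows:
        window += 1
        trace = start_trace(window)           # 1.) start a trace
        deadline = clock() + window_seconds
        failed = False
        while clock() < deadline:
            if failure_logged():              # 2.) check for a logged failure
                failed = True
                break
            sleep(poll_seconds)
        stop_trace(trace)                     # stop this window's trace
        if failed:
            return trace                      # keep the trace with the failure
        # 3.) nothing seen in this window: loop around and start over
    return None
```

If start_trace writes to the same file name every time, each clean window simply overwrites the previous one, so the capture never grows past one window's worth of traffic.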

3 Cadmium

Re: Ask the Expert: SMB Protocol on an Isilon Cluster

No errors are logged on Isilon. The problem is specific to an application, which we monitor and alert on, but by the time the error is generated inside the application it's too late. The app writes a temporary file to an SMB share on Isilon. Then it tries to access the temporary file for some processing and gets a generic "file not found" error inside the application. I'm trying to capture what it is trying to access when the "file not found" is generated, but no luck so far.

I guess I'll set up a collection every 20 minutes, then kill it, and repeat... and hope!

2 Bronze

Re: Ask the Expert: SMB Protocol on an Isilon Cluster

Mark, you could also do the same with Wireshark. It can roll as many capture files as you require, and you can roll them on size or time.
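For example, dumpcap (the capture tool that ships with Wireshark) takes -b options for exactly this. The Python sketch below only builds the command line; the function name and all the values in the usage are placeholders, while -i, -f, -w, and -b filesize / files / duration are standard dumpcap options (filesize is given in kB, duration in seconds).

```python
def ring_buffer_command(interface, out_file, capture_filter,
                        max_file_kb, file_count, max_seconds=None):
    """Build a dumpcap call that rolls capture files in a ring buffer."""
    cmd = [
        "dumpcap",
        "-i", interface,
        "-f", capture_filter,             # capture filter, same syntax as tcpdump
        "-w", out_file,
        "-b", f"filesize:{max_file_kb}",  # roll to the next file at this size (kB)
        "-b", f"files:{file_count}",      # keep at most this many files
    ]
    if max_seconds is not None:
        cmd += ["-b", f"duration:{max_seconds}"]  # also roll on elapsed time
    return cmd


if __name__ == "__main__":
    # Hypothetical values: ten 100 MB files, rolled at least every 20 minutes.
    print(ring_buffer_command("eth0", "smb.pcapng", "host 192.0.2.10 and tcp port 445",
                              max_file_kb=100_000, file_count=10, max_seconds=1200))
```

Once the file limit is reached, the oldest file is overwritten, so the capture can run overnight without filling the disk.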

1 Copper

Re: Ask the Expert: SMB Protocol on an Isilon Cluster

We have been having an issue with SMB connections going stale. Currently we have under 300 active connections, but over 5100 total connections. Is there a way to drop the inactive connections without affecting the active connections? What would be causing this in the first place? We are on 7.0.1.5

7 Thorium

Re: Ask the Expert: SMB Protocol on an Isilon Cluster

Is there any way to improve the OS X user experience when using Isilon via CIFS? Browsing shares is much slower than on Windows. I did read the paper "docu45329_Using-Mac-OS-X-Clients-with-Isilon-OneFS-6.5", but we don't have SSD nodes, and changing the view in Finder did not do anything.

Thanks

2 Iron

Re: Ask the Expert: SMB Protocol on an Isilon Cluster

dynamox, how much slower are you seeing browsing for Macs, and are you dealing with a large, wide directory structure? Typically that's not an ideal workflow for SMB. NFS handles it better due to the readdirplus calls it can make, but NFS comes with its own set of challenges on the Mac (like AppleDouble's creation of dotbar files).

While I have heard of some folks modifying /etc/nsmb.conf on Macs to disable change notification and alternate data streams to try to improve performance, past experience hasn't shown much (if any) improvement from making those changes. And making those changes on the client side requires that every client gets the change.

As the document describes, retrieving metadata faster from the Isilon is the best way to get the Finder to display objects more quickly.  Reducing network latency may not be possible, which is why the document does call out the 5-7x improvement in speed when using SSDs.

At this point, it's far too early for me to say whether something like the SMB2 support in OS X 10.9 is going to make much of a difference, although it is something I'm starting to test for an update to the Mac guide.
