July 24th, 2015 01:00

Isilon X400: user authentication takes around 5-10 seconds at the first login every day

Hello guys,

Here's my environment. I hope someone can kindly help me resolve the following issue.

  Version: OneFS 7.1.1.2

  Model: X400*5

  

   Every user's first access is always delayed by around 5-10 seconds.

     I have investigated this issue, and it looks like the total size of our AD objects is larger than the Isilon's maximum cache size.

     - I have already set the max cache size to 47.68M. The value can be viewed with "isi zone zones view System".
    Unfortunately, our AD objects seem to be too large to be cached on the Isilon.

    We have around 9,000 users and 96,000 security groups.

   - We got the following error in "/var/log/lsassd.log". Is 47.68M the limit of this box?

   
2015-07-16T23:32:51+09:00 <30.4> HOSTNAME1(id2) lsass[6764]: [lsass] The current cache size (50147202) is larger than the cap (50000000) - evicting old objects
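
For anyone reading the numbers in that message, here is a minimal Python sketch (not an Isilon tool, just arithmetic on the log line above) that pulls out the two byte counts and converts the cap to MiB. The regular expression and the function name are my own assumptions about the message layout.

```python
import re

# The log message quoted above, joined onto one line.
LOG_LINE = (
    "2015-07-16T23:32:51+09:00 <30.4> HOSTNAME1(id2) lsass[6764]: [lsass] The "
    "current cache size (50147202) is larger than the cap (50000000) - evicting old objects"
)

def parse_cache_sizes(line):
    """Return (current_bytes, cap_bytes) from an lsass cache-eviction message."""
    match = re.search(r"cache size \((\d+)\) is larger than the cap\s+\((\d+)\)", line)
    if match is None:
        raise ValueError("not a cache-eviction message")
    return int(match.group(1)), int(match.group(2))

current, cap = parse_cache_sizes(LOG_LINE)
cap_mib = cap / (1024 * 1024)   # bytes -> MiB
overage = current - cap         # how far the cache is over the cap

print(f"cap = {cap_mib:.2f} MiB, over by {overage} bytes")
# prints: cap = 47.68 MiB, over by 147202 bytes
```

So the 50000000-byte cap in the log appears to be the same value as the 47.68M setting, just expressed in bytes.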

  

I would appreciate it if someone could provide me with a hint or the answer.

  - Can this limit be extended?

  - Or is there any workaround that we can try to avoid this slowness? (I hope 9,000 users and 96,000 groups don't impact the Isilon.)

  - Is there any action that we can take on the Active Directory side?

   Thanks in advance.

104 Posts

July 24th, 2015 10:00

Isilon Lover,

Do you know if you are using NTLM or Kerberos?

From what I can gather from the above information, it sounds like you are using NTLM for authentication, based on the fact that it's the first authentication where we are seeing latency.

A few things to try to confirm:

If you are using a static IP to access shares, e.g. \\IP_of_node\share, this is using NTLM for authentication.

If a SmartConnect zone is being used, e.g. \\smartconnect_zone_name\share, we can check whether it is using a CNAME (canonical name) or an SPN (service principal name) by running the following command on a Windows client:

replace with the SmartConnect zone name:

klist |findstr " "
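
If it helps, the same check can be scripted. The Python sketch below is only an illustration: it assumes the usual klist layout where each ticket has a "Server: service/host @ REALM" line, and the sample text, the host name isilon.example.com, and the function name are all hypothetical placeholders, not values from this cluster.

```python
def has_cifs_ticket(klist_output, service_host):
    """Return True if the klist text lists a ticket for cifs/<service_host>."""
    target = f"cifs/{service_host}".lower()
    for line in klist_output.splitlines():
        stripped = line.strip()
        if not stripped.lower().startswith("server:"):
            continue
        # "Server: cifs/host @ REALM" -> take the principal before the "@"
        parts = stripped.split(":", 1)[1].split()
        if parts and parts[0].lower() == target:
            return True
    return False

# Hypothetical klist output, for illustration only.
SAMPLE = """
#0>     Client: someuser @ EXAMPLE.COM
        Server: krbtgt/EXAMPLE.COM @ EXAMPLE.COM
#1>     Client: someuser @ EXAMPLE.COM
        Server: cifs/isilon.example.com @ EXAMPLE.COM
"""

print(has_cifs_ticket(SAMPLE, "isilon.example.com"))  # True
print(has_cifs_ticket(SAMPLE, "10.0.0.5"))            # False
```

A cifs ticket for the zone name suggests Kerberos is in play for that name; no ticket (as with a bare IP) points to NTLM.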

7 Posts

July 24th, 2015 11:00

Hi Isilon Lover,

There is a way to increase cache size which, based on a few service requests I researched, is the preferred fix.  However, it depends on how many access zones you have.  Could you please let me know how many you have and I can put together some steps for you to take?

July 24th, 2015 18:00

I only have one zone, System. Please let me know if there is anything I can do.

July 24th, 2015 18:00

I already have the NTLM checkbox checked. I will check the response time when we type ipaddress/share.

Also, we usually use SmartConnect or a CNAME, and I have already checked that all of them are registered as SPNs.

I confirmed that I have the Kerberos ticket by using the command you kindly gave me.

However, the problem is the latency when we access via SmartConnect or the CNAME.

Thanks for your advice.

104 Posts

July 27th, 2015 06:00

Isilon Lover,

This is going to need a little more in-depth analysis. We will need to capture the latency in a pcap (packet capture) to find the cause.

I would like to ask you to open a service request to have this analyzed further.

If you could post the SR (Service Request) number in the forum, I can make sure we can get a TSE (Technical Support Engineer) available to assist with this.

To create a service request, you have a couple options:

1. Log in to your online account on support.emc.com and go to this page: https://support.emc.com/servicecenter/createSR

2. Call in to EMC Isilon Support at 1-800-782-4362 (For a complete local country dial list, please see this document: http://www.emc.com/collateral/contact-us/h4165-csc-phonelist-ho.pdf)

July 27th, 2015 17:00

Hi Shane,

Unfortunately, we don't have a direct EMC warranty; a local vendor provides support on behalf of EMC. But I know they have already raised a request with you regarding this. Shall I get the number that the local vendor raised and let you know?

104 Posts

July 30th, 2015 13:00

Isilon Lover,

Sorry for the delayed response. Yes, if you can get me the EMC SR (Service Request) number from your vendor's support, I can look at where the SR stands on our end. Then I can offer some help in the SR to move this toward resolution.

July 30th, 2015 22:00

Hi Shane,

The vendor says the numbers are SR#72138390 and SR#72974468. Please check them.

104 Posts

July 31st, 2015 07:00

Isilon Lover,

I checked out SR 72138390 which is currently closed.

I was able to go through the .pcap that was supplied in the SR and found a few informative things.
You are using an IP address to connect to the cluster, which defaults to NTLM authentication. (I would post packets from the capture, but these reveal IPs of local devices.)
What the .pcap is showing is requests to // / so a static IP is being used.

Since NTLM is being used, session setup needs to build a token for the user the first time the user connects; once the token is cached, lookups take less time.

I can see the NTLM session setup in frame 136, with multiple LDAP requests to build the user token, taking ~10 seconds and ending at frame 869.


Frame                                           Time                                                                                     
136                         2015-06-17 21:46:00.855924         IP's Removed     642        34163    275         SMB2    Session Setup Request, NTLMSSP_AUTH, User:


The packets in between are individual calls to look up "Member of" for the user.


869                         2015-06-17 21:46:10.945434         IP's Removed    139         2119       549         SMB2    Session Setup Response

This is the standard process for NTLM on initial session setup: the token is not cached, therefore it needs to be built.
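
To put an exact number on the "~10 seconds", here is a small Python sketch using only the two timestamps and frame numbers quoted above. The variable names are mine, and the frame count is just the number of frames between the two in the capture, which may include unrelated traffic.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps of frame 136 (Session Setup Request) and frame 869 (Session Setup Response).
request_time = datetime.strptime("2015-06-17 21:46:00.855924", FMT)
response_time = datetime.strptime("2015-06-17 21:46:10.945434", FMT)

elapsed = (response_time - request_time).total_seconds()
frames_between = 869 - 136 - 1  # frames captured between request and response

print(f"session setup took {elapsed:.3f} s with {frames_between} frames in between")
# prints: session setup took 10.090 s with 732 frames in between
```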


Once Kerberos is configured and the SmartConnect zone name is used, authentication will default to Kerberos, which is much faster for session setup because the request for all the info is done in one call and built into a token.
\\ \

The latency seems to come from the multiple LDAP calls to the DCs and back, performing lookups of group membership (Member of).

August 2nd, 2015 17:00

Yes, but unfortunately we are facing the same issue even though we are using the SmartConnect zone name.

Every first access takes around 10-16 seconds. If we access it a second time within several minutes, it opens immediately. But if we access it again after around 15 minutes, it takes 10-16 seconds again.

Presumably, Windows or the Isilon clears the session periodically, so when we access the Isilon again, it sends queries to the DCs to get the ACL info. I guess this takes many seconds because we have so many objects in AD. So I am wondering whether the Isilon's cache size and the ACL retention period can be expanded, or whether the disconnection timeout between Windows and the Isilon can be extended.

Please let me know your thoughts.

57 Posts

August 2nd, 2015 18:00

Your best bet is to see where the latency is. This diagram shows that domain controller authentication happens first:

http://blogs.technet.com/blogfiles/askds/WindowsLiveWriter/KerberosfortheBusyAdmin_8472/clip_image002_2.gif

If you Wireshark the mount request, you should be able to see where the 16 seconds is spent, using the above diagram as a reference.
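
One way to find where the time goes once you have the capture is to look for the widest gap between consecutive frames. The Python sketch below assumes you have exported (frame number, timestamp) pairs from Wireshark yourself; the sample frames are made up for illustration and are not from this cluster.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

def largest_gap(frames):
    """Given (frame_number, timestamp) pairs in capture order, return
    (gap_seconds, frame_before, frame_after) for the widest gap, or None
    if there are fewer than two frames."""
    parsed = [(number, datetime.strptime(stamp, FMT)) for number, stamp in frames]
    best = None
    for (n1, t1), (n2, t2) in zip(parsed, parsed[1:]):
        gap = (t2 - t1).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, n1, n2)
    return best

# Hypothetical export, for illustration only.
sample_frames = [
    (1, "2015-08-02 10:00:00.000000"),
    (2, "2015-08-02 10:00:00.050000"),
    (3, "2015-08-02 10:00:12.050000"),
    (4, "2015-08-02 10:00:12.100000"),
]

print(largest_gap(sample_frames))  # (12.0, 2, 3)
```

Whatever is waiting between those two frames (DC, DNS, or the cluster) is the place to look first. If the delay is spread over many small round trips instead of one big gap, this will not show it, and summing the time per conversation would be a better approach.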

Andrew


104 Posts

August 3rd, 2015 10:00

Isilon Lover,

I'm only able to make assumptions and predictions based on the data I currently have in front of me. In the pcap I do not see any authentication attempts via Kerberos, despite SmartConnect being used, so some configuration changes may be needed.

However, the packet capture and the SR (72138390) are going on 2 months old.

I would really like you to work with your vendor to open another SR to have this troubleshot in more depth, as a fresh set of pcaps would be helpful. If you are able to post the new SR#, I'll make sure to get the proper eyes on it, as well as keep an eye on it myself.

You can reference this thread to your vendor and in the new SR.
