Here's my environment. I hope someone can kindly help me resolve the following issue.
Version: OneFS 188.8.131.52
Every user's first access is delayed by around 5 - 10 seconds.
I have investigated this issue, and it looks like the total size of our AD objects is larger than the Isilon's maximum cache size.
- I have already set the max cache size to 47.68M. It can be viewed with "isi zone zones view System".
Unfortunately, it seems our AD objects are too large to be cached on the Isilon.
We have around 9,000 users and 96,000 security groups.
- We got the following error in "/var/log/lsassd.log". Is 47.68M the limit of this box?
2015-07-16T23:32:51+09:00 <30.4> HOSTNAME1(id2) lsass: [lsass] The current cache size (50147202) is larger than the cap (50000000) - evicting old objects
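For anyone comparing the two numbers: the cap in the log message and the 47.68M setting appear to be the same limit expressed in different units. A minimal sketch of the arithmetic, assuming the "M" shown by the isi command means MiB (1024 * 1024 bytes); the variable names are only illustrative:

```python
# Values copied from the lsassd.log line above.
CAP_BYTES = 50_000_000
CURRENT_BYTES = 50_147_202

# Assumption: "M" in the isi output means MiB (1024 * 1024 bytes).
MIB = 1024 * 1024

cap_mib = round(CAP_BYTES / MIB, 2)          # configured cap in MiB
overflow_bytes = CURRENT_BYTES - CAP_BYTES   # how far the cache is over the cap

print(f"cap: {cap_mib} MiB, over by {overflow_bytes} bytes")
```

Under that assumption the cap works out to 47.68 MiB, so the log entry is reporting the configured limit rather than a separate hard limit.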
I would appreciate it if someone could provide me with a hint or an answer.
- Can this limit be extended?
- Or is there any workaround that we can try to avoid this slowness? (I hope 9,000 users and 96,000 groups don't impact the Isilon.)
- Is there any action that we can take on the Active Directory side?
Thanks in advance.
Do you know if you are using NTLM or Kerberos?
From what I can gather from the above information, it sounds like you are using NTLM for authentication, based on the fact that the latency shows up the first time a user authenticates.
A few things to try to confirm:
If you are using a static IP to access shares, e.g. \\IP_of_node\share
This is using NTLM for authentication
If a SmartConnect zone is being used, e.g. \\smartconnect_zone_name\share, we can check whether it is using a CNAME (canonical name) or SPN (service principal name) by running the following command on a Windows client:
Replace <smartconnect> with the SmartConnect zone name:
klist |findstr "<smartconnect>"
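The rule of thumb above (an IP address in the path means NTLM, a name may allow Kerberos) can be sketched roughly as follows. This is only an illustration: `likely_auth_protocol` is a hypothetical helper, not part of any Isilon or Windows tooling, and a name-based path only gets Kerberos if a matching SPN is actually registered.

```python
import ipaddress


def likely_auth_protocol(unc_path: str) -> str:
    """Guess the SMB authentication protocol from the host part of a UNC path."""
    # Take the host portion of \\host\share (forward slashes are accepted too).
    host = unc_path.lstrip("\\/").replace("/", "\\").split("\\", 1)[0]
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: Kerberos is possible, assuming an SPN exists for the name.
        return "Kerberos"
    # IP literal: no SPN can match, so the client falls back to NTLM.
    return "NTLM"


print(likely_auth_protocol(r"\\10.0.0.5\share"))
print(likely_auth_protocol(r"\\smartconnect_zone_name\share"))
```

The klist output remains the real check; the sketch only captures the first-pass guess made from the path.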
Hi Isilon Lover,
There is a way to increase cache size which, based on a few service requests I researched, is the preferred fix. However, it depends on how many access zones you have. Could you please let me know how many you have and I can put together some steps for you to take?
I already have the NTLM checkbox checked. I will check the response speed when we type ipaddress/share.
Also, we usually use SmartConnect or a CNAME, and I have already checked that all of them are registered as SPNs.
I confirmed that I have the Kerberos ticket by using the command you kindly gave me.
However, the problem is the latency when we access via SmartConnect or CNAME.
Thanks for your advice.
This is going to need a little more in-depth analysis. We will need to capture the latency in a pcap (packet capture) to find the cause.
I would like to ask you to open a service request so this can be analyzed further.
If you could post the SR (Service Request) number in the forum, I can make sure we can get a TSE (Technical Support Engineer) available to assist with this.
To create a service request, you have a couple of options:
1. Log in to your online account on support.emc.com and go to this page: https://support.emc.com/servicecenter/createSR
2. Call in to EMC Isilon Support at 1-800-782-4362 (For a complete local country dial list, please see this document: http://www.emc.com/collateral/contact-us/h4165-csc-phonelist-ho.pdf)
Unfortunately, we don't have a direct EMC warranty; a local vendor provides support on behalf of EMC. But I know they have already raised a request with you regarding this. Should I get the SR number the local vendor raised and let you know?
Sorry for the delayed response. Yes, if you can get me the EMC SR (Service Request) number from your vendor's support, I can look at where the SR stands on our end. Then I can offer some help in the SR to move this toward resolution.
I checked out SR 72138390 which is currently closed.
I was able to go through the .pcap that was supplied in the SR and found a few informative things.
You are using an IP to connect to the cluster, which defaults to NTLM authentication. (I would post packets from the capture, but these reveal IPs of local devices.)
What the .pcap is showing is requests to //<nodeIP>/<share>, so a static IP is being used.
Since NTLM is being used, session setup needs to build a token for the user the first time the user connects; once the token is cached, lookups take less time.
I can see the NTLM session setup in frame 136, followed by multiple LDAP requests to build the user token, taking ~10 seconds and ending at frame 869.
136 2015-06-17 21:46:00.855924 IP's Removed 642 34163 275 SMB2 Session Setup Request, NTLMSSP_AUTH, User: <username removed>
The packets in between are individual calls to look up "Member of" for the user.
869 2015-06-17 21:46:10.945434 IP's Removed 139 2119 549 SMB2 Session Setup Response
This is the standard process for NTLM on initial session setup: the token is not cached, therefore it needs to be built.
Provided Kerberos is configured correctly, connecting via the SmartConnect zone name will default to Kerberos authentication, which is much faster for session setup because all the required information is requested in one call and built into a token.
The latency seems to come from the multiple LDAP calls to the DCs and back, performing group ("Member of") lookups.
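To put a number on the delay, the two frame timestamps quoted above can be subtracted directly. A minimal sketch using only the values already posted (frames 136 and 869); the per-frame average is a rough lower bound, since not every frame in between is necessarily an LDAP lookup:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from frames 136 (request) and 869 (response) above.
request = datetime.strptime("2015-06-17 21:46:00.855924", FMT)
response = datetime.strptime("2015-06-17 21:46:10.945434", FMT)

setup_seconds = (response - request).total_seconds()

# Frames strictly between the request and the response.
frames_between = 869 - 136 - 1

print(f"session setup took {setup_seconds:.3f} s")
print(f"{frames_between} frames in between, "
      f"about {setup_seconds / frames_between * 1000:.1f} ms per frame")
```

This lines up with the 5 - 10 second first-access delay reported at the start of the thread.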