Issue with kerberization on Isilon integrated with BigInsights environment
We have been trying to get a BigInsights environment working with an Isilon file system. I kerberized the cluster using the Ambari security wizard and by manually creating the principals for the Isilon host, but we were unable to access the file system over the HDFS protocol after kerberization. We followed the steps documented for automated Ambari kerberization with EMC Isilon and made all the required changes to the principals for the various services. Here are the principal configurations for the various users and services.
User principal names
- Smokeuser Principal Name:
${cluster-env/smokeuser}-${cluster_name}@${realm}
=>${cluster-env/smokeuser}@${realm}
- spark.history.kerberos.principal:
${spark-env/spark_user}-${cluster_name}@${realm}
=>${spark-env/spark_user}@${realm}
- HBase user principal:
${hbase-env/hbase_user}-${cluster_name}@${realm}
=>${hbase-env/hbase_user}@${realm}
- HDFS user principal:
${hadoop-env/hdfs_user}-${cluster_name}@${realm}
=>${hadoop-env/hdfs_user}@${realm}
Service principal names
- HDFS -> dfs.namenode.kerberos.principal:
nn/_HOST@${realm}
=>hdfs/_HOST@${realm}
- YARN -> yarn.resourcemanager.principal:
rm/_HOST@${realm}
=>yarn/_HOST@${realm}
- YARN -> yarn.nodemanager.principal:
nm/_HOST@${realm}
=>yarn/_HOST@${realm}
- MapReduce2 -> mapreduce.jobhistory.principal:
jhs/_HOST@${realm}
=>mapred/_HOST@${realm}
- Falcon -> *.dfs.namenode.kerberos.principal:
nn/_HOST@${realm}
=>hdfs/_HOST@${realm}
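To make the substitutions above concrete, here is a small sketch of how the Ambari principal templates resolve (the cluster name `bigcluster` is hypothetical; the realm `IBM.COM` is taken from the error output below):

```shell
# Hypothetical values; substitute your own cluster name and realm.
cluster_name="bigcluster"
realm="IBM.COM"
hdfs_user="hdfs"

# Ambari's default template for the HDFS user principal...
echo "${hdfs_user}-${cluster_name}@${realm}"
# ...and the Isilon-friendly form with the cluster-name suffix dropped:
echo "${hdfs_user}@${realm}"
```

The change in each entry is the same: the `-${cluster_name}` suffix is removed (and service components such as `nn`, `rm`, `jhs` are renamed to the matching user, e.g. `hdfs`, `yarn`, `mapred`) so the principals line up with the local accounts Isilon expects.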
I added the realm to the Isilon node and configured our zone to use the krb auth provider. I then created the service principal names for hdfs and SPNEGO (HTTP) for the Isilon SmartConnect hostname.
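For reference, a sketch of the two SPNs involved (the SmartConnect FQDN `isilon-sc.example.com` is a placeholder, and the `kadmin` lines are an assumption for an MIT KDC, shown as comments only):

```shell
# Placeholder SmartConnect FQDN and realm; replace with your own.
sc_fqdn="isilon-sc.example.com"
realm="IBM.COM"

# The two service principals Isilon needs for kerberized HDFS access:
hdfs_spn="hdfs/${sc_fqdn}@${realm}"
http_spn="HTTP/${sc_fqdn}@${realm}"   # SPNEGO, used by WebHDFS
echo "$hdfs_spn"
echo "$http_spn"

# With an MIT KDC these could be registered roughly like this (not run here):
#   kadmin -q "addprinc -randkey ${hdfs_spn}"
#   kadmin -q "addprinc -randkey ${http_spn}"
```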
After enabling Kerberos via the Ambari security wizard, I set the authentication mode for HDFS to Kerberos only on the Isilon dashboard.
When I tried to access the file system after this, I found that my HDFS user principal was not able to connect to the namenode server.
Here is the error that I am getting:
[hdfs@bigaperf638 root]$ hdfs dfs -ls /
16/08/03 22:36:07 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
16/08/03 22:36:09 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
16/08/03 22:36:11 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
16/08/03 22:36:14 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
16/08/03 22:36:17 WARN ipc.Client: Couldn't setup connection for hdfs@IBM.COM to subnet1-pool1.svl.ibm.com/9.30.149.127:8020
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Defective token detected (Mechanism level: AP_REP token id does not match!)]
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
at org.apache.hadoop.security.SaslRpcClient.saslEvaluateToken(SaslRpcClient.java:483)
at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:427)
at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:560)
at org.apache.hadoop.ipc.Client$Connection.access$1900(Client.java:375)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:729)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:725)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:725)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
at org.apache.hadoop.fs.Globber.glob(Globber.java:265)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1655)
at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:326)
at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:235)
at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:218)
at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
Caused by: GSSException: Defective token detected (Mechanism level: AP_REP token id does not match!)
at sun.security.jgss.krb5.AcceptSecContextToken.&lt;init&gt;
at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:755)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:248)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
... 41 more
ls: Failed on local exception: java.io.IOException: Couldn't setup connection for "HDFS USER PRINCIPAL" to 9.30.149.127:8020; Host Details : local host is: "9.30.149.8"; destination host is: "SMARTCONNECT HOSTNAME":8020;
Please note that I have valid TGTs cached for the hdfs user.
Kindly advise on any configuration changes that I may have missed.
russ_stevenson
August 26th, 2016 14:00
Does the access zone this HDFS access is occurring in have two Kerberos providers? Basically, are an AD provider and a krb5 provider both in that zone?
I just saw this same error with that configuration; Isilon may be trying to authenticate against two Kerberos providers in the same zone.
gowrishankarts
August 28th, 2016 22:00
No. I have just one krb5 provider and a local provider configured for that auth zone. The output of the auth status command (isi auth status --zone ) shows the krb5 provider as online and the local provider as active.
russ_stevenson
August 29th, 2016 10:00
Troubleshooting krb issues can be tricky; I would focus on DNS and SPNs initially.
- Are all the relevant DNS and reverse DNS entries present for the SmartConnect/NameNode name the Hadoop cluster is connecting to?
- Do you have SPNs for hdfs/smartconnectFQDN@REALM.COM and HTTP/smartconnectFQDN@REALM.COM?
- On the Isilon, can you kinit as a user principal in your KDC successfully?
Did you follow the blog posts on kerberizing Isilon and Hadoop found here? The other posts there may be helpful as well.
Hadoop - Isilon Info Hub
gowrishankarts
August 30th, 2016 22:00
I was able to resolve this issue by cleaning up all stale Kerberos configuration on Isilon and my KDC server, then enabling Kerberos using the steps in the link below, in the same order.
Ambari Automated Kerberos Configuration with EMC Isilon
Thanks for your help!!
gowrishankarts
September 13th, 2016 10:00
Hi Russ,
I was able to enable Kerberos successfully, as I mentioned in my previous message. I am still facing an issue with the Oozie service check.
The Oozie service check is failing with a JA009 error code.
Here is what I observed:
When we start the service check, it runs a smoke test script which uses a workflow.xml on HDFS and tries to launch a MapReduce application. When the MapReduce application starts, the containers that the resource manager launches exit with error code -1000. It says "Client cannot authenticate via:[TOKEN, KERBEROS]" and fails the MapReduce application after 2 attempts.
But the oozie job info command does not realise that the MapReduce job has failed. Instead it waits for a success report and times out after 300 secs.
We do not find any records in the job history UI, which suggests that the MapReduce job never actually starts. The client tries to launch it but fails to authenticate via Kerberos.
We tried other MapReduce applications such as word count and QuasiMonteCarlo jobs as the oozie, yarn and ambari-qa users and had no issues. Even YARN jobs like distributed shell pass.
When the diagnostics say "client cannot authenticate", I am not sure which client it is referring to.
Here is the error observed in yarn resource manager logs:
Diagnostics: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "9.30.149.9"; destination host is: "SMARTCONNECT HOSTNAME":8020;
java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "9.30.149.9"; destination host is: "SMARTCONNECT HOSTNAME":8020;
Please note that I have valid TGTs cached for the yarn, mapred, hdfs and oozie users, and I have created an oozie proxy user on Isilon for my zone and added the ambari-qa user to it.
Please let me know if I am missing something.
Thanks in advance for your help.
russ_stevenson
September 15th, 2016 08:00
Increase the HDFS logging on the Isilon and then find the corresponding OneFS error message in the log on Isilon.
This will show us which principal is attempting to authenticate against Isilon and provide additional information to help diagnose where the issue lies. Since Isilon is a clustered OS, you need to check the log files on all nodes to find the relevant entries; each node has its own HDFS log at /var/log/hdfs.log.
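The OneFS commands involved (quoted verbatim later in this thread) would look roughly like this; they only run on an Isilon node, so they are shown here as a transcript:

```shell
# Run on an Isilon node (OneFS CLI; transcript only, not executable off-cluster):
#
#   isi hdfs log-level modify --set=debug     # raise HDFS service logging
#   tail -f /var/log/hdfs.log                 # watch this node's log; repeat on each node
#   isi hdfs log-level modify --set=default   # revert when done
```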
russ
gowrishankarts
September 15th, 2016 09:00
Thanks for your help!!
I did increase the log level to debug on Isilon, but when we ran the Oozie service check from the Ambari UI we did not get any records in the Isilon nodes' /var/log/hdfs.log files. I tailed these files and saw no entries during the execution of the service check.
When I tried enabling the Application Timeline Service on YARN, I got a different error. The resource manager logs show the following:
Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, Service: 9.30.149.9:8188, Ident: (owner=ambari-qa, renewer=yarn, realUser=oozie, issueDate=1473911820254, maxDate=1474516620254, sequenceNumber=278, masterKeyId=107)
I found that this is a known YARN issue that was fixed in 2.7.0.
Here is the related JIRA issue:
[YARN-2425] When Application submitted by via Yarn RM WS, log aggregation does not happens - ASF JIRA
I have created an oozie proxy user and added the ambari-qa, hdfs and yarn users to it.
Please let me know if there is any way I can get to the bottom of this issue.
Gowri Shankar
russ_stevenson
September 1st, 2017 11:00
I recently dealt with this issue again; it looks like it is related to mismatched keytab versions between Isilon and the KDC.
I would start at page 27 of the following doc:
http://www.emc.com/collateral/TechnicalDocument/docu83576.pdf
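A quick sketch of checking for that kvno mismatch from a Kerberos client host (the keytab path and principal are placeholders; the `klist`/`kvno` commands are shown as comments since they need a live KDC):

```shell
# Placeholder keytab path and principal; substitute your own.
keytab="/etc/hdfs.keytab"
princ="hdfs/isilon-sc.example.com@IBM.COM"
echo "checking kvno for ${princ} in ${keytab}"

# 1. Key version number stored in the keytab (one kvno column per entry):
#      klist -kt "$keytab"
# 2. Key version number the KDC currently issues for the service:
#      kvno "$princ"
# If the two numbers differ, regenerate the keytab (e.g. with kadmin ktadd)
# so its key version matches the KDC's.
```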
dineshaj
September 6th, 2017 07:00
Hi Russ,
I am also facing the same issue after kerberization of an HDP 2.5 cluster on Isilon 8.0.1.0; my Oozie jobs are going into the Suspended state with:
JA009: Failed on local exception:
The DNS entries are fine.
I have also added the ambari-qa, yarn and hdfs users to the oozie proxy users list.
Please let me know if there is anything I am missing here.
BernieC
September 7th, 2017 09:00
If you take the JA009 error from Oozie and look for something logged on Isilon at the same time, do you see anything there? You may need to increase the HDFS service debug logging to see more descriptive errors on the Isilon as well (isi hdfs log-level modify --set=debug; isi hdfs log-level modify --set=default to revert). I'm curious what you're seeing there and in the Oozie logs. Please share whatever you have.