Isilon HDFS: "STATUS_IO_TIMEOUT" while performing 'hdfs ls' operation on compute node
Summary: 'hdfs ls' operation from a compute node on the Isilon file system might fail intermittently with error "ls: Error creating security context for user oozie cause: STATUS_IO_TIMEOUT: status: STATUS_IO_TIMEOUT = 0xC00000B5" ...
Symptoms
When running an 'hdfs ls' operation from a compute node against the Isilon file system, the operation intermittently fails for different users with the following error:
[oozie@hdp ~]$ hdfs dfs -ls /
ls: Error creating security context for user oozie cause: STATUS_IO_TIMEOUT: status: STATUS_IO_TIMEOUT = 0xC00000B5
You will see related errors in the Isilon hdfs.log for the same operation:
java.io.IOException cause: Error creating security context for user <xxxx/yyyy@zzz.com> cause: STATUS_IO_TIMEOUT: status: STATUS_IO_TIMEOUT = 0xC00000B5
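Because the failure is intermittent, it can help to run the same command several times and count how often the error appears. The following is a minimal sketch, not part of any Hadoop or OneFS tooling; the helper name count_timeouts is hypothetical, and it assumes the error text contains the string STATUS_IO_TIMEOUT as shown above.

```shell
#!/bin/sh
# count_timeouts (hypothetical helper): run a command N times and count
# how many runs print STATUS_IO_TIMEOUT on stdout or stderr.
count_timeouts() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # The hdfs client prints the error on stderr, so merge both streams.
        if "$@" 2>&1 | grep -q 'STATUS_IO_TIMEOUT'; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures of $runs runs hit STATUS_IO_TIMEOUT"
}

# Example, run as the affected user on the compute node:
# count_timeouts 20 hdfs dfs -ls /
```

A non-zero count confirms that the compute node is hitting the intermittent timeout and not a permanent permission or configuration error.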
Cause
This can have two causes:
1) A faulty DNS entry on the Isilon. In this case, the SmartConnect zone name cannot be resolved against the affected groupnet DNS servers using 'nslookup' or 'dig'.
1. To list the DNS servers configured on the Isilon:
# isi network groupnets list
2. To check whether the FQDN configured on the HDFS server resolves against each DNS server configured on the Isilon:
# nslookup <FQDN> <DNS server>
# dig @<dns server IP> <FQDN>
2) Domain connectivity issues between the Isilon and the associated domain used in the access zone. Both primary and trusted domains must be able to communicate with the Isilon without any issues. You can check for any domains which can't be resolved by examining messages in /var/log/lsassd.log.
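The DNS check described under the first cause can be repeated for every configured DNS server with a small loop. This is only a sketch: the helper name check_dns_servers is hypothetical, the FQDN and server IPs in the example are placeholders, and the server list is assumed to be copied by hand from the 'isi network groupnets list' output.

```shell
#!/bin/sh
# check_dns_servers (hypothetical helper): query one FQDN against each
# DNS server given and report which servers return no answer.
check_dns_servers() {
    fqdn=$1
    shift
    for server in "$@"; do
        # +short prints only the answer records; +time and +tries bound the wait.
        # Lines starting with ';' are dig status messages (for example a
        # timeout notice), not answers, so they are filtered out.
        answer=$(dig +short +time=2 +tries=1 "@$server" "$fqdn" 2>/dev/null | grep -v '^;' || true)
        if [ -n "$answer" ]; then
            echo "OK $server"
        else
            echo "FAIL $server"
        fi
    done
}

# Example with placeholder values:
# check_dns_servers <FQDN> <DNS server 1> <DNS server 2>
```

A server reported as FAIL did not return an answer for the name within the time limit and is a candidate for the faulty DNS entry.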
Resolution
If a faulty DNS server is identified, remove it from the groupnet:
# isi network groupnets modify <groupnet ID> --remove-dns-servers=<Faulty DNS server IP>
If the primary domain or any of the associated trusted domains is unreachable or offline, further troubleshooting is needed to find the underlying issue.
The troubleshooting guide for domain offline is a good place to start: http://www.emc.com/collateral/TechnicalDocument/docu63151.pdf