PowerScale: Isilon: Audit reports STATUS_TIMEOUT errors

Summary: If you receive STATUS_TIMEOUT events in CELOG alerts from one or more CEE servers, this article helps you understand the events and identify the cause.


Instructions

Isilon CEE reports a STATUS_TIMEOUT error when the response to a heartbeat request arrives outside a 3-second window. The heartbeat is a connectivity test between the Isilon cluster and the CEE server. A delayed response can have many causes; the ones we have seen most often are listed here.
 
1. Overdriven CEE server, causing resource starvation on the CEE server
  • Resource starvation can delay processing of the heartbeat, so the CEE server cannot respond within the allowed time.
2. DNS lookup takes too long
  • A slow lookup makes the full TCP packet stream last too long.
3. Network delays

Additional Information


To check for an overdriven CEE server, open Task Manager on that server and review CPU, memory, and network throughput. If any of these exceeds 90%, consider either adding more CEE servers to spread the audit load or increasing the resources allocated to the CEE server. Administration guides advise maintaining a 1:1 ratio of CEE servers to auditing nodes.
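
As a rough illustration of the 90% guideline, the Python sketch below flags any reading above the threshold. The helper name, metric names, and sample values are hypothetical; in practice the numbers would come from Task Manager or another monitoring tool on the CEE server.

```python
# Hypothetical helper: flag CEE server resource readings above a threshold.
# The 90% default follows the guideline in this article; the metric names
# and sample values are illustrative assumptions only.

def overdriven_metrics(readings, threshold=90.0):
    """Return the sorted names of metrics whose utilization (%) exceeds threshold."""
    return sorted(name for name, pct in readings.items() if pct > threshold)


if __name__ == "__main__":
    # Made-up sample readings, in percent.
    sample = {"cpu": 96.5, "memory": 71.0, "network": 42.3}
    flagged = overdriven_metrics(sample)
    if flagged:
        print("Over threshold:", ", ".join(flagged))
    else:
        print("All readings at or below threshold")
```

A reading of exactly 90% is not flagged, because the guideline speaks of values over 90%.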

Also review the current audit backlog and the current export rate using the following two commands:
# isi_for_array -sX 'isi_audit_progress -t protocol CEE_FWD'
# isi statistics query current --nodes=all --keys=node.audit.cee.export.rate

To avoid DNS lookup issues, configure your CEE servers by IP address instead of DNS name. This eliminates the DNS lookup and gives marginally faster audit performance, whether or not you are seeing STATUS_TIMEOUT events.
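
To judge whether name resolution is a meaningful part of the delay, one option is to time a lookup from a host on the same network. The Python sketch below does this with the standard library. The function name is hypothetical and the hostname is a placeholder. It times the resolver on the machine where it runs, which may not match what the cluster nodes experience.

```python
import socket
import time


def time_dns_lookup(hostname):
    """Resolve hostname and return (elapsed_seconds, sorted list of addresses)."""
    start = time.monotonic()
    infos = socket.getaddrinfo(hostname, None)
    elapsed = time.monotonic() - start
    # Each entry's sockaddr tuple begins with the address string.
    addresses = sorted({info[4][0] for info in infos})
    return elapsed, addresses


if __name__ == "__main__":
    # Placeholder name; replace with the DNS name of your CEE server.
    name = "localhost"
    try:
        elapsed, addresses = time_dns_lookup(name)
        print(f"{name} resolved to {addresses} in {elapsed:.3f} s")
    except socket.gaierror as exc:
        print(f"Lookup for {name} failed: {exc}")
```

A lookup time that is a noticeable fraction of the 3-second window suggests that configuring the CEE servers by IP address is worth trying.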


Network delays can arise from many issues in the network configuration. A CEE server that is off site, or on a different network many hops away, can cause heartbeat requests to take longer than 3 seconds to complete.
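
A quick way to get a feel for network latency toward a CEE server is to time a TCP connection from a host on the cluster's network. The Python sketch below is only an approximation: it measures TCP connection setup, not the CEE heartbeat itself. The function name is hypothetical, and port 12228 is an assumption about the CEE listener that should be checked against your own configuration.

```python
import socket
import time

HEARTBEAT_WINDOW_SECONDS = 3.0  # heartbeat window described in this article


def time_tcp_connect(host, port, timeout=HEARTBEAT_WINDOW_SECONDS):
    """Return seconds taken to open a TCP connection, or None on failure or timeout."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None


if __name__ == "__main__":
    # Placeholder address; replace with your CEE server's IP address.
    host = "127.0.0.1"
    port = 12228  # assumed CEE port; verify against your configuration
    elapsed = time_tcp_connect(host, port)
    if elapsed is None:
        print(f"No connection to {host}:{port} within {HEARTBEAT_WINDOW_SECONDS} s")
    else:
        print(f"Connected to {host}:{port} in {elapsed:.3f} s")
```

Connection times that approach the 3-second window point toward the network path rather than the CEE server's resources.
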
Article Properties
Article Number: 000158349
Article Type: How To
Last Modified: 28 Oct 2022
Version:  3