Isilon: Gateway Connectivity Status shows Disconnected, Internal Error 401 Unauthorized response codes
Summary: Gateway Connectivity Status shows Disconnected, and Internal Error 401 Unauthorized response codes are seen (User Correctable).
Symptoms
The cluster Gateway Connectivity Status, which was working and connected previously, shows Disconnected.
401 Unauthorized responses that are NOT related to an invalid username and password are seen from the Secure Connect Gateway.
On Isilon clusters using Secure Connect Gateway, isi esrs view shows that the Gateway Connectivity Status is disconnected. The configuration was previously working correctly.
Attempts to disable and reenable isi_esrs_d and isi_rsapi_d do not help restore connectivity.
Attempts to modify the configuration using a username, and attempts to disable Secure Connect Gateway (--enabled=false) in order to re-create the configuration, also fail.
An error like the following is seen when issuing the isi esrs modify --enabled=false command:
Internal error:
ESRSBadCodeError: ESRS Delete Device failed. Error response code from gateway: 401, Unauthorized. Response dictionary was: {{'http_response_code': 401, 'curl_trace': "* Trying 1.2.3.4:9443...\n* TCP_NODELAY set\n* Name '1.2.3.4' family 2 resolved to '1.2.3.4' family 2\n* Local port: 0\n* Connected to 1.2.3.4 (1.2.3.4) port 9443 (#0)\n* ALPN, offering http/1.1\n* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH\n* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384\n* ALPN, server accepted to use http/1.1\n* Server certificate:\n* subject: C=US; ST=MA; L=SO; O=EMC; OU=ESRS; CN=myname\n* start date: Dec 4 12:20:29 2019 GMT\n* expire date: Dec 4 12:20:29 2039 GMT\n* issuer: C=US; ST=MA; L=SO; O=EMC; OU=ESRS; CN=myname\n* SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.\n> DELETE /esrs/v1/devices/ISILON-GW/ELMISLxxxxxxxx HTTP/1.1\r\nHost: 1.2.3.4:9443\r\nUser-Agent: PycURL/7.43.0 libcurl/7.66.0 OpenSSL/1.0.2r-fips zlib/1.2.11\r\nAccept: */*\r\nAuthorization: ISILON-GW:ELMISLxxxxxxxx:keykeykeyGYskKOwNZQg7SGOgC8=,domain=Device\r\nDate: Wed, 04 Nov 2020 19:15:47 GMT\r\n\r\n* Mark bundle as not supporting multiuse\n< HTTP/1.1 401 Unauthorized\r\n< Date: Wed, 04 Nov 2020 19:15:48 GMT\r\n< Server: Apache PivotalWebServer\r\n< Strict-Transport-Security: max-age=31536000; includeSubDomains;\r\n< Content-Security-Policy: frame-ancestors 'self'\r\n< Transfer-Encoding: chunked\r\n< \r\n* Connection #0 to host 1.2.3.4 left intact\n"}}
If the 401 message contains the text string Invalid username and password, this article does NOT apply.
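The applicability check above can be sketched as a small text filter. This is a minimal sketch, not part of any Isilon tooling: the function name article_applies is hypothetical, and it assumes the error text has been copied from the command output into a string.

```python
import re


def article_applies(error_text: str) -> bool:
    """Heuristic: True for a 401 Unauthorized that is not a bad-credentials error.

    Hypothetical helper for illustration only; it inspects the pasted error
    text and does not contact the cluster or the gateway.
    """
    # Matches both "401, Unauthorized" and "HTTP/1.1 401 Unauthorized".
    is_401 = re.search(r"\b401\b[,\s]+Unauthorized", error_text, re.IGNORECASE)
    # The article does not apply when the gateway rejects the credentials.
    bad_credentials = "invalid username and password" in error_text.lower()
    return bool(is_401) and not bad_credentials
```

A result of True only means the symptom matches this article; it does not confirm that the device key was lost.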
Cause
In the response above, notice that the Isilon issues an HTTP DELETE command:
DELETE /esrs/v1/devices/ISILON-GW/ELMISLxxxxxxxx
The device key is used to authenticate this communication between Isilon and the Secure Connect Gateway:
Authorization: ISILON-GW:ELMISLxxxxxxxx:keykeykeyGYskKOwNZQg7SGOgC8=
If the gateway was reinstalled, or has lost the device key for another reason, REST-API authentications using the key fail with a 401 Unauthorized error.
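To make the structure of that header easier to see, the following sketch splits an Authorization value taken from the curl trace into its parts. The field names (device_type, device_id, device_key, domain) are assumptions chosen for illustration, inferred from the trace shown in this article; they are not documented gateway terminology.

```python
from typing import NamedTuple


class DeviceAuth(NamedTuple):
    # Field names are hypothetical labels, not official gateway terms.
    device_type: str  # e.g. "ISILON-GW", also seen in the DELETE URL path
    device_id: str    # e.g. "ELMISLxxxxxxxx"
    device_key: str   # the device key that the gateway must recognize
    domain: str       # e.g. "Device"; empty if no domain parameter is present


def parse_device_authorization(header_value: str) -> DeviceAuth:
    """Split 'TYPE:ID:KEY,domain=Device' into its components."""
    credentials, _, params = header_value.partition(",")
    # Split on the first two colons only, so the key is kept whole.
    device_type, device_id, device_key = credentials.strip().split(":", 2)
    domain = ""
    for param in params.split(","):
        name, _, value = param.partition("=")
        if name.strip() == "domain":
            domain = value.strip()
    return DeviceAuth(device_type, device_id, device_key, domain)
```

A value that does not contain three colon-separated fields raises ValueError. The device key is a credential, so treat it accordingly when sharing logs.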
If the state of the Secure Connect Gateway cannot be determined, Dell Support can be engaged to investigate. Device keys are lost if a gateway is reinstalled, and can also be lost if the gateway is rolled back to an old snapshot (taken before the key was created or updated). In addition, if a Secure Connect Gateway runs out of disk space and a new device is added during that time, the database of device keys on the Secure Connect Gateway cannot be extended.
Resolution
The simplest and most likely solution is to disable the configuration and then register again, which generates a new device key for the Isilon. The --force flag instructs Isilon to disable Secure Connect Gateway services without trying to contact the gateway to deregister first.
isi esrs modify --enabled=false --force
isi esrs modify --enabled=true --username=DellSupportEmail --password=MYPASS
If these commands do not work, Dell Support can be engaged to manually clear the key and reset the communication between the Secure Connect Gateway and the Isilon cluster. Open a service request quoting this article.
Only an experienced administrator should run the following commands, which disable the relevant services on the cluster, manually disable Secure Connect Gateway, and clear the existing authentication key on the cluster side.
- Disable the relevant services, if they are running. These services represent daemons that run the Secure Connect Gateway service and the REST-API service for Secure Connect Gateway connectivity. Disabling these services does not affect client access to the cluster.
isi services -a isi_esrs_d disable
isi services -a isi_rsapi_d disable
- Manually force Secure Connect Gateway into the disabled state.
isi_gconfig -t esrs esrs.enabled=false
- Clear the key in use for communication.
isi_gconfig -t esrs esrs.esrs_api_key=""
NOTE: If a secondary gateway is configured, clear the API key for the secondary gateway using the command below:
isi_gconfig -t esrs esrs.secondary_esrs_api_key=""
- Enable ONLY the rsapi service (isi_rsapi_d).
isi services -a isi_rsapi_d enable
- Use the normal commands to enable the Secure Connect Gateway configuration.
isi esrs modify --enabled=true --username=DELL-Support-Account --password=MYPASS
- If the command times out waiting for a REST response, rerun it with a longer timeout:
isi --timeout=600 esrs modify --enabled=true --username=DELL-Support-Account --password=MYPASS
- If the command is still showing errors, start a new gather on the Isilon and engage Dell Support.
These steps are used to re-create and reestablish the communication between the Isilon and the Secure Connect Gateway.
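The order of the manual steps above matters, so it can help to assemble them as a reviewable list before running anything. The following is a minimal sketch that only builds the command strings already shown in this article; the function name manual_reset_commands and its parameters are hypothetical, and nothing is executed on the cluster.

```python
import shlex
from typing import List, Optional


def manual_reset_commands(
    username: str,
    password: str,
    has_secondary: bool = False,
    timeout: Optional[int] = None,
) -> List[str]:
    """Return the manual reset commands from this article, in order.

    Hypothetical helper: it only formats strings for review (a dry run).
    """
    commands = [
        # Stop the two daemons first.
        "isi services -a isi_esrs_d disable",
        "isi services -a isi_rsapi_d disable",
        # Force the disabled state, then clear the stored key.
        "isi_gconfig -t esrs esrs.enabled=false",
        'isi_gconfig -t esrs esrs.esrs_api_key=""',
    ]
    if has_secondary:
        commands.append('isi_gconfig -t esrs esrs.secondary_esrs_api_key=""')
    # Only the rsapi service is re-enabled before registering again.
    commands.append("isi services -a isi_rsapi_d enable")
    isi = "isi" if timeout is None else f"isi --timeout={timeout}"
    commands.append(
        f"{isi} esrs modify --enabled=true "
        f"--username={shlex.quote(username)} --password={shlex.quote(password)}"
    )
    return commands
```

For example, printing each entry of manual_reset_commands("DELL-Support-Account", "MYPASS", timeout=600) gives a checklist that can be reviewed before the commands are entered by hand.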
Additional Information
If the primary node fails, the cluster elects a new primary and the landing IP is updated to the Secure Connect Gateway using the REST protocol. Both the "isi_esrs_d" and the "isi_rsapi_d" services should be running on all the nodes in the cluster. The primary is elected based on the nodes that are configured into the Gateway Access Pools.
The primary can be identified by reviewing the logs, or by issuing the command below (the output, 5, follows the command):
sqlite3 /ifs/.ifsvar/modules/tardis/namespaces/esrs.registered.state.sqlite 'select value from kv_table where key == "esrs_registered_lnn";'
5
In the example above, the primary node is node 5. The IP address that this node holds in the Gateway Access Pool is the IP address in use for the ELM* cluster license on the Secure Connect Gateway. Dell Support personnel also see this IP address in the ServiceLink Secure Connect Gateway systems that are used to dial in to clusters.
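The same lookup can be sketched with Python's standard sqlite3 module, for cases where the result needs to be used in a script. This is a sketch under assumptions: the database path, table, and key are taken from the command above, the function name registered_primary_lnn is hypothetical, and the file is opened read-only so the state database is not modified.

```python
import sqlite3

# Path as given in the sqlite3 command in this article.
STATE_DB = "/ifs/.ifsvar/modules/tardis/namespaces/esrs.registered.state.sqlite"


def registered_primary_lnn(db_path: str = STATE_DB):
    """Return the stored esrs_registered_lnn value, or None if it is absent.

    Hypothetical helper; opens the database read-only through a file: URI.
    """
    connection = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        row = connection.execute(
            "SELECT value FROM kv_table WHERE key = ?",
            ("esrs_registered_lnn",),
        ).fetchone()
    finally:
        connection.close()
    return None if row is None else row[0]
```

The value is returned as stored, so callers that need a number should convert it explicitly.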