ECS: svc_tools show the ERROR "Unexpected type 'exceptions.RuntimeError' error while accessing Unknown."
Summary: This ERROR is displayed in various svc_tools such as "svc_gc", "svc_replicate", "svc_rg", "svc_task", and "svc_vdc" after a successful node evacuation: "Unexpected type 'exceptions.RuntimeError' error while accessing Unknown. Error was: DT search redirected to an unknown node, with IP 169.254.2.1" ...
Symptoms
This ERROR is displayed in various svc_tools such as "svc_gc", "svc_replicate", "svc_rg", "svc_task", and "svc_vdc" after a successful node evacuation:
"Unexpected type 'exceptions.RuntimeError' error while accessing Unknown. Error was: DT search redirected to an unknown node, with IP 169.254.2.1"
admin@ecsnode3:~> svc_gc config list
svc_gc v3.7.0 (svc_tools v2.20.0) Started 2024-12-11 09:21:45
Local node ECS Object Version: 3.8.1.0-140092.463d649a0d3.0.1.bugfix_release_ecs_3_8_1_0_GA_Customer (3.8.1.0 Isolated Patch 0.1 (Customer))
ERROR Unexpected <type 'exceptions.RuntimeError'> error while accessing Unknown. Error was: DT search redirected to an unknown node, with IP 169.254.2.1
ERROR Unexpected <type 'exceptions.RuntimeError'> error while accessing Unknown. Error was: DT search redirected to an unknown node, with IP 169.254.2.1
WARNING Failed to discover VDC info from dtquery, falling back to REST. Error was: dtQueryCmdFailure - Unexpected <type 'exceptions.RuntimeError'> error while accessing Unknown. Error was: DT search redirected to an unknown node, with IP 169.254.2.1
Local VDC: urn:storageos:VirtualDataCenterData:12345678-abcd-1212-3434-abcde123456 vdc_ecs_01
ERROR Unexpected <type 'exceptions.RuntimeError'> error while accessing Unknown. Error was: DT search redirected to an unknown node, with IP 169.254.2.1
ERROR Unexpected <type 'exceptions.RuntimeError'> error while accessing Unknown. Error was: DT search redirected to an unknown node, with IP 169.254.2.1
Current Param values:
Repo com.emc.ecs.chunk.gc.repo.enabled true true
Repo com.emc.ecs.chunk.gc.repo.verification.enabled true true
Repo com.emc.ecs.chunk.gc.repo.reclaimer.no_recycle_window 78 hours 78 hours
BTREE_L1 com.emc.ecs.chunk.gc.btree.enabled true true
BTREE_L1 com.emc.ecs.chunk.gc.btree.scanner.verification.enabled true true
BTREE_L1 com.emc.ecs.chunk.gc.btree.scanner.copy.enabled true true
BTREE_L1 com.emc.ecs.chunk.gc.btree.occupancyScanner.enabled true true
BTREE_L2 com.emc.ecs.chunk.gc.btree.reclaimer.level2.enabled true true
BTREE_L2 com.emc.ecs.chunk.gc.btree.occupancyScanner.level2.enabled true true
BTREE_L2 com.emc.ecs.chunk.gc.btree.scanner.level2.copy.enabled true true
BTREE_L2 com.emc.ecs.chunk.gc.btree.scanner.level2.verification.enabled true true
Partial com.emc.ecs.chunk.gc.repo.partial.enabled true true
Partial com.emc.ecs.chunk.gc.repo.partial.merge_chunk_threshold 89478400 89478400
Partial com.emc.ecs.chunk.gc.repo.partial.merge_old_chunk_threshold 89478400 89478400
Journal com.emc.ecs.chunk.gc.journal.enabled true true
Journal com.emc.ecs.prtable.gc.enabled true true
Journal com.emc.ecs.prtable.gc.record_expiration 14 days 14 days
Journal com.emc.ecs.chunk.gc.journal.protection_period 14 days 14 days
CAS com.emc.ecs.objectgc.cas.enabled true true
CAS com.emc.ecs.objectgc.cas.process_update.enabled true true
CAS com.emc.ecs.objectgc.cas.process_object.enabled true true
CAS com.emc.ecs.objectgc.cas.process_audit.enabled true true
CAS com.emc.ecs.objectgc.cas.consistency_scanner.enabled true true
CAS com.emc.ecs.objectgc.cas.process_object.dry_run false false
====> List of the Parameters Not Default:
Type Param (com.emc.ecs) Default Configure(active) MTime Reason Description
---------------------------------------------------------------------------------------
< No result data >
admin@ecsnode3:~>
Cause
For example, a node evacuation of rack 2, which included nodes R2N1 - R2N5 (private.4 IPs 169.254.2.1 - 169.254.2.5), was done recently.
The dtquery service still holds these now-stale private.4 IPs in its cache, so DT searches are redirected to nodes that no longer exist.
The same applies to evacuations of other racks.
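The private.4 addresses follow the pattern 169.254.<rack>.<node>, as seen in this example and in the getclusterinfo output below. As a minimal illustration (a plain shell sketch, not an ECS command), the stale addresses that an evacuation of rack 2 leaves behind in the dtquery cache can be printed as follows:
for node in 1 2 3 4 5; do echo "R2N${node} -> 169.254.2.${node}"; done  # mapping taken from this article's example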
Resolution
Check whether a recent node evacuation took place: review the latest Service Console logs across all nodes and look for "run_Node_Evacuation" entries.
Command: svc_exec "ls -altrd /opt/emc/caspian/service-console/log/*"
Example:
admin@ecsnode3:~> svc_exec "ls -altrd /opt/emc/caspian/service-console/log/*"
svc_exec v1.0.8 (svc_tools v2.20.0) Started 2024-12-11 13:05:08
Output from node: r1n1 retval: 0
drwxr-xr-x 2 root root 122 Apr 6 2018 /opt/emc/caspian/service-console/log/runClusterConfig_20180406_070816_0
drwxr-xr-x 2 root root 122 Apr 6 2018 /opt/emc/caspian/service-console/log/runHealthCheck_20180406_070956_0
<...>
drwxr-xr-x 2 root root 78 Nov 26 15:52 /opt/emc/caspian/service-console/log/20241126_153841_run_Node_Evacuation
drwxr-xr-x 2 root root 78 Nov 26 16:11 /opt/emc/caspian/service-console/log/20241126_160456_run_Node_Evacuation
drwxr-xr-x 2 root root 78 Nov 26 22:10 /opt/emc/caspian/service-console/log/20241126_212407_run_Node_Evacuation
drwxr-xr-x 2 root root 78 Nov 27 11:20 /opt/emc/caspian/service-console/log/20241127_095641_run_Node_Evacuation <--------------------
drwxr-xr-x 2 root root 78 Nov 27 11:36 /opt/emc/caspian/service-console/log/20241127_113633_run_Node_Evacuation
drwxr-xr-x 2 root root 78 Nov 27 11:43 /opt/emc/caspian/service-console/log/20241127_114041_run_Cluster_Config
drwxr-xr-x 2 root root 78 Dec 4 09:42 /opt/emc/caspian/service-console/log/20241204_094234_run_Node_Maintenance_Enter
<...>
drwxr-xr-x 2 root root 78 Dec 11 08:20 /opt/emc/caspian/service-console/log/20241211_081946_run_Node_Maintenance_List
Output from node: r1n2 retval: 0
drwxr-xr-x 2 root root 139 Nov 25 2018 /opt/emc/caspian/service-console/log/20181125_091403_run_OS_and_Node_Upgrade
drwxr-xr-x 2 root root 139 Feb 23 2019 /opt/emc/caspian/service-console/log/20190223_084218_run_OS_and_Node_Upgrade
drwxr-xr-x 2 root root 139 Nov 23 2019 /opt/emc/caspian/service-console/log/20191123_105303_run_Upgrade
drwxr-xr-x 2 root root 78 Feb 7 2021 /opt/emc/caspian/service-console/log/20210207_122312_run_Upgrade_To_35
<...>
Output from node: r1n3 retval: 2
ls: cannot access '/opt/emc/caspian/service-console/log/*': No such file or directory
Output from node: r1n4 retval: 2
ls: cannot access '/opt/emc/caspian/service-console/log/*': No such file or directory
Output from node: r1n5 retval: 0
drwxr-xr-x 2 root root 78 Dec 22 2023 /opt/emc/caspian/service-console/log/20231222_094024_run_Cluster_Config
drwxr-xr-x 2 root root 62 Dec 22 2023 /opt/emc/caspian/service-console/log/20231222_094644_run_Node_Maintenance_Enter
drwxr-xr-x 2 root root 78 Dec 22 2023 /opt/emc/caspian/service-console/log/20231222_094716_run_Node_Maintenance_Enter
Output from node: r1n6 retval: 2
ls: cannot access '/opt/emc/caspian/service-console/log/*': No such file or directory
Output from node: r1n7 retval: 2
ls: cannot access '/opt/emc/caspian/service-console/log/*': No such file or directory
Output from node: r1n8 retval: 2
ls: cannot access '/opt/emc/caspian/service-console/log/*': No such file or directory
Output from node: r3n1 retval: 2
ls: cannot access '/opt/emc/caspian/service-console/log/*': No such file or directory
Output from node: r3n2 retval: 2
ls: cannot access '/opt/emc/caspian/service-console/log/*': No such file or directory
Output from node: r3n3 retval: 2
ls: cannot access '/opt/emc/caspian/service-console/log/*': No such file or directory
Output from node: r3n4 retval: 2
ls: cannot access '/opt/emc/caspian/service-console/log/*': No such file or directory
Output from node: r3n5 retval: 2
ls: cannot access '/opt/emc/caspian/service-console/log/*': No such file or directory
Output from node: r3n6 retval: 2
ls: cannot access '/opt/emc/caspian/service-console/log/*': No such file or directory
Output from node: r3n7 retval: 2
ls: cannot access '/opt/emc/caspian/service-console/log/*': No such file or directory
Output from node: r3n8 retval: 2
ls: cannot access '/opt/emc/caspian/service-console/log/*': No such file or directory
admin@ecsnode3:~>
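To narrow the listing to evacuation runs only, the same svc_exec call can be combined with a shell glob. This is a hedged variant that relies on the *_run_Node_Evacuation directory naming visible in the output above; the 2>/dev/null suppresses the "cannot access" noise from nodes that have no Service Console logs:
Command: svc_exec "ls -dltr /opt/emc/caspian/service-console/log/*run_Node_Evacuation* 2>/dev/null"  # glob pattern assumed from the example listing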
Confirm that the rack has been removed. In this example, rack 2 with IP 169.254.2.1 is no longer listed:
admin@ecsnode3:~> getclusterinfo
Registered Racks
================
Ip Address epoxy seg mac seg color seg id NAN Hostname
=============== ===== ================= ========== ======= ============
169.254.1.1 False 28:99:3a:12:34:56 red 1 provo-red.nanlocal
169.254.3.1 False 28:99:3a:78:90:12 blue 3 provo-blue.nanlocal
admin@ecsnode3:~>
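As a quick scripted check, the stale IP from the error message can be grepped out of the getclusterinfo output shown above; an empty result confirms the rack is no longer registered (a minimal sketch using standard grep):
Command: getclusterinfo | grep "169.254.2."  # empty result expected once rack 2 is removed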
Each time a svc_tools command returns the error, it is also logged in dtquery.log on the node where the command was issued:
admin@ecsnode3:~> svc_log -f "169.254.2.1" -sr all -start 1m -sn -sf
svc_log v1.0.33 (svc_tools v2.20.0) Started 2024-12-11 11:18:29
Running on nodes: <All nodes>
Time range: 2024-12-11 11:17:30 - 2024-12-11 11:18:30
Filter string(s): '169.254.2.1'
Service(s) to search: zk-fabric,cm,am,metering,resourcesvc,nvmeengine,casaccess,vnest,upgrade,ecsportalsvc,blobsvc,coordinatorsvc,authsvc,atlas,rm,eventsvc,dataheadsvc,stat,dm,accesslog,metering-georeplayer,zk-object,objcontrolsvc,ssm,nginx,nvmetargetviewer,messages-object,dtsm,provisionsvc,lifecycle,dataheadsvc-access,georeceiver,sr,transformsvc,dtquery,storageserver,datahead-cas-access
Show filename(s): True
Show nodename(s): True
Log type(s) to search for each service: <Main Logs>
169.254.1.1 dtquery.log 2024-12-11T11:18:05,800 [qtp877323851-243362] INFO DtQueryService.java (line 3749) redirecting to http://169.254.2.1:9101/urn:storageos:OwnershipInfo:3a6bc46a-8551-4df9-a140-5e6b9774f2cb__RT_58_128_0:/REP_GROUP_KEY/?maxkeys=1000&showvalue=gpb&rgId=urn%3Astorageos%3AReplicationGroupInfo%3A00000000-0000-0000-0000-000000000000%3Aglobal
169.254.1.1 dtquery.log 2024-12-11T11:18:05,804 [qtp877323851-243014] INFO DtQueryService.java (line 3749) redirecting to http://169.254.2.1:9101/urn:storageos:OwnershipInfo:3a6bc46a-8551-4df9-a140-5e6b9774f2cb__RT_114_128_0:/REP_GROUP_KEY/?maxkeys=1000&showvalue=gpb&rgId=urn%3Astorageos%3AReplicationGroupInfo%3A00000000-0000-0000-0000-000000000000%3Aglobal
admin@ecsnode3:~>
Verify that the last restart of dtquery took place before the most recent node evacuation; if so, the stale cache entries are still present.
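One hedged way to compare the two timestamps is to list the start time of the dtquery process on each node and set it against the run_Node_Evacuation directory timestamps gathered earlier. This assumes a process with "dtquery" in its command line is visible from the host; adjust the grep pattern for your release if needed:
Command: svc_exec "ps -eo lstart,args | grep [d]tquery"  # process start time; assumes 'dtquery' appears in the command line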
Open a Service Request with Dell Technical Support and reference this KBA 000259052 to have the dtquery service restarted.
A restart of dtquery does not impact any frontend I/O, as the service is only used internally by ECS.