VxRail:未顯示實體視圖,因為節點未回應「esxcli」命令

Summary: VxRail Cluster 節點實體檢視遺失,節點未回應「esxcli」命令,NTP 未同步。

This article applies to This article does not apply to This article is not tied to any specific product. Not all product versions are identified in this article.

Symptoms

  1. 缺少所有節點的實體檢視。
    從 web.log 開始,API 閘道在擷取實體檢視資料 10 分鐘後逾時:

    2023-11-20T09:24:31.039Z <7527c8d153655e9bbb43b32dcd312443> marvin [ERROR] <261> ApplianceServiceImpl.java populatePvCache() (276): failed to fetch data.
    javax.ws.rs.ServerErrorException: HTTP 504 Gateway Time-out
    	at org.glassfish.jersey.client.JerseyInvocation.createExceptionForFamily(JerseyInvocation.java:1125) ~[jersey-client-2.27.jar:?]
    	at org.glassfish.jersey.client.JerseyInvocation.convertToException(JerseyInvocation.java:1105) ~[jersey-client-2.27.jar:?]
    	at org.glassfish.jersey.client.JerseyInvocation.translate(JerseyInvocation.java:883) ~[jersey-client-2.27.jar:?]
    	at org.glassfish.jersey.client.JerseyInvocation.lambda$invoke$1(JerseyInvocation.java:767) ~[jersey-client-2.27.jar:?]
    	at org.glassfish.jersey.internal.Errors.process(Errors.java:316) ~[jersey-common-2.27.jar:?]
    	at org.glassfish.jersey.internal.Errors.process(Errors.java:298) ~[jersey-common-2.27.jar:?]
    	at org.glassfish.jersey.internal.Errors.process(Errors.java:229) ~[jersey-common-2.27.jar:?]
    	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:414) ~[jersey-common-2.27.jar:?]
    	at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:765) ~[jersey-client-2.27.jar:?]
    	at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:456) ~[jersey-client-2.27.jar:?]
    	at org.glassfish.jersey.client.JerseyInvocation$Builder.post(JerseyInvocation.java:357) ~[jersey-client-2.27.jar:?]
    	at com.vce.commons.domainowner.graphq.DefaultQueryExecutorImpl.doJsonRequestExecution(DefaultQueryExecutorImpl.java:139) ~[commons-7.0.480.jar:?]
    	at com.vce.commons.domainowner.graphq.DefaultQueryExecutorImpl.execute(DefaultQueryExecutorImpl.java:94) ~[commons-7.0.480.jar:?]
    	at com.emc.mystic.manager.graphql.client.host.HostQuery.configuredHosts(HostQuery.java:138) ~[do-host-graphql-client-1.20.41.jar:?]
    	at com.emc.mystic.manager.graphql.client.host.HostQuery.configuredHosts(HostQuery.java:102) ~[do-host-graphql-client-1.20.41.jar:?]
    	at com.vce.commons.domainowner.node.NodeRepository.getAllClusterNodeData(NodeRepository.java:1997) ~[commons-7.0.480.jar:?]
    	at com.emc.mystic.manager.cluster.service.ApplianceServiceImpl.getAllHostData(ApplianceServiceImpl.java:543) ~[classes/:?]
    	at com.emc.mystic.manager.cluster.service.ApplianceServiceImpl.populatePvCache(ApplianceServiceImpl.java:260) ~[classes/:?]
    	at com.emc.mystic.manager.cluster.service.ApplianceServiceImpl.lambda$updateCacheTask$6(ApplianceServiceImpl.java:320) ~[classes/:?]
    	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    	at java.lang.Thread.run(Thread.java:829) [?:?]
    2023-11-20T09:24:31.045Z <7527c8d153655e9bbb43b32dcd312443> marvin [INFO] <261> ApplianceServiceImpl.java lambda$updateCacheTask$6() (321): Success to refresh cache for node: <node SN>
    2023-11-20T09:24:31.046Z <7527c8d153655e9bbb43b32dcd312443> marvin [INFO] <59> ApplianceDataRoot.java refreshVxRailClusterTag() (289): Skip refreshing VxRail-Cluster-Tag triggered by scheduled job as it has already been done within 15 minutes.
    2023-11-20T09:24:31.046Z <7527c8d153655e9bbb43b32dcd312443> marvin [INFO] <59> ApplianceDataRoot.java fetchData() (357): [VXMPERF] PV fetch data execution time in 602 seconds
    
  2. 在 VxRail Manager 上執行下列命令,以查詢節點實體檢視資料:

    #curl -X GET --unix-socket /var/lib/vxrail/nginx/socket/nginx.sock http://127.0.0.1/rest/vxm/internal/do/v1/host/query -H 'Content-Type: application/json' -d '{"query":"{ configuredHosts { hardware { sn } } }"}' 2>/dev/null | jq | egrep "name|sn" | awk -F\" '/sn/{print $4}' | sort -u | while read sn; do time curl -X POST --unix-socket /var/lib/vxrail/nginx/socket/nginx.sock -H 'Content-Type: application/json' -d '{"variables":"{\"sn\":\"'${sn}'\"}","query":"query ($sn:[String]){ configuredHosts(sn:$sn) { moid name type summary{ hardware{ cpuNum}} config{ hostUUID isPrimary network{ vnic{ device ipv4 allIpv6s nonLinkLocalIpv6} idrac{ ipAddress ipAddressSource netmask gateway ipAddressV6 gatewayV6 prefixLen ipv6AutoConfig vlan{ enabled id priority}}} localSlotClaims{ slot bay type usage diskgroupId} diskgroup{ slotNum current{ type}} installedComponent{ displayName version model description installedTime}} runtime{ connectionState overallStatus powerState inMaintenanceMode} hardware{ sn psnt slot manufacturer name systemStatusLed tpm model firmware{ id model} firmwareRevisions{ idsdmFwRevision biosFwRevision bmcFwRevision diskCtrlFwRevision bossFwRevision cpldFwRevision expanderBackplane nonExpanderBackplane dcpmFwRevision percDiskCtrlFwRevision} baseline{ sn slot chassisId isMissing} chassis{ name model psnt partNumber serviceTag psus{ sn name slot manufacturer partNumber firmwareVersion baseline{ sn slot isMissing}}} disks{ sn guid capacity slot firmwareVersion diskType diskState manufacturer protocol maxCapableSpeed model ledStatus writeEndurance bay enclosure remainingWriteEnduranceRate encryptionAbility encryptionStatus baseline{ sn slot bay isMissing}} bootDevices{ sn firmwareVersion sataType powerOnHours powerCycleCount avrEraseCount maxEraseCount capacity deviceModel slot health bootDeviceType status blockSizeBytes partNumber manufacturer controllerFirmware controllerModel controllerStatus raidStatus} nics{ mac linkSpeed firmwareFamilyVersion linkStatus fqdd specificNicType wwnn wwpn drivers{ driverName driverVersion}} position{ rackName rackSlot} storageInstance{ securityStatus encryptionMode}}} }","operationName":null}' http://0/rest/vxm/internal/do/v1/host/query; echo ""; echo ""; echo "Done checking $sn"; done
  3. 結果顯示一或多個節點未在 10 分鐘內傳回資料:
    有問題的節點在正常運作的節點傳回正確的實體檢視資料時,傳回錯誤「504 閘道逾時」。
    結果顯示一或多個節點未在 10 分鐘內傳回資料

  4. 根據上述識別的節點 SN,登入節點並執行 esxcli 命令,它卡住了,但 localcli 有效:
    執行 esxcli 命令時卡住,但 localcli 可正常運作

 

Cause

節點發生 NTP 同步問題,造成 esxcli 命令停滯且無回應。

VxRail 實體檢視呼叫 esxcli 命令以擷取韌體資訊,而當節點停滯在執行狀態時,無法取得資訊 esxcli 命令。

 

Resolution

解決方案是找出所有節點上的 NTP 同步問題並加以修正。請參閱以下步驟,以確認節點上的 NTP 狀態:

  1. 檢查 /var/log/vobd.log是否有任何錯誤訊息指出系統時鐘不再與上游時間伺服器同步。

  2. 如果是的話,請檢查 NTP 伺服器狀態。

    #ntpq -p

    ntpq -p 命令範例

    如果「reach」值不是 377,則主機缺少 NTP 事務,則此 ESXi 上的時間可能不正確,需要注意。

    #ntpq -c

    ntpq-c 命令範例

    如果「層」值超出 2 到 6 的範圍,則此 ESXi 上的時間同步可能會遇到額外的延遲,可能會導致計時不準確。

  3. 使用 API 時,檢查 VxRail Manager 資料庫中的 NTP 伺服器位址是否正確。

    curl -k --user "[vCenter account]:[vCenter password]" --request GET "https://localhost/rest/vxm/v1/system/ntp"
  4. 如果沒有,請遵循現有的「如何」程序 (VxRail 程序→其他→「如何」程序→變更 VxRail IP 位址→重新指向新的 NTP 伺服器 IP 位址),以更新 NTP 伺服器 IP 位址。

  5. 如有需要,請使用其他運作中的 NTP 伺服器,重新啟動節點 vpxa 和主機服務,請注意,重新啟動節點 vpxa 和主機服務可暫時修正問題,但如果 NTP 無法再次同步,問題就會再出現。

    /etc/init.d/vpxa restart
    /etc/init.d/hostd restart

 

Affected Products

VxRail
Article Properties
Article Number: 000224344
Article Type: Solution
Last Modified: 24 May 2024
Version:  2
Find answers to your questions from other Dell users
Support Services
Check if your device is covered by Support Services.