ECS: xDoctor: RAP081: Symptom Code: 2048: System time offset is above the error threshold

Summary: xDoctor detected an issue with the Network Time Protocol (NTP) daemon.


Symptoms

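Every node in an ECS rack should run the NTP daemon, and the configured NTP servers should be able to synchronize time. If they cannot, front-end data ingest problems may follow.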

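Symptom message:

System time offset is above the error threshold

Message = System time offset is above the error threshold
Extra = [list of nodes]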

Cause

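If the symptom above has been present for less than 24 hours, it is still reported as a warning.
If the condition persists beyond 24 hours, the severity is raised to error and RAP081 is reported.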

Resolution

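The time offset on a node is caused by the ntpd service on that node updating the NTP drift file every hour. This can happen after the node has had a network problem: once the node rejoins the network, an incorrect drift file is created, which forces a time offset between nodes.

When a node rejoins the network after such a problem, a drift file may be created temporarily to bring the node in line with the time on the NTP server. This should be temporary, but if ntpd fails to remove the file, the drift file can be deleted and the service restarted to restore normal operation.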

Verification:
Check that all NTP servers respond to ping.
1. Confirm that compliance is enabled.

Command:

# domulti 'cat /opt/emc/caspian/fabric/agent/conf/agent_customize.conf | grep compliance_enabled'
Example:
admin@node1:~> domulti 'cat /opt/emc/caspian/fabric/agent/conf/agent_customize.conf | grep compliance_enabled'

192.168.219.1
========================================
compliance_enabled = true

192.168.219.2
========================================
compliance_enabled = true

192.168.219.3
========================================
compliance_enabled = true

192.168.219.4
========================================
compliance_enabled = true
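
On a larger rack, the per-node output is easier to scan when it is filtered. The following is a minimal sketch and not part of the ECS tooling: the function name noncompliant_config_nodes is hypothetical, and it assumes that domulti prints each node's IP address on a line of its own ahead of that node's output, as in the example above.

```shell
# List nodes whose agent config does not report "compliance_enabled = true".
# Reads domulti-style output on stdin: an IP line, a ===== rule, then the grep result.
# (Hypothetical helper; assumes the output layout shown in the example.)
noncompliant_config_nodes() {
    awk '
        # A line holding only an IPv4 address starts a new node section.
        /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+[ \t]*$/ {
            if (node != "" && !ok) print node
            node = $1; ok = 0; next
        }
        /^[ \t]*compliance_enabled[ \t]*=[ \t]*true[ \t]*$/ { ok = 1 }
        END { if (node != "" && !ok) print node }
    '
}
```

For example: domulti 'cat /opt/emc/caspian/fabric/agent/conf/agent_customize.conf | grep compliance_enabled' | noncompliant_config_nodes. No output means every node reported compliance_enabled = true.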

2. Check ECS to determine whether the cluster is compliant.

Command:
# viprexec -i "/opt/emc/caspian/fabric/cli/bin/fcli lifecycle cluster.compliance"
Example:
admin@node1:~> viprexec -i "/opt/emc/caspian/fabric/cli/bin/fcli lifecycle cluster.compliance"

Output from host : 192.168.219.1
{
  "compliance": "NON_COMPLIANT",
  "status": "OK",
  "etag": 22527
}

Output from host : 192.168.219.2
{
  "compliance": "NON_COMPLIANT",
  "status": "OK",
  "etag": 22527
}

Output from host : 192.168.219.3
{
  "compliance": "NON_COMPLIANT",
  "status": "OK",
  "etag": 22527
}

Output from host : 192.168.219.4
{
  "compliance": "NON_COMPLIANT",
  "status": "OK",
  "etag": 22527
}

The expected output is COMPLIANT. If NON_COMPLIANT is returned, the cause must be investigated.
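
To pick out the hosts that need attention, the JSON fragments can be filtered by their compliance field. This is a rough sketch under stated assumptions: noncompliant_hosts is a hypothetical name, the "Output from host :" header and the "compliance" key are taken from the example output above, and any value other than COMPLIANT is treated as a finding.

```shell
# Print "host state" for every host whose compliance value is not COMPLIANT.
# Reads viprexec-style output on stdin ("Output from host : <ip>" followed by JSON).
# (Hypothetical helper; field names are taken from the example output.)
noncompliant_hosts() {
    awk '
        /^Output from host :/ { host = $NF }
        /"compliance"[ \t]*:/ {
            state = $0
            sub(/^.*"compliance"[ \t]*:[ \t]*"/, "", state)   # drop everything before the value
            sub(/".*$/, "", state)                              # drop the closing quote and the rest
            if (state != "COMPLIANT") print host, state
        }
    '
}
```

For example: viprexec -i "/opt/emc/caspian/fabric/cli/bin/fcli lifecycle cluster.compliance" | noncompliant_hosts.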

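3. Run the compliance check script on every node to identify any non-compliant nodes that could cause the ECS check to report non-compliance.

Run the compliance script on all nodes. Nodes that report "NTP peers out of sync" may have an NTP drift file problem. If a node prints "Checking compliance..." with no failure line after it, the check passed and no problem was found on that node.

Command:
# domulti /opt/emc/caspian/fabric/agent/conf/compliance_check.sh
Example: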
admin@node1:~> domulti /opt/emc/caspian/fabric/agent/conf/compliance_check.sh
 
192.168.219.1
========================================
Checking compliance...
    NTP peers out of sync
 
192.168.219.2
========================================
Checking compliance...
   
 
192.168.219.3
========================================
Checking compliance...
    NTP peers out of sync
 
192.168.219.4
========================================
Checking compliance...
    NTP peers out of sync
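
The affected nodes can be extracted from this output with a small filter. This is only a sketch: out_of_sync_nodes is a hypothetical name, and it relies on the layout shown in the example (node IP on its own line, findings indented beneath "Checking compliance...").

```shell
# Print the IP of every node that reports "NTP peers out of sync".
# Reads the domulti output of compliance_check.sh on stdin.
# (Hypothetical helper; assumes the output layout shown in the example.)
out_of_sync_nodes() {
    awk '
        /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+[ \t]*$/ { node = $1; next }   # node header line
        /NTP peers out of sync/ { print node }
    '
}
```

For example: domulti /opt/emc/caspian/fabric/agent/conf/compliance_check.sh | out_of_sync_nodes.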

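If the output contains "NTP peers out of sync", continue with the "Peers out of sync" section below.

Peers out of sync:
1. Check whether the NTP offset exceeds 10 (+/-) milliseconds, which can trigger the compliance alert.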

Command:
# viprexec -i "ntpq -nc peers"
Example: (Note: in this example, each node has three NTP servers.)
admin@node1:~> viprexec -i "ntpq -nc peers"

Output from host : 169.254.1.1
remote           refid           st t when poll reach    delay   offset  jitter
==============================================================================
*10.xxx.xxx.16   .GPSs.           1 u   31   64   377    0.103  -367.66  44.909
+10.xxx.xxx.33   .GPSs.           1 u   32   64   377    0.097  -368.68  44.341
+10.xxx.xxx.35   .GPSs.           1 u   16   64   377    0.107  -338.96  69.736

Output from host : 169.254.1.2
remote           refid           st t when poll reach    delay   offset  jitter
==============================================================================
+10.xxx.xxx.16   .GPSs.           1 u   26   64   377    0.089    8.566   0.746
*10.xxx.xxx.33   .GPSs.           1 u   26   64   377    0.100    8.585   0.739
+10.xxx.xxx.35   .GPSs.           1 u   23   64   377    0.104    8.888   0.592

Output from host : 169.254.1.3
remote           refid           st t when poll reach    delay   offset  jitter
==============================================================================
*10.xxx.xxx.16   .GPSs.           1 u   31   64   377    0.101  -354.40  52.444
+10.xxx.xxx.33   .GPSs.           1 u   29   64   377    0.101  -338.84  63.750
+10.xxx.xxx.35   .GPSs.           1 u   39   64   377    0.106  -387.28  44.286

Output from host : 169.254.1.4
remote           refid           st t when poll reach    delay   offset  jitter
==============================================================================
*10.xxx.xxx.16   .GPSs.           1 u   26   64   377    0.084   72.675   9.200
+10.xxx.xxx.33   .GPSs.           1 u   37   64   377    0.107   65.047  14.913
+10.xxx.xxx.35   .GPSs.           1 u   33   64   377    0.103   87.374  13.435

Output from host : 169.254.1.5
remote           refid           st t when poll reach    delay   offset  jitter
==============================================================================
*10.xxx.xxx.16   .GPSs.           1 u   27   64   377    0.094  352.741  54.056
+10.xxx.xxx.33   .GPSs.           1 u   26   64   377    0.103  413.893  43.770
+10.xxx.xxx.35   .GPSs.           1 u   33   64   377    0.101  334.493  69.059

Output from host : 169.254.1.6
remote           refid           st t when poll reach    delay   offset  jitter
==============================================================================
+10.xxx.xxx.16   .GPSs.           1 u   27   64   377    0.101  -428.51  54.955
+10.xxx.xxx.33   .GPSs.           1 u   26   64   377    0.097  -326.21  91.208
*10.xxx.xxx.35   .GPSs.           1 u   32   64   377    0.098  -349.00  70.110

If the ntpd service is restarted and viprexec -i "ntpq -nc peers" is run again, the offset drops below 10 for a moment and then climbs back above 100.
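
The offset check can be automated by reading the ninth column of each peer row, which ntpq reports in milliseconds. The sketch below is illustrative only: peers_over_offset is a hypothetical name, and the default limit of 10 mirrors the threshold mentioned in step 1 rather than any value read from the ECS configuration.

```shell
# Print "host peer offset" for each peer whose absolute offset exceeds a limit.
# Reads viprexec-style "ntpq -nc peers" output on stdin.
# $1 = limit in milliseconds (default 10, taken from the text above).
# (Hypothetical helper; not part of the ECS tooling.)
peers_over_offset() {
    awk -v limit="${1:-10}" '
        /^Output from host :/ { host = $NF; next }
        # Peer rows have 10 fields; field 9 is the offset in milliseconds.
        NF == 10 && $9 ~ /^[-+]?[0-9.]+$/ {
            off = $9 + 0
            if (off < 0) off = -off
            if (off > limit) {
                peer = $1
                sub(/^[*+#ox. -]/, "", peer)   # strip the ntpq tally character
                print host, peer, $9
            }
        }
    '
}
```

For example: viprexec -i "ntpq -nc peers" | peers_over_offset 10. Running it a few times after a restart makes it easier to see the offset drop and then climb again.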

2. This behavior can be caused by the node's ntp.drift file reapplying an incorrect offset each time the ntpd service restarts:

Command:
# viprexec -i "cat /var/lib/ntp/drift/ntp.drift"
Example:
admin@node1:~> viprexec -i "cat /var/lib/ntp/drift/ntp.drift"

Output from host : 169.254.1.1 
500.000

Output from host : 169.254.1.2 
-14.212

Output from host : 169.254.1.3 
500.000

Output from host : 169.254.1.4 
-102.474

Output from host : 169.254.1.5 
-500.000

Output from host : 169.254.1.6 
500.000
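
The drift file holds a frequency correction in parts per million, and ntpd normally caps that correction at 500 PPM, so values of 500.000 or -500.000 suggest a correction pegged at its limit. A minimal sketch for flagging large values follows. The name suspect_drift and the default limit of 100 are assumptions chosen for illustration, not thresholds documented for ECS.

```shell
# Print "host value" for each drift value whose magnitude is at or above a limit.
# Reads viprexec-style output of "cat /var/lib/ntp/drift/ntp.drift" on stdin.
# $1 = limit in PPM (default 100 is an illustrative assumption).
# (Hypothetical helper; not part of the ECS tooling.)
suspect_drift() {
    awk -v limit="${1:-100}" '
        /^Output from host :/ { host = $NF; next }
        /^[ \t]*[-+]?[0-9]+(\.[0-9]+)?[ \t]*$/ {     # a line holding a single number
            v = $1 + 0
            if (v < 0) v = -v
            if (v >= limit) print host, $1
        }
    '
}
```

For example: viprexec -i "cat /var/lib/ntp/drift/ntp.drift" | suspect_drift 100.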

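An NTP drift file with an offset of this size can be generated automatically after a temporary network problem. When the node re-establishes its connection to the NTP service, it finds that its clock is off and generates the file to recalibrate itself. A short time later the drift file is no longer needed and can be removed. Perform the following steps:

1. Stop the ntpd service.
2. Remove the ntp.drift file.
3. Start the ntpd service again.

Note: Restarting ntpd.service is non-impacting.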


Command:
# viprexec -i "systemctl stop ntpd"
# viprexec -i "cat /var/lib/ntp/drift/ntp.drift"
# viprexec -i "rm -f /var/lib/ntp/drift/ntp.drift"
# viprexec -i "ntpd -gq"
# viprexec -i "systemctl start ntpd"
# viprexec -i "ntpq -p"
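
The same sequence can be wrapped in a function so that it stops at the first failing step and can be previewed before it is run. This is a sketch only: reset_ntp_drift and the RUNNER variable are hypothetical, and it assumes that viprexec returns a non-zero exit status when a remote command fails, which should be confirmed before relying on it.

```shell
# Run the drift-reset steps in order, stopping at the first step that fails.
# RUNNER is a hypothetical knob: it defaults to "echo", so the function only
# prints the commands (dry run). Set RUNNER=viprexec to execute them.
reset_ntp_drift() {
    runner="${RUNNER:-echo}"
    drift_file="/var/lib/ntp/drift/ntp.drift"
    for step in \
        "systemctl stop ntpd" \
        "cat $drift_file" \
        "rm -f $drift_file" \
        "ntpd -gq" \
        "systemctl start ntpd" \
        "ntpq -p"
    do
        # Assumption: the runner's exit status reflects the remote command.
        "$runner" -i "$step" || { echo "step failed: $step" >&2; return 1; }
    done
}
```

Calling reset_ntp_drift with RUNNER unset prints the six commands without touching any node; RUNNER=viprexec reset_ntp_drift runs them.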

Rerun the compliance check script: viprexec -i "/opt/emc/caspian/fabric/agent/conf/compliance_check.sh"

If the NTP drift file value is zero, check whether the date differs between nodes, and then restart the ntpd service.

Command:
# viprexec "date +%s" 2>&1 | grep "^15"
Example:
admin@node1:~> viprexec "date +%s" 2>&1 | grep "^15"
1554470147
1554470111
1554470096
1554470142
1554470144
1554470109
1554470124
1554470140
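
Comparing the timestamps by eye gets harder as the node count grows, so the spread can be computed instead. The sketch below is an illustration with a hypothetical function name, epoch_spread. It matches any 10-digit line rather than filtering on "^15", because epoch values beginning with 15 only cover July 2017 through September 2020. Some spread is expected even on a healthy rack if viprexec contacts the nodes one after another, so treat the number as a hint rather than a measurement.

```shell
# Print the difference in seconds between the largest and smallest epoch value.
# Reads viprexec-style "date +%s" output on stdin; non-timestamp lines are ignored.
# (Hypothetical helper; not part of the ECS tooling.)
epoch_spread() {
    awk '
        NF == 1 && $1 ~ /^[0-9]+$/ && length($1) == 10 {   # keep only 10-digit epoch lines
            t = $1 + 0
            if (n == 0 || t < min) min = t
            if (n == 0 || t > max) max = t
            n++
        }
        END { if (n > 0) print max - min }
    '
}
```

For example: viprexec "date +%s" 2>&1 | epoch_spread. For the eight timestamps in the example above, the result is 51 seconds.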

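A difference between nodes indicates NTP drift that requires a restart of the ntpd service. Check the ntpd service status, and then restart the service. (Proceed with the restart even if the status shows the service as up and running.) Note: Restarting ntpd.service is non-impacting.

Command:
# viprexec systemctl status ntpd.service | grep Active:
Example: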
admin@node1:~> viprexec systemctl status ntpd.service | grep Active:
   Active: active (running) since Tue 2019-08-06 02:49:06 UTC; 1 day 18h ago
   Active: active (running) since Tue 2019-08-06 02:49:07 UTC; 1 day 18h ago
   Active: active (running) since Wed 2019-08-07 20:13:27 UTC; 58min ago
   Active: active (running) since Tue 2019-08-06 02:49:06 UTC; 1 day 18h ago
   Active: active (running) since Tue 2019-08-06 02:49:07 UTC; 1 day 18h ago
   Active: active (running) since Tue 2019-08-06 02:49:07 UTC; 1 day 18h ago
   Active: active (running) since Tue 2019-08-06 02:49:07 UTC; 1 day 18h ago
   Active: active (running) since Tue 2019-08-06 02:49:07 UTC; 1 day 18h ago
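
A quick count of nodes that are not in the running state can confirm the status at a glance. This is a small sketch with a hypothetical function name, not_running_count, which assumes the "Active:" line format that systemctl prints in the example above.

```shell
# Count "Active:" lines that do not report "active (running)".
# Reads the output of "systemctl status ntpd.service" from all nodes on stdin.
# (Hypothetical helper; not part of the ECS tooling.)
not_running_count() {
    awk '/Active:/ && $0 !~ /Active: active \(running\)/ { n++ } END { print n + 0 }'
}
```

For example: viprexec systemctl status ntpd.service | not_running_count. A result of 0 means every node reported the service as running; the restart below is still recommended.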
Command:
# viprexec -i "systemctl restart ntpd.service"
Example:
admin@node1:~> viprexec systemctl restart ntpd.service
Output from host : 192.168.219.1
Output from host : 192.168.219.2
Output from host : 192.168.219.3
Output from host : 192.168.219.4
Output from host : 192.168.219.5
Output from host : 192.168.219.6
Output from host : 192.168.219.7
Output from host : 192.168.219.8

The NTP drift should now be resolved, which can be confirmed as follows:

Command:
# viprexec -i "date +%s" 2>&1 | grep "^15"
Example:
admin@node1:~> viprexec -i "date +%s" 2>&1 | grep "^15"
1585746672
1585746672
1585746672
1585746672
1585746672
1585746672
1585746672
1585746672

If the problem persists or does not match the description above, contact ECS technical support.

Additional Information

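If the resolution above does not work, the customer's network team must get involved to resolve the NTP problem.

For the "NTP daemon not running" (NTPD_NOT_RUNNING) symptom, see the knowledge article:
ECS: xDoctor: RAP081: Symptom Code: 2048: NTP daemon not running

For the "All NTP servers are not suitable for synchronization" (NTP_NOT_SUITABLE_ERROR) symptom, see the knowledge article:
ECS: xDoctor: RAP081: Symptom Code: 2048: All NTP servers are not suitable for synchronization

For the "All NTP servers adjusted offset is above the error threshold" (NTP_ERROR_OFFSET_ERROR) symptom, see the knowledge article:
ECS: xDoctor: RAP081: Symptom Code: 2048: All NTP servers adjusted offset is above the error threshold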

Affected Products

ECS

Products

ECS Appliance, ECS Appliance Gen 1, ECS Appliance Gen 2, ECS Appliance Gen 3, ECS Software
Article Properties
Article Number: 000230636
Article Type: Solution
Last Modified: 03 Oct 2024
Version:  2