ECS: xDoctor: RAP081: Symptom Code: 2048: System time difference is higher than the ERROR threshold

Summary: xDoctor has detected a Network Time Protocol (NTP) daemon issue.


Symptoms

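All nodes in an ECS rack should be running the NTP daemon, and the configured NTP servers must be able to synchronize time. Otherwise, front-end data ingest can be affected.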

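Symptom:

System time difference is higher than the ERROR threshold

Message:

Message = System time difference higher than ERROR threshold
Extra = [list of nodes]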

Cause

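If the symptom above has been present for less than 24 hours, it is reported as a WARNING.
If the issue persists for more than 24 hours, the severity is raised to ERROR and RAP081 is reported.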

Resolution

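The time difference between nodes is caused by the NTP drift file, which the ntpd service on each node updates once an hour. This issue can occur when a node has previously experienced a network problem. When the node rejoins the network, an incorrect drift file is created, and this forces a time difference between the nodes.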

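When a node rejoins the network after an issue, it may temporarily create a drift file to match the time on the NTP server. This should be temporary. If ntpd cannot clear the file, delete the drift file and restart the service to recover.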

Verification:
Check that every NTP server responds to ping.
1. Confirm whether compliance is enabled.

Command:

# domulti 'cat /opt/emc/caspian/fabric/agent/conf/agent_customize.conf | grep compliance_enabled'
Example:
admin@node1:~> domulti 'cat /opt/emc/caspian/fabric/agent/conf/agent_customize.conf | grep compliance_enabled'

192.168.219.1
========================================
compliance_enabled = true

192.168.219.2
========================================
compliance_enabled = true

192.168.219.3
========================================
compliance_enabled = true

192.168.219.4
========================================
compliance_enabled = true
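
On a large rack this check can be scripted. The sketch below assumes the domulti output format shown above: each node IP appears on a line of its own, followed by that node's "compliance_enabled = ..." line. The function name compliance_disabled_nodes is hypothetical and is not part of ECS. The function prints every node whose value is not "true".

```shell
# Hypothetical helper (not an ECS tool): reads domulti output on stdin and
# prints the IP of every node whose compliance_enabled value is not "true".
compliance_disabled_nodes() {
  awk '
    # domulti prints each node IP on a line of its own before that node output
    /^[ \t]*[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+[ \t]*$/ { node = $1; next }
    /^[ \t]*compliance_enabled[ \t]*=/ {
      value = $0
      sub(/^[^=]*=[ \t]*/, "", value)   # keep only the text after "="
      sub(/[ \t]*$/, "", value)         # drop trailing whitespace
      if (value != "true") print node
    }
  '
}
```

For example, pipe the command into the function: domulti 'grep compliance_enabled /opt/emc/caspian/fabric/agent/conf/agent_customize.conf' | compliance_disabled_nodes. No output means every node reported true. A node whose output has no compliance_enabled line at all is not reported by this sketch.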

2. Check ECS to determine whether the cluster is compliant.

Command:
# viprexec -i "/opt/emc/caspian/fabric/cli/bin/fcli lifecycle cluster.compliance"
Example:
admin@node1:~> viprexec -i "/opt/emc/caspian/fabric/cli/bin/fcli lifecycle cluster.compliance"

Output from host : 192.168.219.1
{
  "compliance": "NON_COMPLIANT",
  "status": "OK",
  "etag": 22527
}

Output from host : 192.168.219.2
{
  "compliance": "NON_COMPLIANT",
  "status": "OK",
  "etag": 22527
}

Output from host : 192.168.219.3
{
  "compliance": "NON_COMPLIANT",
  "status": "OK",
  "etag": 22527
}

Output from host : 192.168.219.4
{
  "compliance": "NON_COMPLIANT",
  "status": "OK",
  "etag": 22527
}

The expected output is COMPLIANT. If NON_COMPLIANT is reported, investigate the cause.
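
This check can be turned into a pass/fail test. The sketch below assumes the viprexec output format shown above, with one JSON object per host containing a "compliance" key. The function name all_hosts_compliant is hypothetical. It exits 0 only if at least one "compliance" line was seen and every such line reads "COMPLIANT".

```shell
# Hypothetical helper: reads the viprexec output of
# "fcli lifecycle cluster.compliance" on stdin.
# Exit status 0: every "compliance" line reports COMPLIANT.
# Exit status 1: any NON_COMPLIANT (or other) value, or no compliance lines.
all_hosts_compliant() {
  awk '
    /"compliance"[ \t]*:/ {
      seen++
      # "NON_COMPLIANT" does not contain the quoted token "COMPLIANT"
      if ($0 !~ /"COMPLIANT"/) bad++
    }
    END { ok = (seen > 0 && bad == 0); exit !ok }
  '
}
```

For example: viprexec -i "/opt/emc/caspian/fabric/cli/bin/fcli lifecycle cluster.compliance" | all_hosts_compliant || echo "cluster is not compliant".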

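3. Run the compliance check script on each node to identify any non-compliant node that may be causing the ECS check to report non-compliance.

Run the compliance script on all nodes. A node that reports "NTP peers out of sync" may have an NTP drift file problem. On a healthy node the output is only "Checking compliance...", which means the check passed and no issue was found.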

Command:
# domulti /opt/emc/caspian/fabric/agent/conf/compliance_check.sh
Example:
admin@node1:~> domulti /opt/emc/caspian/fabric/agent/conf/compliance_check.sh
 
192.168.219.1
========================================
Checking compliance...
    NTP peers out of sync
 
192.168.219.2
========================================
Checking compliance...
   
 
192.168.219.3
========================================
Checking compliance...
    NTP peers out of sync
 
192.168.219.4
========================================
Checking compliance...
    NTP peers out of sync

If the output is "NTP peers out of sync", continue with the Resolution steps below.
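
To list only the affected nodes, the output can be filtered. This sketch assumes the domulti layout shown above: the node IP is on a line of its own, followed by that node's script output. The function name ntp_out_of_sync_nodes is hypothetical.

```shell
# Hypothetical helper: reads the domulti output of compliance_check.sh on
# stdin and prints each node IP whose block contains "NTP peers out of sync".
ntp_out_of_sync_nodes() {
  awk '
    /^[ \t]*[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+[ \t]*$/ { node = $1; next }
    /NTP peers out of sync/ { print node }
  '
}
```

For example: domulti /opt/emc/caspian/fabric/agent/conf/compliance_check.sh | ntp_out_of_sync_nodes. With the example output above, the function would print 192.168.219.1, 192.168.219.3 and 192.168.219.4.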

Resolution:
1. Check whether the NTP offset exceeds +/- 10 ms. An offset beyond this can trigger the compliance alert.

Command:
# viprexec -i "ntpq -nc peers"
Example (note: each node in this example has three NTP servers):
admin@node1:~> viprexec -i "ntpq -nc peers"

Output from host : 169.254.1.1  
remote refid st t when poll reach delay offset jitter
==============================================================================
*10.xxx.xxx.16 .GPSs. 1 u 31 64 377 0.103 -367.66 44.909
+10.xxx.xxx.33 .GPSs. 1 u 32 64 377 0.097 -368.68 44.341
+10.xxx.xxx.35 .GPSs. 1 u 16 64 377 0.107 -338.96 69.736

Output from host : 169.254.1.2 
remote refid st t when poll reach delay offset jitter
==============================================================================
+10.xxx.xxx.16 .GPSs. 1 u 26 64 377 0.089 8.566 0.746
*10.xxx.xxx.33 .GPSs. 1 u 26 64 377 0.100 8.585 0.739
+10.xxx.xxx.35 .GPSs. 1 u 23 64 377 0.104 8.888 0.592

Output from host : 169.254.1.3 
remote refid st t when poll reach delay offset jitter
==============================================================================
*10.xxx.xxx.16 .GPSs. 1 u 31 64 377 0.101 -354.40 52.444
+10.xxx.xxx.33 .GPSs. 1 u 29 64 377 0.101 -338.84 63.750
+10.xxx.xxx.35 .GPSs. 1 u 39 64 377 0.106 -387.28 44.286


Output from host : 169.254.1.4 
remote refid st t when poll reach delay offset jitter
==============================================================================
*10.xxx.xxx.16 .GPSs. 1 u 26 64 377 0.084 72.675 9.200
+10.xxx.xxx.33 .GPSs. 1 u 37 64 377 0.107 65.047 14.913
+10.xxx.xxx.35 .GPSs. 1 u 33 64 377 0.103 87.374 13.435

Output from host : 169.254.1.5 
remote refid st t when poll reach delay offset jitter
==============================================================================
*10.xxx.xxx.16 .GPSs. 1 u 27 64 377 0.094 352.741 54.056
+10.xxx.xxx.33 .GPSs. 1 u 26 64 377 0.103 413.893 43.770
+10.xxx.xxx.35 .GPSs. 1 u 33 64 377 0.101 334.493 69.059

Output from host : 169.254.1.6 
remote refid st t when poll reach delay offset jitter
==============================================================================
+10.xxx.xxx.16 .GPSs. 1 u 27 64 377 0.101 -428.51 54.955
+10.xxx.xxx.33 .GPSs. 1 u 26 64 377 0.097 -326.21 91.208
*10.xxx.xxx.35 .GPSs. 1 u 32 64 377 0.098 -349.00 70.110

If the ntpd service is restarted and viprexec -i "ntpq -nc peers" is run again, the offset briefly falls below 10 and then rises above 100 again.
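
The +/- 10 ms check can be applied to every peer on every node with a short filter. The ntpq peers table reports offset in milliseconds. This sketch assumes the exact layout shown in the example: IPv4 peer addresses printed with -n, the tally code (*, +, ...) attached to the address, and therefore offset in the 9th of 10 columns. The function name ntp_offsets_over and its default threshold of 10 are illustrative assumptions.

```shell
# Hypothetical helper: reads `viprexec -i "ntpq -nc peers"` output on stdin and
# prints "node peer offset" for every peer whose absolute offset exceeds
# LIMIT_MS (first argument, default 10).
ntp_offsets_over() {
  awk -v limit="${1:-10}" '
    BEGIN { limit += 0 }                       # force a numeric threshold
    /^Output from host/ { host = $NF; next }   # remember which node follows
    NF == 10 && $1 != "remote" {               # a peer row, not the header
      off = $9 + 0                             # offset column, milliseconds
      if (off < 0) off = -off
      if (off > limit) print host, $1, $9
    }
  '
}
```

For example: viprexec -i "ntpq -nc peers" | ntp_offsets_over 10. Against the example above, every peer on every node except 169.254.1.2 would be listed.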

2. The node's ntp.drift file can cause this issue by reapplying an incorrect drift value after the ntpd service restarts:

Command:
# viprexec -i "cat /var/lib/ntp/drift/ntp.drift"
Example:
admin@node1:~> viprexec -i "cat /var/lib/ntp/drift/ntp.drift"

Output from host : 169.254.1.1 
500.000

Output from host : 169.254.1.2 
-14.212

Output from host : 169.254.1.3 
500.000

Output from host : 169.254.1.4 
-102.474

Output from host : 169.254.1.5 
-500.000

Output from host : 169.254.1.6 
500.000
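
ntpd limits its frequency correction to +/- 500 PPM. A drift value pinned at exactly 500.000 or -500.000, as on several nodes above, therefore suggests that the correction is saturated rather than converged. The sketch below flags such nodes. It assumes the viprexec output format shown above, with one bare number per host. The function name drift_saturated is hypothetical.

```shell
# Hypothetical helper: reads `viprexec -i "cat /var/lib/ntp/drift/ntp.drift"`
# output on stdin and prints "node drift" for every node whose drift value is
# at the +/-500 PPM limit.
drift_saturated() {
  awk '
    /^Output from host/ { host = $NF; next }
    NF == 1 && $1 ~ /^-?[0-9]+(\.[0-9]+)?$/ {   # a bare numeric drift value
      d = $1 + 0
      if (d < 0) d = -d
      if (d >= 500) print host, $1
    }
  '
}
```

For example: viprexec -i "cat /var/lib/ntp/drift/ntp.drift" | drift_saturated. Nodes with moderate values such as -14.212 or -102.474 are not listed, although they may still deserve attention.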

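A drift file with a value this large can be generated automatically as a result of a temporary network problem. When a node re-establishes its connection to the NTP service and finds that it has drifted from the correct time, the file is generated so that the node can correct itself. After a short time the drift file is no longer needed and can be deleted. Therefore, perform the following steps: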

1. Stop the ntpd service.
2. Delete the ntp.drift file.
3. Start the ntpd service again.

Note: ntpd.service is a non-impacting service.


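Command:
# viprexec -i "systemctl stop ntpd"
# viprexec -i "cat /var/lib/ntp/drift/ntp.drift"
# viprexec -i "rm -f /var/lib/ntp/drift/ntp.drift"
# viprexec -i "ntpd -gq"
# viprexec -i "systemctl start ntpd"
# viprexec -i "ntpq -p"

The cat command records the current drift value before the file is deleted. While the service is stopped, ntpd -gq sets the clock once and exits: -g allows a large initial correction, and -q exits after the time is set. The service is then started again and checked with ntpq -p.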

Rerun the compliance check script: viprexec -i "/opt/emc/caspian/fabric/agent/conf/compliance_check.sh"

If the drift file value is zero, check whether there is any date offset between the nodes, and then restart the ntpd service.

Command (the filter keeps only the 10-digit epoch-second lines):
# viprexec "date +%s" 2>&1 | grep -E '^[0-9]{10}'
Example:
admin@node1:~> viprexec "date +%s" 2>&1 | grep -E '^[0-9]{10}'
1554470147
1554470111
1554470096
1554470142
1554470144
1554470109
1554470124
1554470140
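
The spread between the largest and smallest epoch values is the worst-case clock difference in seconds. In the example above it is 1554470147 - 1554470096 = 51 seconds. The values may not be collected at exactly the same instant, so a spread of a second or so is not necessarily drift. The function name clock_spread in this sketch is hypothetical. It ignores any line that is not all digits, so the "Output from host" lines do not need to be filtered first.

```shell
# Hypothetical helper: reads `viprexec -i "date +%s"` output on stdin, keeps
# only all-digit lines (epoch seconds) and prints max - min in seconds.
clock_spread() {
  awk '
    /^[0-9]+[ \t]*$/ {
      t = $1 + 0
      if (n == 0 || t < lo) lo = t
      if (n == 0 || t > hi) hi = t
      n++
    }
    END {
      if (n == 0) { print "no timestamps found"; exit 1 }
      print hi - lo
    }
  '
}
```

For example: viprexec -i "date +%s" 2>&1 | clock_spread. A result of 0 means every node reported the same second, as in the output after the restart below.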

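A difference between nodes indicates NTP drift, and the ntpd service must be restarted. Check the ntpd service status, and then restart the service. Restart it even if the status is "active (running)". Note: ntpd.service is a non-impacting service.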

Command:
# viprexec systemctl status ntpd.service | grep Active:
Example:
admin@node1:~> viprexec systemctl status ntpd.service | grep Active:
   Active: active (running) since Tue 2019-08-06 02:49:06 UTC; 1 day 18h ago
   Active: active (running) since Tue 2019-08-06 02:49:07 UTC; 1 day 18h ago
   Active: active (running) since Wed 2019-08-07 20:13:27 UTC; 58min ago
   Active: active (running) since Tue 2019-08-06 02:49:06 UTC; 1 day 18h ago
   Active: active (running) since Tue 2019-08-06 02:49:07 UTC; 1 day 18h ago
   Active: active (running) since Tue 2019-08-06 02:49:07 UTC; 1 day 18h ago
   Active: active (running) since Tue 2019-08-06 02:49:07 UTC; 1 day 18h ago
   Active: active (running) since Tue 2019-08-06 02:49:07 UTC; 1 day 18h ago

Command:
# viprexec -i "systemctl restart ntpd.service"
Example:
admin@node1:~> viprexec systemctl restart ntpd.service
Output from host : 192.168.219.1
Output from host : 192.168.219.2
Output from host : 192.168.219.3
Output from host : 192.168.219.4
Output from host : 192.168.219.5
Output from host : 192.168.219.6
Output from host : 192.168.219.7
Output from host : 192.168.219.8

The NTP offset should now be resolved:

Command:
# viprexec -i "date +%s" 2>&1 | grep -E '^[0-9]{10}'
Example:
admin@node1:~> viprexec -i "date +%s" 2>&1 | grep -E '^[0-9]{10}'
1585746672
1585746672
1585746672
1585746672
1585746672
1585746672
1585746672
1585746672

If the issue persists or does not match the issue described above, contact ECS Technical Support.

Additional Information

If the resolution above does not work, the customer's network team must be engaged to resolve the NTP issue.

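For the symptom "NTP daemon is not running" (NTPD_NOT_RUNNING), see the knowledge base article:
ECS: xDoctor: RAP081: Symptom Code: 2048: NTP daemon is not running

For the symptom "All NTP servers are not suitable for synchronization" (NTP_NOT_SUITABLE_ERROR), see the knowledge base article:
ECS: xDoctor: RAP081: Symptom Code: 2048: All NTP servers are not suitable for synchronization

For the symptom "All NTP servers have an adjusted offset higher than the ERROR threshold" (NTP_ERROR_OFFSET_ERROR), see the knowledge base article:
ECS: xDoctor: RAP081: Symptom Code: 2048: All NTP servers adjusted offset higher than the ERROR threshold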

Affected Products

ECS

Products

ECS Appliance, ECS Appliance Gen 1, ECS Appliance Gen 2, ECS Appliance Gen 3, ECS Software
Article Properties
Article Number: 000230636
Article Type: Solution
Last Modified: 03 Oct 2024
Version:  2