PowerScale 节点在 OneFS 9.1 升级后变为只读状态
Summary: 将群集升级到 OneFS 9.1.0.20 后,群集中的所有 PowerScale(F200、F600、F900)节点都进入只读模式。
This article applies to
This article does not apply to
This article is not tied to any specific product.
Not all product versions are identified in this article.
Symptoms
将群集升级到 OneFS 9.1.0.20 后,“isi status”显示处于只读 (RO) 模式的所有 PowerScale 节点:
对于受影响的节点,您将在 /var/log/messages 文件中看到类似于以下内容的条目:
Node Pool Name: f600_60tb-ssd_384gb Protection: +2d:1n Pool Storage: HDD SSD Storage Size: 0 (0 Raw) 0 (0 Raw) VHS Size: 0.0 Used: 0 (n/a) 0 (n/a) Avail: 0 (n/a) 0 (n/a) Throughput (bps) HDD Storage SSD Storage Name Health| In Out Total| Used / Size |Used / Size -------------------+-----+-----+-----+-----+-----------------+----------------- 123|n/a |-A-R |938.7| 9.9M| 9.9M|(No Storage HDDs)|(No Storage SSDs) 124|n/a |-A-R | 0| 9.9M| 9.9M|(No Storage HDDs)|(No Storage SSDs) 125|n/a |-A-R | 0|10.8M|10.8M|(No Storage HDDs)|(No Storage SSDs) 126|n/a |-A-R | 0| 9.9M| 9.9M|(No Storage HDDs)|(No Storage SSDs) 127|n/a |-A-R | 1.4k| 9.9M| 9.9M|(No Storage HDDs)|(No Storage SSDs) 128|n/a |-A-R | 0| 7.9M| 7.9M|(No Storage HDDs)|(No Storage SSDs) 129|n/a |-A-R | 0| 7.9M| 7.9M|(No Storage HDDs)|(No Storage SSDs) 130|n/a |-A-R | 0| 7.3M| 7.3M|(No Storage HDDs)|(No Storage SSDs) -------------------+-----+-----+-----+-----+-----------------+----------------- f600_60tb-ssd_384gb| OK |293.3| 9.2M| 9.2M|(No Storage HDDs)|(No Storage SSDs)
对于受影响的节点,您将在 /var/log/messages 文件中看到类似于以下内容的条目:
2022-07-26T01:40:46+02:00 (id92) isi_testjournal: NVDIMM is persistent 2022-07-26T01:40:46+02:00 (id92) isi_testjournal: NVDIMM armed for persistent writes 2022-07-26T01:40:47+02:00 (id92) ifconfig: Configure: /sbin/ifconfig ue0 netmask 255.255.255.0 169.254.0.40 2022-07-26T01:40:47+02:00 (id92) dsm_ism_srvmgrd[2056]: ISM0000 [iSM@674.10892.2 EventID="8716" EventCategory="Audit" EventSeverity="info" IsPastEvent="false" language="en-US"] The iDRAC Service Module is started on the operating system (OS) of server. 2022-07-26T01:40:47+02:00 (id92) dsm_ism_srvmgrd[2056]: ISM0003 [iSM@674.10892.2 EventID="8196" EventCategory="Audit" EventSeverity="error" IsPastEvent="false" language="en-US"] The iDRAC Service Module is unable to discover iDRAC from the operating system of the server. 2022-07-26T01:44:15+02:00 (id92) isi_testjournal: PowerTools Agent Query Exception: Timeout (20 sec) exceeded for request http://127.0.0.1:8086/api/PT/v1/host/sensordata?sensorSelector=iDRAC.Embedded.1%23SystemBoardNVDIMMBattery&sensorType=DellSensor data: HTTPConnectionPool(host='127.0.0.1', port=8086): Read timed out. (read timeout=20) 2022-07-26T01:44:20+02:00 (id92) isi_testjournal: Query to PowerTools Agent for NVDIMM Battery failed
Cause
此问题似乎与 OneFS 版本 9.1.0.19 中对 NVDIMM 状态监视代码的更改相关,这可能导致在启动时初始 NVDIMM 状态查询期间发生超时,从而将节点置于只读模式。即使后续状态查询成功,该节点也不会自动返回到读写模式。OneFS 9.2.x 和更高版本不受此问题的影响。
Resolution
要验证 NVDIMM 是否运行状况良好并且您遇到了本知识库文章中所述的问题,请运行以下四个命令:
# isi_hwmon -b NVDIMMHealthMonitoring # isi_hwmon -b NVDIMMPersistence # /opt/dell/DellPTAgent/tools/pta_call get agent/info # /opt/dell/DellPTAgent/tools/pta_call post "host/sensordata?sensorSelector=iDRAC.Embedded.1%23SystemBoardNVDIMMBattery&sensorType=DellSensor"
这些命令是仅查询命令,应视为无中断。
处于此状态的节点的命令的输出应类似于:
# isi_hwmon -b NVDIMMHealthMonitoring
DIMM SLOT A7: OK
# isi_hwmon -b NVDIMMPersistence
NVDIMM Index 0
State: PERSISTENT
Vendor Serial ID: xxxxxxxxx
Correctable ECC Count: 0
Uncorrectable ECC Count: 0
Current Temp: 255
Health: 0
NVM Lifetime: 90
Warning Threshold Status: 0
Error Threshold Status: 0
Health Info Status: 0
Critical Health Info: 0
Critical Info Status: 0
Last Save Status: 0
Last Restore Status: 0
Last Flush Status: 0
Armed: 1
SMART/Health Events Observed: 0
FW Health Monitoring: 1
NVDIMM Mapped: 1
# /opt/dell/DellPTAgent/tools/pta_call post "host/sensordata?sensorSelector=iDRAC.Embedded.1%23SystemBoardNVDIMMBattery&sensorType=DellSensor"
Request sent to DellPTAgent @ http://127.0.0.1:8086 [127.0.0.1]
{
"HealthState": "OK",
"EnabledState": "Enabled",
"ElementName": "System Board NVDIMM Battery",
"SensorType": "Other",
"Id": "iDRAC.Embedded.1_0x23_SystemBoardNVDIMMBattery",
"CurrentState": "Good"
}
Response: status: 200 [OK], size: 223 bytes, latency: 0.034 seconds.
# /opt/dell/DellPTAgent/tools/pta_call get agent/info
Request sent to DellPTAgent @ http://127.0.0.1:8086 [127.0.0.1]
{
"idrac_ethernet_ip": "0.0.0.0",
"servicetag": "xxxxx",
"uptime": "2511 seconds ( 41 minutes 51 seconds )",
"status": {
"agent": "OK",
"idracConnection": "OK",
"idraccache": "OK",
"iSM": "N/A"
},
"name": "ClusterName-123",
"MarvellLibraryVersion": "Not loaded",
"system_uuid": "xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"default_server_cert": "true",
"rest_endpoints": "http://127.0.0.1:8086"" [127.0.0.1],
"ptagentversion": "2.5.6-4",
"domain": "",
"host_epoch_time": "xxxxxxxxxx.354221 (secs.usecs)",
"os_version": "9.1.0.0",
"mfr": "Dell Inc.",
"process_id": "2071",
"api_blocking_enabled": "false",
"host_pass_thru_ip": "xxx.xxx.xxx.xxx",
"model": "PowerScale F600",
"idrac_pass_thru_ip": "xxx.xxx.xxx.xxx",
"os": "Isilon OneFS",
"ism_version": "dell-dcism-3.4.6.13_7"
}
Response: status: 200 [OK], size: 871 bytes, latency: 0.009 seconds.
此处提供的输出是一个示例,您获得的输出中可能存在差异。重要部分是输出看起来类似,并且您不会收到错误消息而不是输出。
- 如果有任何通信问题/错误迹象,您必须继续对问题进行故障排除,并根据需要联系硬件 L2/SME 和/或 PowerEdge 支持团队。
- 如果输出指示 NVDIMM 处于良好状态且没有问题,您可以使用以下命令手动清除 RO 状态:
# /usr/bin/isi_hwtools/isi_read_only --unset=system-nvdimm-failed
应用纠正步骤后,请监视节点约 10 分钟,以确保它不会恢复到 RO 模式。如果节点重启或重新启动,则可能会再次出现此问题,并且可能需要重新应用此解决方法。PowerScale 工程部门已意识到此问题,并正在研究即将发布的 OneFS 9.1 版本中要实施的缓解步骤。同时,要永久解决此问题,您可以将群集升级到 OneFS 9.2 或更高版本。
Affected Products
PowerScale F200, PowerScale F600, PowerScale F900Article Properties
Article Number: 000201933
Article Type: Solution
Last Modified: 06 Jul 2023
Version: 3
Find answers to your questions from other Dell users
Support Services
Check if your device is covered by Support Services.