PowerScale nodes enter read-only mode after an upgrade to OneFS 9.1
Summary: After a cluster is upgraded to OneFS 9.1.0.20, all PowerScale (F200, F600, F900) nodes in the cluster enter read-only mode.
Symptoms
After the cluster is upgraded to OneFS 9.1.0.20, "isi status" shows all PowerScale nodes in read-only (RO) mode:
Node Pool Name: f600_60tb-ssd_384gb
Protection:     +2d:1n
Pool Storage:   HDD                 SSD Storage
Size:           0 (0 Raw)           0 (0 Raw)
VHS Size:       0.0
Used:           0 (n/a)             0 (n/a)
Avail:          0 (n/a)             0 (n/a)

                          Throughput (bps)  HDD Storage       SSD Storage
Name               Health|   In   Out Total| Used / Size     |Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
123|n/a            |-A-R |938.7| 9.9M| 9.9M|(No Storage HDDs)|(No Storage SSDs)
124|n/a            |-A-R |    0| 9.9M| 9.9M|(No Storage HDDs)|(No Storage SSDs)
125|n/a            |-A-R |    0|10.8M|10.8M|(No Storage HDDs)|(No Storage SSDs)
126|n/a            |-A-R |    0| 9.9M| 9.9M|(No Storage HDDs)|(No Storage SSDs)
127|n/a            |-A-R | 1.4k| 9.9M| 9.9M|(No Storage HDDs)|(No Storage SSDs)
128|n/a            |-A-R |    0| 7.9M| 7.9M|(No Storage HDDs)|(No Storage SSDs)
129|n/a            |-A-R |    0| 7.9M| 7.9M|(No Storage HDDs)|(No Storage SSDs)
130|n/a            |-A-R |    0| 7.3M| 7.3M|(No Storage HDDs)|(No Storage SSDs)
-------------------+-----+-----+-----+-----+-----------------+-----------------
f600_60tb-ssd_384gb| OK  |293.3| 9.2M| 9.2M|(No Storage HDDs)|(No Storage SSDs)
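In the node rows above, the read-only state shows up as an R in the health column (for example -A-R). The following is a minimal sketch, not a Dell-provided tool, that lists the IDs of nodes carrying that flag. The function name list_read_only_nodes is hypothetical, and the assumption that the health flags are the third pipe-separated field is taken only from the sample output in this article, so verify it against your own "isi status" output.

```shell
#!/bin/sh
# Hypothetical helper: read "isi status" node rows on stdin and print the ID
# of every node whose health field contains the R (read-only) flag.
# Assumption: rows look like the sample in this article, that is
#   <id>|<address>|<health flags>|<in>|<out>|<total>|<hdd>|<ssd>
list_read_only_nodes() {
    awk -F'|' '
        # Only consider rows whose first field is a numeric node ID.
        $1 ~ /^[ ]*[0-9]+[ ]*$/ && $3 ~ /R/ {
            gsub(/ /, "", $1)   # strip padding around the node ID
            print $1
        }
    '
}

# Example (run on a node):
#   isi status | list_read_only_nodes
```

Header, separator, and pool summary rows are skipped because their first field is not a bare node number.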
Entries similar to the following appear in the /var/log/messages file on the affected nodes:
2022-07-26T01:40:46+02:00 (id92) isi_testjournal: NVDIMM is persistent
2022-07-26T01:40:46+02:00 (id92) isi_testjournal: NVDIMM armed for persistent writes
2022-07-26T01:40:47+02:00 (id92) ifconfig: Configure: /sbin/ifconfig ue0 netmask 255.255.255.0 169.254.0.40
2022-07-26T01:40:47+02:00 (id92) dsm_ism_srvmgrd[2056]: ISM0000 [iSM@674.10892.2 EventID="8716" EventCategory="Audit" EventSeverity="info" IsPastEvent="false" language="en-US"] The iDRAC Service Module is started on the operating system (OS) of server.
2022-07-26T01:40:47+02:00 (id92) dsm_ism_srvmgrd[2056]: ISM0003 [iSM@674.10892.2 EventID="8196" EventCategory="Audit" EventSeverity="error" IsPastEvent="false" language="en-US"] The iDRAC Service Module is unable to discover iDRAC from the operating system of the server.
2022-07-26T01:44:15+02:00 (id92) isi_testjournal: PowerTools Agent Query Exception: Timeout (20 sec) exceeded for request http://127.0.0.1:8086/api/PT/v1/host/sensordata?sensorSelector=iDRAC.Embedded.1%23SystemBoardNVDIMMBattery&sensorType=DellSensor data: HTTPConnectionPool(host='127.0.0.1', port=8086): Read timed out. (read timeout=20)
2022-07-26T01:44:20+02:00 (id92) isi_testjournal: Query to PowerTools Agent for NVDIMM Battery failed
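To check whether a node logged this signature, you can search its messages file for the two isi_testjournal failure lines shown above. This is a minimal sketch under stated assumptions: the function name find_nvdimm_query_timeouts is hypothetical, and the search patterns are copied from the sample log entries in this article, so they may need adjusting if the wording differs on your system.

```shell
#!/bin/sh
# Hypothetical helper: print log lines that match the NVDIMM battery query
# failure signature shown in this article.
# Argument: path to a messages file (defaults to /var/log/messages).
# Assumption: the message text matches the sample entries in this article.
find_nvdimm_query_timeouts() {
    logfile="${1:-/var/log/messages}"
    grep -E 'isi_testjournal: (PowerTools Agent Query Exception: Timeout|Query to PowerTools Agent for NVDIMM Battery failed)' "$logfile"
}

# Example (run on an affected node):
#   find_nvdimm_query_timeouts /var/log/messages
```

grep exits with a nonzero status when nothing matches, so the function can also be used as a condition in a script.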
Cause
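This issue appears to be related to a change made to the NVDIMM status monitoring code in OneFS 9.1.0.19. The change can cause the initial NVDIMM status query at boot to time out, which puts the node into read-only mode. The node does not return to read-write mode automatically, even if later status queries succeed. OneFS 9.2.x and later are not affected by this issue.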
Resolution
To confirm that the NVDIMM is healthy and that you are encountering the issue described in this KB, run the following four commands:
# isi_hwmon -b NVDIMMHealthMonitoring
# isi_hwmon -b NVDIMMPersistence
# /opt/dell/DellPTAgent/tools/pta_call get agent/info
# /opt/dell/DellPTAgent/tools/pta_call post "host/sensordata?sensorSelector=iDRAC.Embedded.1%23SystemBoardNVDIMMBattery&sensorType=DellSensor"
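These commands are query-only and are considered non-disruptive.
On a node in this state, the command output should look similar to the following: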
# isi_hwmon -b NVDIMMHealthMonitoring
DIMM SLOT A7: OK
# isi_hwmon -b NVDIMMPersistence
NVDIMM Index 0
State: PERSISTENT
Vendor Serial ID: xxxxxxxxx
Correctable ECC Count: 0
Uncorrectable ECC Count: 0
Current Temp: 255
Health: 0
NVM Lifetime: 90
Warning Threshold Status: 0
Error Threshold Status: 0
Health Info Status: 0
Critical Health Info: 0
Critical Info Status: 0
Last Save Status: 0
Last Restore Status: 0
Last Flush Status: 0
Armed: 1
SMART/Health Events Observed: 0
FW Health Monitoring: 1
NVDIMM Mapped: 1
# /opt/dell/DellPTAgent/tools/pta_call post "host/sensordata?sensorSelector=iDRAC.Embedded.1%23SystemBoardNVDIMMBattery&sensorType=DellSensor"
Request sent to DellPTAgent @ http://127.0.0.1:8086 [127.0.0.1]
{
    "HealthState": "OK",
    "EnabledState": "Enabled",
    "ElementName": "System Board NVDIMM Battery",
    "SensorType": "Other",
    "Id": "iDRAC.Embedded.1_0x23_SystemBoardNVDIMMBattery",
    "CurrentState": "Good"
}
Response: status: 200 [OK], size: 223 bytes, latency: 0.034 seconds.
# /opt/dell/DellPTAgent/tools/pta_call get agent/info
Request sent to DellPTAgent @ http://127.0.0.1:8086 [127.0.0.1]
{
    "idrac_ethernet_ip": "0.0.0.0",
    "servicetag": "xxxxx",
    "uptime": "2511 seconds ( 41 minutes 51 seconds )",
    "status": {
        "agent": "OK",
        "idracConnection": "OK",
        "idraccache": "OK",
        "iSM": "N/A"
    },
    "name": "ClusterName-123",
    "MarvellLibraryVersion": "Not loaded",
    "system_uuid": "xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "default_server_cert": "true",
    "rest_endpoints": "http://127.0.0.1:8086 [127.0.0.1]",
    "ptagentversion": "2.5.6-4",
    "domain": "",
    "host_epoch_time": "xxxxxxxxxx.354221 (secs.usecs)",
    "os_version": "9.1.0.0",
    "mfr": "Dell Inc.",
    "process_id": "2071",
    "api_blocking_enabled": "false",
    "host_pass_thru_ip": "xxx.xxx.xxx.xxx",
    "model": "PowerScale F600",
    "idrac_pass_thru_ip": "xxx.xxx.xxx.xxx",
    "os": "Isilon OneFS",
    "ism_version": "dell-dcism-3.4.6.13_7"
}
Response: status: 200 [OK], size: 871 bytes, latency: 0.009 seconds.
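The output shown here is an example, and your output may differ. What matters is that your output looks similar and that the commands return output rather than error messages.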
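As a rough aid for that comparison, the sensor query output can be checked for the same values that appear in the healthy example. This is a minimal sketch, not a Dell-provided tool: the function name nvdimm_battery_looks_healthy is hypothetical, and the three strings it looks for are taken from the sample output in this article, on the assumption that a healthy response on your node uses the same wording.

```shell
#!/bin/sh
# Hypothetical helper: read pta_call sensor output on stdin and return 0 only
# if it resembles the healthy example in this article.
# Assumption: a healthy response contains these three strings exactly.
nvdimm_battery_looks_healthy() {
    output=$(cat)
    printf '%s\n' "$output" | grep -q '"HealthState": "OK"' || return 1
    printf '%s\n' "$output" | grep -q '"CurrentState": "Good"' || return 1
    printf '%s\n' "$output" | grep -q 'Response: status: 200' || return 1
    return 0
}

# Example (run on the node, using the query command from this article):
#   /opt/dell/DellPTAgent/tools/pta_call post \
#     "host/sensordata?sensorSelector=iDRAC.Embedded.1%23SystemBoardNVDIMMBattery&sensorType=DellSensor" \
#     | nvdimm_battery_looks_healthy && echo "sensor output looks healthy"
```

A nonzero return only means the output did not match the example. It does not identify the cause, so review the raw output before deciding on the next step.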
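- If there is any indication of a communication problem or error, continue troubleshooting and, if needed, contact HW L2/SME and/or the PowerEdge support team.
- If the output shows that the NVDIMM is healthy and has no issues, you can manually clear the RO state with the following command: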
# /usr/bin/isi_hwtools/isi_read_only --unset=system-nvdimm-failed
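After applying the workaround, monitor the node for about 10 minutes to make sure that it does not return to RO mode. If the node is rebooted, the issue may recur and the workaround may need to be reapplied. PowerScale Engineering is aware of this issue and is investigating mitigation steps to implement in an upcoming OneFS 9.1 release. In the meantime, to resolve the issue permanently, upgrade the cluster to OneFS 9.2 or later.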
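The 10-minute monitoring period can be scripted as a simple polling loop. The following is a minimal sketch under stated assumptions: the function name watch_for_read_only is hypothetical, and it expects you to supply your own check command that exits 0 when the node is in read-only mode (this article does not define such a command).

```shell
#!/bin/sh
# Hypothetical helper: run a check command repeatedly and stop early if it
# ever succeeds. The check is expected to exit 0 when the node is read-only.
# Arguments: <interval_seconds> <attempts> <check command...>
# Returns 1 if the check succeeded at any point, 0 otherwise.
watch_for_read_only() {
    interval="$1"
    attempts="$2"
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            echo "read-only state detected on check $((i + 1))"
            return 1
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    echo "no read-only state seen in $attempts checks"
    return 0
}

# Example: check once a minute for 10 minutes, where node_is_read_only is a
# placeholder for your own check command:
#   watch_for_read_only 60 10 node_is_read_only
```

The loop stops at the first positive check so that a node falling back into RO mode is reported immediately rather than at the end of the window.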
Affected Products
PowerScale F200, PowerScale F600, PowerScale F900
Article Properties
Article Number: 000201933
Article Type: Solution
Last Modified: 06 Jul 2023
Version: 3