PowerFlex:未创建 CloudIQ 的性能捆绑包
摘要: 网关 (GW) 未为 CloudIQ 生成性能捆绑包,但配置、容量和警报按预期生成。
症状
当系统配置为发送 CloudIQ 特征和统计数据时,GW 会生成系统配置、系统容量、系统警报和系统性能统计数据。
默认情况下,无法手动停止创建四个捆绑包中的任何一个。
由于 GW 与 MDM 之间的 RestAPI 响应解释存在问题,因此无法收集卷统计信息并生成性能捆绑包。
有关数据收集和生成流程的更多信息,请参阅“其他信息”部分。
在 GW scaleio-trace.log文件中可以找到以下异常错误:
2022-08-05 12:05:53,975 [AsyncHandler-21] ERROR c.e.scaleio.esrsmanager.EsrsManager - Alert hasn't been sent since ESRS reached limit of 200 per: 8 hours
com.emc.scaleio.esrsmanager.NotificationMessageLimitException: null <<<
at com.emc.scaleio.esrsmanager.ESRSConnector.sendConnectEmcMessage(ESRSConnector.java:360) ~[ams-1.0-SNAPSHOT.jar:na]
at com.emc.scaleio.esrsmanager.ESRSConnector.sendConnectEmcMessage(ESRSConnector.java:308) ~[ams-1.0-SNAPSHOT.jar:na]
at com.emc.scaleio.esrsmanager.EsrsManager.sendAlert(EsrsManager.java:566) [ams-1.0-SNAPSHOT.jar:na]
at com.emc.scaleio.esrsmanager.EsrsManager.sendAlert(EsrsManager.java:598) [ams-1.0-SNAPSHOT.jar:na]
at com.emc.scaleio.esrsmanager.BaseNotificationManager.busReceivedAlerts(BaseNotificationManager.java:103) [ams-1.0-SNAPSHOT.jar:na]
...
2022-08-05 12:05:53,975 [https-jsse-nio-443-exec-293] ERROR c.e.s.s.r.DeviceRepositoryImpl - Error in QueryPropertiesResponse for Device::-101652817808130048 property:PENDING_MOVING_OUT_FWD_REBUILD_JOBS has value type: UNDEFINED_PROP_TYPE
2022-08-05 12:05:53,975 [https-jsse-nio-443-exec-293] ERROR c.e.s.s.r.DeviceRepositoryImpl - Error in QueryPropertiesResponse for Device::-101652817808130048 property:NET_THIN_USER_DATA_CAPACITY_IN_KB has value type: UNDEFINED_PROP_TYPE
...
2022-08-05 12:05:53,982 [https-jsse-nio-443-exec-306] ERROR c.e.s.s.r.DeviceRepositoryImpl - Error in QueryPropertiesResponse for Device::-99119590261981183 property:PENDING_MOVING_OUT_FWD_REBUILD_JOBS has value type: UNDEFINED_PROP_TYPE
2022-08-05 12:05:53,982 [https-jsse-nio-443-exec-290] ERROR c.e.s.s.r.DeviceRepositoryImpl - Error in QueryPropertiesResponse for Device::-99401056648822783 property:RFCACHE_WRITES_SKIPPED_STUCK_IO has value type: UNDEFINED_PROP_TYPE
2022-08-05 12:05:53,982 [https-jsse-nio-443-exec-310] ERROR c.e.s.s.w.c.ScaleIOController - Got an exception in handleException
java.lang.IllegalStateException: Bad number: 3 <<<
at com.emc.s3g.scaleio.domain.enums.ScsiReserveType.valueOf(ScsiReserveType.java:42) ~[ams-1.0-SNAPSHOT.jar:na]
at com.emc.s3g.scaleio.repository.BaseRepository.updateStatistics(BaseRepository.java:1184) ~[repository-1.0-SNAPSHOT.jar:na]
at com.emc.s3g.scaleio.repository.BaseRepository.getStatistics(BaseRepository.java:981) ~[repository-1.0-SNAPSHOT.jar:na]
at com.emc.s3g.scaleio.web.controller.ScaleIOController.getStatistics(ScaleIOController.java:93) ~[classes/:na]
at sun.reflect.GeneratedMethodAccessor731.invoke(Unknown Source) ~[na:na]
工作系统示例:
root@working_cloudiq ~]# ls -lrt /opt/emc/scaleio/gateway/temp total 300 drwx------. 2 root root 25 Feb 28 2020 certificates drwx------. 2 root root 6 Feb 28 2020 scaleio-install-logs -rwx------. 1 root root 0 Feb 28 2020 216e5abe-29e9-4825-b095-d8900d5964d8_ScaleIO-config.json -rwx------. 1 root root 0 Jan 12 2022 safeToDelete.tmp -rwx------. 1 root root 521 Jan 12 2022 index.html -rwx------. 1 root root 0 Mar 20 01:36 GATEWAY_RUN_USER.txt -rw-r-----. 1 root root 95929 Jul 14 08:33 powerflex_1657787617941_ELMSIO1234568_config.zip -rw-r-----. 1 root root 47245 Jul 15 07:34 powerflex_1657870447081_ELMSIO1234568_capacity.zip -rw-r-----. 1 root root 95935 Jul 15 08:33 powerflex_1657874022010_ELMSIO1234568_config.zip -rw-r-----. 1 root root 47330 Jul 15 08:34 powerflex_1657874048125_ELMSIO1234568_capacity.zip -rw-r-----. 1 root root 2671 Jul 15 09:02 powerflex_1657875734080_ELMSIO1234568_alerts.zip -rw-r-----. 1 root root 2671 Jul 15 09:02 powerflex_1657875734085_ELMSIO1017KPF3_performance.zip <<< -rw-r-----. 1 root root 2670 Jul 15 09:07 powerflex_1657876034745_ELMSIO1234568_alerts.zip -rw-r-----. 1 root root 2670 Jul 15 09:07 powerflex_1657876034750_ELMSIO1017KPF3_performance.zip <<<
无法正常工作的系统示例 — 未生成performance.zip文件:
root@not_working_cloudiq ~]# ls -lrt /opt/emc/scaleio/gateway/temp total 300 drwx------. 2 root root 25 Feb 28 2020 certificates drwx------. 2 root root 6 Feb 28 2020 scaleio-install-logs -rwx------. 1 root root 0 Feb 28 2020 216e5abe-29e9-4825-b095-d8900d5964d8_ScaleIO-config.json -rwx------. 1 root root 0 Jan 12 2022 safeToDelete.tmp -rwx------. 1 root root 521 Jan 12 2022 index.html -rwx------. 1 root root 0 Mar 20 01:36 GATEWAY_RUN_USER.txt -rw-r-----. 1 root root 95929 Jul 14 08:33 powerflex_1657787617941_ELMSIO1234568_config.zip -rw-r-----. 1 root root 47245 Jul 15 07:34 powerflex_1657870447081_ELMSIO1234568_capacity.zip -rw-r-----. 1 root root 95935 Jul 15 08:33 powerflex_1657874022010_ELMSIO1234568_config.zip -rw-r-----. 1 root root 47330 Jul 15 08:34 powerflex_1657874048125_ELMSIO1234568_capacity.zip -rw-r-----. 1 root root 2671 Jul 15 09:02 powerflex_1657875734080_ELMSIO1234568_alerts.zip -rw-r-----. 1 root root 2670 Jul 15 09:07 powerflex_1657876034745_ELMSIO1234568_alerts.zip
原因
PowerFlex 支持 SCSI-2 保留和 SCSI-3 保留命令的子集。SCSI 保留命令(重置、保留、释放、读取)由 SDC 发送到 MDM,然后由 MDM 更新 SDS。
在卷上放置 SCSI-3 保留后,RestAPI 会从 GW 调用到 MDM 以读取卷统计信息,然后会失败,并显示上面提到的错误 - Bad number:3.
GW 误解了 SCSI 保留类型,使从 MDM 返回的 RestAPI 调用失败。
PowerFlex 端的 I/O 和保留按预期工作。
如何在get_info中验证 SCSI 保留信息?
$ awk 'BEGIN { printf "%-15s %-15s %s\n", "Volume_ID", "Volume_Name", "SCSI_Reservation"; printf "%-15s %-15s %s\n", "---------", "-----------", "----------------" }; /: ID:/ { volume_id = $2; volume_name = $3 } / SCSI-reserver-key:/ { scsi_reserv = $1; if (scsi_reserv == "scsi2-reserved:3"){ printf "%-15s %-15s %-15s %s\n", volume_id, volume_name, scsi_reserv, "<<< SCSI-3 !!!" } else{ printf "%-15s %-15s %s\n", volume_id, volume_name, scsi_reserv } }' getInfoDump/mdm/sdbg_out.txt | column -t
Volume_ID Volume_Name SCSI_Reservation
--------- ----------- ----------------
ID:0x2fad5f7f00000000 Name:vol1-sp1-PD1 scsi2-reserved:0
ID:0x2fad5fcb00000001 Name:vol2-sp1-PD1 scsi2-reserved:3 <<< SCSI-3 !!!
ID:0x2fad5fcc00000002 Name:vol3-sp1-PD1 scsi2-reserved:3 <<< SCSI-3 !!!
ID:0x2fa9dd3d00000003 Name:vol4-sp1-PD1 scsi2-reserved:0
如何在实时系统上验证 SCSI 保留信息?
$ cat > script
c mdm
dumpallscreens
disconnect
exit
^D
$ /opt/emc/scaleio/sds/diag/sdbg script > sdbg_out.txt
$ awk 'BEGIN { printf "%-15s %-15s %s\n", "Volume_ID", "Volume_Name", "SCSI_Reservation"; printf "%-15s %-15s %s\n", "---------", "-----------", "----------------" }; /: ID:/ { volume_id = $2; volume_name = $3 } / SCSI-reserver-key:/ { scsi_reserv = $1; if (scsi_reserv == "scsi2-reserved:3"){ printf "%-15s %-15s %-15s %s\n", volume_id, volume_name, scsi_reserv, "<<< SCSI-3 !!!" } else{ printf "%-15s %-15s %s\n", volume_id, volume_name, scsi_reserv } }' sdbg_out.txt | column -t
Volume_ID Volume_Name SCSI_Reservation
--------- ----------- ----------------
ID:0x2fae49da00000001 Name:vol1-sp1-PD1 scsi2-reserved:0
ID:0x2fad5fcb00000002 Name:vol2-sp1-PD1 scsi2-reserved:3 <<< SCSI-3 !!!
ID:0x2fad5fcc00000003 Name:vol3-sp1-PD1 scsi2-reserved:3 <<< SCSI-3 !!!
ID:0x2fa9dd3d00000004 Name:vol4-sp1-PD1 scsi2-reserved:0解决方案
由于 SCSI 预留由客户端和应用程序端设置,因此唯一的解决方法是从卷
中释放预留。受影响的版本
PowerFlex v3.5PowerFlex
v3.6PowerFlex
v4.0
已修复版本
PowerFlex v3.5.1.9
PowerFlex v3.6.1
PowerFlex v4.0.1.1
其他信息
收集和创建性能捆绑包文件的流程包含两个单独的过程:
该 第一个流程 已激活 每 5 秒 从 MDM 发送统计信息请求,并以累积方式存储响应。
该 第二道工序 已激活 每 5 分钟 计算增量并将数据压缩到 /opt/emc/scaleio/gateway/temp 目录中的.zip文件中。