PowerFlex: A performance bundle for CloudIQ is not being created
Summary: Gateway (GW) is not generating a performance bundle for CloudIQ, but Configuration, Capacity, and Alerts are generated as expected.
Symptoms
System configuration, system capacity, system alerts, and system performance statistics are generated by GW when a system is configured to send CloudIQ characteristics and statistical data.
By default, the creation of any of the four bundles cannot be stopped manually.
The GW is unable to collect volume statistics and generate the performance bundle because it misinterprets the RestAPI responses it receives from the MDM.
See the Additional Info section for more information regarding the flow of data collection and generation.
The following exception errors can be found in the GW scaleio-trace.log files:
2022-08-05 12:05:53,975 [AsyncHandler-21] ERROR c.e.scaleio.esrsmanager.EsrsManager - Alert hasn't been sent since ESRS reached limit of 200 per: 8 hours
com.emc.scaleio.esrsmanager.NotificationMessageLimitException: null <<<
at com.emc.scaleio.esrsmanager.ESRSConnector.sendConnectEmcMessage(ESRSConnector.java:360) ~[ams-1.0-SNAPSHOT.jar:na]
at com.emc.scaleio.esrsmanager.ESRSConnector.sendConnectEmcMessage(ESRSConnector.java:308) ~[ams-1.0-SNAPSHOT.jar:na]
at com.emc.scaleio.esrsmanager.EsrsManager.sendAlert(EsrsManager.java:566) [ams-1.0-SNAPSHOT.jar:na]
at com.emc.scaleio.esrsmanager.EsrsManager.sendAlert(EsrsManager.java:598) [ams-1.0-SNAPSHOT.jar:na]
at com.emc.scaleio.esrsmanager.BaseNotificationManager.busReceivedAlerts(BaseNotificationManager.java:103) [ams-1.0-SNAPSHOT.jar:na]
...
2022-08-05 12:05:53,975 [https-jsse-nio-443-exec-293] ERROR c.e.s.s.r.DeviceRepositoryImpl - Error in QueryPropertiesResponse for Device::-101652817808130048 property:PENDING_MOVING_OUT_FWD_REBUILD_JOBS has value type: UNDEFINED_PROP_TYPE
2022-08-05 12:05:53,975 [https-jsse-nio-443-exec-293] ERROR c.e.s.s.r.DeviceRepositoryImpl - Error in QueryPropertiesResponse for Device::-101652817808130048 property:NET_THIN_USER_DATA_CAPACITY_IN_KB has value type: UNDEFINED_PROP_TYPE
...
2022-08-05 12:05:53,982 [https-jsse-nio-443-exec-306] ERROR c.e.s.s.r.DeviceRepositoryImpl - Error in QueryPropertiesResponse for Device::-99119590261981183 property:PENDING_MOVING_OUT_FWD_REBUILD_JOBS has value type: UNDEFINED_PROP_TYPE
2022-08-05 12:05:53,982 [https-jsse-nio-443-exec-290] ERROR c.e.s.s.r.DeviceRepositoryImpl - Error in QueryPropertiesResponse for Device::-99401056648822783 property:RFCACHE_WRITES_SKIPPED_STUCK_IO has value type: UNDEFINED_PROP_TYPE
2022-08-05 12:05:53,982 [https-jsse-nio-443-exec-310] ERROR c.e.s.s.w.c.ScaleIOController - Got an exception in handleException
java.lang.IllegalStateException: Bad number: 3 <<<
at com.emc.s3g.scaleio.domain.enums.ScsiReserveType.valueOf(ScsiReserveType.java:42) ~[ams-1.0-SNAPSHOT.jar:na]
at com.emc.s3g.scaleio.repository.BaseRepository.updateStatistics(BaseRepository.java:1184) ~[repository-1.0-SNAPSHOT.jar:na]
at com.emc.s3g.scaleio.repository.BaseRepository.getStatistics(BaseRepository.java:981) ~[repository-1.0-SNAPSHOT.jar:na]
at com.emc.s3g.scaleio.web.controller.ScaleIOController.getStatistics(ScaleIOController.java:93) ~[classes/:na]
at sun.reflect.GeneratedMethodAccessor731.invoke(Unknown Source) ~[na:na]
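To check quickly whether a gateway log contains these signatures, a small helper along the following lines may be useful. It is only a sketch: scan_trace_log is a hypothetical function name, the search strings are taken from the log excerpts above, and the log file location is passed in as an argument because it is not stated in this article.

```shell
# Hypothetical helper: count the two error signatures in gateway trace logs.
# Usage: scan_trace_log FILE...
scan_trace_log() {
    for f in "$@"; do
        # grep -c prints 0 and returns non-zero when nothing matches,
        # so "|| true" keeps the helper usable under "set -e".
        bad=$(grep -c 'IllegalStateException: Bad number: 3' "$f" || true)
        undef=$(grep -c 'UNDEFINED_PROP_TYPE' "$f" || true)
        printf '%s bad_number=%s undefined_prop=%s\n' "$f" "$bad" "$undef"
    done
}
```

Call it with the path to one or more scaleio-trace.log files. A non-zero bad_number count is consistent with the issue described in this article.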
Example of a working system:
root@working_cloudiq ~]# ls -lrt /opt/emc/scaleio/gateway/temp
total 300
drwx------. 2 root root 25 Feb 28 2020 certificates
drwx------. 2 root root 6 Feb 28 2020 scaleio-install-logs
-rwx------. 1 root root 0 Feb 28 2020 216e5abe-29e9-4825-b095-d8900d5964d8_ScaleIO-config.json
-rwx------. 1 root root 0 Jan 12 2022 safeToDelete.tmp
-rwx------. 1 root root 521 Jan 12 2022 index.html
-rwx------. 1 root root 0 Mar 20 01:36 GATEWAY_RUN_USER.txt
-rw-r-----. 1 root root 95929 Jul 14 08:33 powerflex_1657787617941_ELMSIO1234568_config.zip
-rw-r-----. 1 root root 47245 Jul 15 07:34 powerflex_1657870447081_ELMSIO1234568_capacity.zip
-rw-r-----. 1 root root 95935 Jul 15 08:33 powerflex_1657874022010_ELMSIO1234568_config.zip
-rw-r-----. 1 root root 47330 Jul 15 08:34 powerflex_1657874048125_ELMSIO1234568_capacity.zip
-rw-r-----. 1 root root 2671 Jul 15 09:02 powerflex_1657875734080_ELMSIO1234568_alerts.zip
-rw-r-----. 1 root root 2671 Jul 15 09:02 powerflex_1657875734085_ELMSIO1017KPF3_performance.zip <<<
-rw-r-----. 1 root root 2670 Jul 15 09:07 powerflex_1657876034745_ELMSIO1234568_alerts.zip
-rw-r-----. 1 root root 2670 Jul 15 09:07 powerflex_1657876034750_ELMSIO1017KPF3_performance.zip <<<
Example of a non-working system - performance.zip file was not generated:
root@not_working_cloudiq ~]# ls -lrt /opt/emc/scaleio/gateway/temp
total 300
drwx------. 2 root root 25 Feb 28 2020 certificates
drwx------. 2 root root 6 Feb 28 2020 scaleio-install-logs
-rwx------. 1 root root 0 Feb 28 2020 216e5abe-29e9-4825-b095-d8900d5964d8_ScaleIO-config.json
-rwx------. 1 root root 0 Jan 12 2022 safeToDelete.tmp
-rwx------. 1 root root 521 Jan 12 2022 index.html
-rwx------. 1 root root 0 Mar 20 01:36 GATEWAY_RUN_USER.txt
-rw-r-----. 1 root root 95929 Jul 14 08:33 powerflex_1657787617941_ELMSIO1234568_config.zip
-rw-r-----. 1 root root 47245 Jul 15 07:34 powerflex_1657870447081_ELMSIO1234568_capacity.zip
-rw-r-----. 1 root root 95935 Jul 15 08:33 powerflex_1657874022010_ELMSIO1234568_config.zip
-rw-r-----. 1 root root 47330 Jul 15 08:34 powerflex_1657874048125_ELMSIO1234568_capacity.zip
-rw-r-----. 1 root root 2671 Jul 15 09:02 powerflex_1657875734080_ELMSIO1234568_alerts.zip
-rw-r-----. 1 root root 2670 Jul 15 09:07 powerflex_1657876034745_ELMSIO1234568_alerts.zip
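The same comparison can be made without reading the full listing. The sketch below reports the newest bundle of each type, or MISSING when none exists. The function name latest_bundles is hypothetical. It also assumes that the numeric field in the file name grows over time and keeps the same number of digits, as in the listings above, so that the last file in glob order is the newest.

```shell
# Hypothetical helper: show the newest bundle of each type in the gateway temp directory.
# Usage: latest_bundles [DIR]   (DIR defaults to /opt/emc/scaleio/gateway/temp)
latest_bundles() {
    dir=${1:-/opt/emc/scaleio/gateway/temp}
    for kind in config capacity alerts performance; do
        newest=""
        # Glob results are sorted by name, so the last existing match
        # carries the largest numeric field (assumed to be the newest).
        for f in "$dir"/powerflex_*_"$kind".zip; do
            if [ -e "$f" ]; then
                newest=$f
            fi
        done
        if [ -n "$newest" ]; then
            printf '%-12s %s\n' "$kind" "${newest##*/}"
        else
            printf '%-12s %s\n' "$kind" "MISSING"
        fi
    done
}
```

On an affected gateway, the performance row is expected to show MISSING or a file much older than the other three bundle types.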
Cause
PowerFlex supports SCSI-2 reservation and a subset of SCSI-3 reservation commands. SCSI reservation commands (reset, reserve, release, read) are sent by SDCs to MDM, which then updates the SDSs.
When a SCSI-3 reservation has been placed on a volume, the RestAPI call from the GW to the MDM to read the volume statistics fails with the error shown above: Bad number: 3.
The GW misinterprets the SCSI reservation type in the response returned by the MDM and fails the RestAPI call.
The I/O and reservation on the PowerFlex side are working as expected.
How to validate SCSI reservation info in get_info?
$ awk '
BEGIN {
    printf "%-15s %-15s %s\n", "Volume_ID", "Volume_Name", "SCSI_Reservation"
    printf "%-15s %-15s %s\n", "---------", "-----------", "----------------"
}
/: ID:/ { volume_id = $2; volume_name = $3 }
/ SCSI-reserver-key:/ {
    scsi_reserv = $1
    if (scsi_reserv == "scsi2-reserved:3") {
        printf "%-15s %-15s %-15s %s\n", volume_id, volume_name, scsi_reserv, "<<< SCSI-3 !!!"
    } else {
        printf "%-15s %-15s %s\n", volume_id, volume_name, scsi_reserv
    }
}' getInfoDump/mdm/sdbg_out.txt | column -t
Volume_ID Volume_Name SCSI_Reservation
--------- ----------- ----------------
ID:0x2fad5f7f00000000 Name:vol1-sp1-PD1 scsi2-reserved:0
ID:0x2fad5fcb00000001 Name:vol2-sp1-PD1 scsi2-reserved:3 <<< SCSI-3 !!!
ID:0x2fad5fcc00000002 Name:vol3-sp1-PD1 scsi2-reserved:3 <<< SCSI-3 !!!
ID:0x2fa9dd3d00000003 Name:vol4-sp1-PD1 scsi2-reserved:0
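When only the affected volumes matter, for example in a scripted health check, the filter can be reduced to the sketch below. The function name scsi3_volumes is hypothetical. The field positions are inferred from the awk command above: the volume ID and name are fields 2 and 3 of a line containing ": ID:", and the reservation value is field 1 of a line containing " SCSI-reserver-key:".

```shell
# Hypothetical helper: print only the volumes that show scsi2-reserved:3.
# Usage: scsi3_volumes FILE   (FILE is an sdbg dump such as sdbg_out.txt)
scsi3_volumes() {
    awk '
        # Remember the most recent volume ID and name.
        /: ID:/ { id = $2; name = $3 }
        # Print that volume when its reservation field reads scsi2-reserved:3.
        / SCSI-reserver-key:/ && $1 == "scsi2-reserved:3" { print id, name }
    ' "$1"
}
```

Empty output means that no volume in the dump reports a SCSI-3 reservation.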
How to validate SCSI reservation info on a live system?
$ cat > script
c mdm
dumpallscreens
disconnect
exit
^D
$ /opt/emc/scaleio/sds/diag/sdbg script > sdbg_out.txt
$ awk '
BEGIN {
    printf "%-15s %-15s %s\n", "Volume_ID", "Volume_Name", "SCSI_Reservation"
    printf "%-15s %-15s %s\n", "---------", "-----------", "----------------"
}
/: ID:/ { volume_id = $2; volume_name = $3 }
/ SCSI-reserver-key:/ {
    scsi_reserv = $1
    if (scsi_reserv == "scsi2-reserved:3") {
        printf "%-15s %-15s %-15s %s\n", volume_id, volume_name, scsi_reserv, "<<< SCSI-3 !!!"
    } else {
        printf "%-15s %-15s %s\n", volume_id, volume_name, scsi_reserv
    }
}' sdbg_out.txt | column -t
Volume_ID Volume_Name SCSI_Reservation
--------- ----------- ----------------
ID:0x2fae49da00000001 Name:vol1-sp1-PD1 scsi2-reserved:0
ID:0x2fad5fcb00000002 Name:vol2-sp1-PD1 scsi2-reserved:3 <<< SCSI-3 !!!
ID:0x2fad5fcc00000003 Name:vol3-sp1-PD1 scsi2-reserved:3 <<< SCSI-3 !!!
ID:0x2fa9dd3d00000004 Name:vol4-sp1-PD1 scsi2-reserved:0
Resolution
As the SCSI reservation is set by the client and application side, the only workaround is to release the reservation from the volume.
Impacted Versions
PowerFlex v3.5
PowerFlex v3.6
PowerFlex v4.0
Fixed In Version
PowerFlex v3.5.1.9
PowerFlex v3.6.1
PowerFlex v4.0.1.1
Additional Information
The flow for gathering and creating the performance bundle file consists of two separate processes:
The first process is activated every 5 seconds. It requests statistics from the MDM and stores the responses cumulatively.
The second process is activated every 5 minutes. It calculates the deltas and compresses the data into a .zip file in the /opt/emc/scaleio/gateway/temp directory.
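The accumulate-then-delta idea can be illustrated with a small sketch. This is not the gateway's implementation, which is not shown in this article. The function name window_deltas and the input format (one "epoch_seconds cumulative_value" pair per line, in time order) are assumptions made for illustration only.

```shell
# Illustrative only: turn cumulative samples into one delta per time window.
# Usage: window_deltas [WINDOW_SECONDS] < samples   (window defaults to 300)
window_deltas() {
    awk -v win="${1:-300}" '
        # The first sample sets the baseline and the current window.
        NR == 1 { base = $2; cur = int($1 / win) }
        {
            w = int($1 / win)
            # On entering a new window, emit the delta for the one just closed.
            if (w != cur) { print cur * win, last - base; base = last; cur = w }
            last = $2
        }
        # Emit the delta for the final window.
        END { if (NR > 0) print cur * win, last - base }
    '
}
```

Each output line holds the window start time and the growth of the counter during that window, measured from the last sample of the previous window.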