PowerFlex Background Scanner - Frequently Asked Questions and Answers
1) How long does it take to start after storage pool creation?
It starts 30 seconds after the device is created, and 30 seconds after an SDS process restart.
The setting is displayed in the GUI under Dashboard -> Configuration -> Storage Pools, and can also be viewed by running the scli command.
2) How long does the background scanner take to restart after it has completed one full scan? The Getting to Know Dell Technologies PowerFlex v3.5.x guide (“Other functions”) mentions that 'When a scan is completed, the process starts again, thus adding constant protection to your system.' How does that work?
After the scanner is enabled or disabled, and after an SDS restart, it takes 30 seconds to start or stop. The scanner runs continuously in cycles (an infinite loop), and every time it starts, it begins at a different location on the device:
- It starts scanning from a random comb.
- Once all the combs are scanned, it starts scanning again in more or less the same order (new combs are added to the tail of the list).
- It’s an infinite loop, with no pause between completions. After a restart, the scanner does not continue from the same location.
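The traversal described above can be modeled as a simple generator. This is a minimal sketch, not PowerFlex code: the comb IDs, the `scan_order` name and the use of Python's `random` module for the starting point are hypothetical, chosen only to illustrate a random starting comb, wrap-around with no pause, and new combs being picked up from the tail of the list.

```python
import random
from typing import Iterator, List


def scan_order(combs: List[str], rng: random.Random) -> Iterator[str]:
    """Yield comb IDs forever in the order the FAQ describes (hypothetical model).

    - Start at a random comb (a restart does not resume where it stopped).
    - Walk the list in order and wrap to the head with no pause.
    - Combs appended to `combs` while scanning are reached on a later pass.
    """
    if not combs:
        return
    i = rng.randrange(len(combs))  # random starting comb on every (re)start
    while True:  # infinite loop: no gap between cycles
        yield combs[i]
        # len() is re-read on every step, so newly appended combs join the cycle
        i = (i + 1) % len(combs)
```

Calling `next()` on the generator yields an endless sequence of comb IDs; appending a comb to the list mid-scan adds it to the tail of the cycle, while creating a new generator models a restart from a fresh random position.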
3) Can the background scanner be enabled or disabled for a given storage pool at any time?
If the background scanner was not enabled on existing storage pools, new storage pools are nevertheless created with it enabled by default. You can then enable or disable the scanner for a storage pool at any time, provided that the storage pool settings (granularity, zero padding and persistent checksum) match the requested scanner mode.
4) Can you temporarily disable the background scanner?
You can use the disable command, but the only options are enable and disable. There is no option to postpone or delay the operation.
5) Is it enabled by default?
After version 3.5, it is enabled by default when you create a new storage pool. Please refer to the release notes for more information.
6a.) How can I check from the command line whether the scanner is running?
Use the scli --query_all command to check whether the background device scanner is enabled on each storage pool. Please refer to the “Dell EMC PowerFlex v3.6.x CLI Reference Guide” for more information on the scli command.
Example of the relevant line:
Background device scanner: Enabled, Read Error Action: report and fix, Compare Error Action: report and fix, Bandwidth Limit 3072 KBps per device
Link to command explanation and output example: query_all
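If you want to check this line in a script, it can be split into fields. The sketch below assumes the comma-separated layout of the sample line above; other PowerFlex versions may format the output differently, and the `parse_scanner_line` name and the result keys are hypothetical.

```python
import re
from typing import Dict


def parse_scanner_line(line: str) -> Dict[str, str]:
    """Split the 'Background device scanner' line from scli --query_all output.

    Assumes the layout of the sample line in this FAQ.
    """
    line = line.strip()
    prefix = "Background device scanner:"
    if not line.startswith(prefix):
        raise ValueError("not a background device scanner line")
    fields = [f.strip() for f in line[len(prefix):].split(",")]
    result = {"state": fields[0]}  # e.g. "Enabled"
    for field in fields[1:]:
        if ":" in field:
            key, value = field.split(":", 1)
            result[key.strip()] = value.strip()
        else:
            # 'Bandwidth Limit 3072 KBps per device' has no colon separator
            m = re.match(r"Bandwidth Limit (\d+) KBps", field)
            if m:
                result["Bandwidth Limit KBps"] = m.group(1)
    return result
```

Feeding each line of the query_all output that contains "Background device scanner:" through this function gives one dictionary per storage pool, so a check such as `result["state"] == "Enabled"` can be applied per pool.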
b.) Also, it was mentioned that the scanner reports to SNMP. Is there a way to check this reporting history in SNMP?
There is no option to check the SNMP reporting history.
c.) Are scan results reflected in any particular log?
Scan errors are reflected in the MDM events and in the logs of the relevant SDS.
To check for errors detected by the background device scanner, query SDSs using the --query_sds command.
The "--query_sds --sds_id " output shows a counter for each device with corrected read errors, for example:
Name: /dev/sdr Path: /dev/sdr Original-path: /dev/sdr ID: Storage Pool: SP1, Capacity: 1116 GB Error-fixes: 6 scanned 0 MB, Compare errors: 0 State: Normal
In addition, all issues are reported in events.log on the master MDM and in the alerts tab of the GUI, and can be sent via SNMP, for example:
SCANNER_COMPARE_REPORT ERROR Background device scanner on device ID
Note: The "compare error - succeeded" message is not visible with the show events command.
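To spot devices with non-zero scanner counters across many SDSs, the device lines of the --query_sds output can be filtered. This is a sketch that assumes the field labels of the sample line above ("Name:", "Error-fixes:", "Compare errors:"); the function and class names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Patterns based on the sample --query_sds device line in this FAQ (assumed format).
_NAME = re.compile(r"Name:\s*(\S+)")
_FIXES = re.compile(r"Error-fixes:\s*(\d+)")
_COMPARE = re.compile(r"Compare errors:\s*(\d+)")


@dataclass
class DeviceCounters:
    device: str
    error_fixes: int
    compare_errors: int


def parse_device_line(line: str) -> Optional[DeviceCounters]:
    """Extract the scanner counters from one device line, or None if absent."""
    name, fixes, compare = _NAME.search(line), _FIXES.search(line), _COMPARE.search(line)
    if not (name and fixes and compare):
        return None
    return DeviceCounters(name.group(1), int(fixes.group(1)), int(compare.group(1)))


def devices_needing_attention(lines: Iterable[str]) -> List[str]:
    """Return devices whose error-fix or compare-error counter is non-zero."""
    flagged = []
    for line in lines:
        counters = parse_device_line(line)
        if counters and (counters.error_fixes or counters.compare_errors):
            flagged.append(counters.device)
    return flagged
```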
7) How many scanner modes are there, and how do you check which mode a storage pool is using?
There are two scanning modes; run the scli --query_all command to view them. Only one mode can be selected:
- Device only: performs read operations and fixes errors from the peer copy.
- Data comparison: performs the device-only test and compares the data content with the peer copy. Zero padding must be enabled in order to set the background device scanner to data comparison mode.
scli --enable_background_device_scanner (((--protection_domain_id | --protection_domain_name ) --storage_pool_name ) | --storage_pool_id ) --scanner_mode {device_only | data_comparison} [--scanner_bandwidth_limit ]
With fine granularity, all storage pools are zero padded, so both scanning modes are supported.
With medium granularity, zero padding is optional, so the following note applies:
“Perform the device-only test, and compare the data content with peer. Zero padding must be enabled in order to set the background device scanner to data comparison mode.”
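The zero-padding rule above can be checked before building the enable command. This is a minimal sketch: it encodes only the rule stated in this FAQ (data comparison needs zero padding, which fine-granularity pools always have) and does not model the persistent checksum setting. The flag names come from the scli syntax quoted above; the function names, the storage pool ID value and the KB/s unit for the bandwidth limit (inferred from the query_all sample) are assumptions. The command is only built, not executed.

```python
from typing import List, Optional

SCANNER_MODES = ("device_only", "data_comparison")


def mode_supported(granularity: str, zero_padding: bool, mode: str) -> bool:
    """Apply the zero-padding rule from this FAQ (other pool settings not modeled)."""
    if mode not in SCANNER_MODES:
        raise ValueError(f"unknown scanner mode: {mode}")
    if mode == "device_only":
        return True
    # data_comparison needs zero padding; fine-granularity pools are always zero padded.
    return granularity == "fine" or zero_padding


def enable_command(storage_pool_id: str, mode: str, granularity: str,
                   zero_padding: bool, bandwidth_kbps: Optional[int] = None) -> List[str]:
    """Build (but do not run) an scli argv using the flags quoted in this FAQ."""
    if not mode_supported(granularity, zero_padding, mode):
        raise ValueError(f"{mode} requires zero padding on a {granularity}-granularity pool")
    argv = ["scli", "--enable_background_device_scanner",
            "--storage_pool_id", storage_pool_id,
            "--scanner_mode", mode]
    if bandwidth_kbps is not None:
        # Unit assumed to be KB/s per device, matching the query_all output.
        argv += ["--scanner_bandwidth_limit", str(bandwidth_kbps)]
    return argv
```

Returning an argument list rather than a shell string keeps quoting safe if the list is later passed to `subprocess.run` on a host where scli is available.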
8) Does this mean that the scanner runs constantly, with not even a split-second gap between two cycles?
The scanner takes a few seconds to start or stop after it is enabled or disabled, and then runs continuously in cycles. Every time it starts, it begins at a different location on the device; after a restart, it does not continue from the same location. In addition:
- It can be enabled or disabled for a given storage pool at any time.
- For a new storage pool, the scanner is disabled.
- A new device derives its configuration from the storage pool.
9) In 'device only mode', it's mentioned that the scanner uses the device's internal checksum mechanism to validate the primary and secondary data. How does the internal checksum mechanism work? And how does the scanner know which device (primary or secondary) is faulty, if either is? It's mentioned that the scanner attempts to correct the faulty device with the data from the good device. I assume this is triggered by an error, since the documentation refers to a faulty area being read, but what is the mechanism behind this error?
Also, in 'device only mode', it's mentioned that if the read fails on both devices, the scanner skips to the next storage block. Could you help me understand what this means, and how the next storage block assists data recovery, if data recovery is done?
Device only mode:
- The scanner attempts to read a 1 MB chunk from both copies.
- If the read succeeds, it moves to the next chunk.
- If the read fails, the scanner attempts to fix it using the other copy.
- If the fix succeeds, it moves to the next chunk.
- If the fix fails, it moves to the next chunk, relying on the device error mechanism.
- If the read fails on both copies, it moves to the next comb.
If a checksum is available, it is used to verify the data that is read; if not, the scanner simply tries to read. If the checksum doesn’t match or the read fails, the data is copied from the other copy. If there’s a mismatch or read error on both copies, no fix is possible: the error is reported and skipped.
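The device-only flow above can be sketched with an in-memory model of two copies. This is an illustration under stated assumptions, not the SDS implementation: `Copy`, `scan_comb_device_only` and the outcome strings are hypothetical, `zlib.crc32` stands in for whatever checksum PowerFlex actually uses, each list index stands for one 1 MB chunk, and writes always succeed (so the "fix fails, rely on the device error mechanism" branch is not modeled).

```python
import zlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Copy:
    """Hypothetical in-memory replica: chunk index -> data (None = unreadable)."""
    chunks: Dict[int, Optional[bytes]]
    checksums: Optional[Dict[int, int]] = None  # None models "no checksum available"

    def read(self, idx: int) -> Optional[bytes]:
        data = self.chunks.get(idx)
        if data is None:
            return None  # read error
        if self.checksums is not None and zlib.crc32(data) != self.checksums.get(idx):
            return None  # checksum mismatch is handled like a failed read
        return data

    def write(self, idx: int, data: bytes) -> None:
        self.chunks[idx] = data
        if self.checksums is not None:
            self.checksums[idx] = zlib.crc32(data)


def scan_comb_device_only(primary: Copy, secondary: Copy,
                          n_chunks: int) -> List[Tuple[int, str]]:
    """Device-only pass over one comb; returns (chunk index, outcome) events."""
    events: List[Tuple[int, str]] = []
    for idx in range(n_chunks):  # each index stands for one 1 MB chunk
        p, s = primary.read(idx), secondary.read(idx)
        if p is not None and s is not None:
            continue  # both reads good: next chunk
        if p is None and s is None:
            events.append((idx, "unfixable"))  # report, then skip to the next comb
            break
        good, bad = (s, primary) if p is None else (p, secondary)
        bad.write(idx, good)  # fix the bad copy from the good one
        events.append((idx, "fixed"))
    return events
```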
10) Regarding 'data comparison mode', it's mentioned that the scanner calculates and compares the checksums of the two copies. Could you help me understand how this comparison is done? How does the scanner know which data is correct, primary or secondary? It's mentioned that the scanner attempts to overwrite the secondary device with the data from the primary device, but what happens if the primary device is faulty?
If persistent checksum is available and enabled, then we know which copy is inconsistent.
If it’s not available, we assume the primary copy is correct. This is the safest option in this case, because the user may already have read the data from the primary copy, so keeping the primary preserves coherency.
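The decision described above can be expressed as a small function. This sketch assumes a single stored checksum per chunk and uses `zlib.crc32` as a stand-in for the persistent checksum; the function name and return values are hypothetical.

```python
import zlib
from typing import Optional


def copy_to_trust(primary: bytes, secondary: bytes,
                  stored_crc: Optional[int]) -> str:
    """Decide which copy wins when two copies differ (hypothetical model).

    With a persistent checksum we can tell which copy is inconsistent;
    without one, the primary is assumed correct to stay coherent with
    data the user may already have read from it.
    """
    if stored_crc is None:
        return "primary"
    if zlib.crc32(primary) == stored_crc:
        return "primary"
    if zlib.crc32(secondary) == stored_crc:
        return "secondary"
    return "neither"  # both copies fail verification: no fix possible
```

The last branch corresponds to the case in the device-only answer where both copies are bad: nothing can be repaired, so the error is reported and skipped.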
11) What is an example of the alert raised by a compare error when the secondary mirror is potentially faulty?
The SDS trc logs provide in-depth information on whether the data was recovered, for example:
774078 23a8fec8:raidScan_Start:00669: Comb 205c80158152, offset 11159552 - primary and secondary checksums are different (pri=3420629458, sec=1047435630)
774498 23a8fec8:raidScan_Start:00730: Comb 205c80158152, offset 11159552 - Sent a message to the MDM on compare error
780997 23a8fec8:raidScan_Start:00758: Comb 205c80158152, offset 11159552 - compare error - succeeded to fix the secondary by the primary
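To check whether a compare error was fixed, the scanner lines can be pulled out of a trc log and grouped by comb and offset. The regular expression below is built only from the three sample lines above; the real trc format may vary, and the names used are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple, Tuple

# Built from the sample raidScan_Start lines in this FAQ (assumed format).
_SCAN_LINE = re.compile(
    r"raidScan_Start:\d+: Comb (?P<comb>[0-9a-f]+), offset (?P<offset>\d+) - (?P<msg>.*)$"
)


class ScanEvent(NamedTuple):
    comb: str
    offset: int
    message: str


def parse_scan_events(lines: Iterable[str]) -> List[ScanEvent]:
    """Return one ScanEvent per background scanner line; other lines are ignored."""
    events = []
    for line in lines:
        m = _SCAN_LINE.search(line)
        if m:
            events.append(ScanEvent(m["comb"], int(m["offset"]), m["msg"].strip()))
    return events


def fixed_locations(events: Iterable[ScanEvent]) -> List[Tuple[str, int]]:
    """(comb, offset) pairs whose compare error was reported as fixed."""
    return [(e.comb, e.offset) for e in events if "succeeded to fix" in e.message]
```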
12) What kind of error is shown if the faulty chunk could not be fixed?
The counters show up in the SDS property sheet, under Background device scanner. The error should not be propagated to the application.
Example: alert ID SIO02.04.0000007, Background scanner compare error.
13) With regard to faulty chunks in 'data comparison mode', how is the comparison performed?
Data comparison mode:
- Performs the same operations as device only mode.
- If both reads succeed, compares the two copies.
- If the copies are different, overwrites the secondary with the primary.
- ScaleIO writes to both copies, but reads only from the primary.
See the User Guide for more details.
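The comparison step can be sketched as follows, for chunks that both read successfully (read failures go through the device-only path). As in the trc sample, checksums rather than raw bytes are compared; `zlib.crc32` is a stand-in for the actual checksum, and the function name is hypothetical.

```python
import zlib
from typing import List


def compare_and_repair(primary: List[bytes], secondary: List[bytes]) -> List[int]:
    """Data-comparison pass over chunks that both read successfully.

    Compares per-chunk checksums and, where they differ, overwrites the
    secondary with the primary. Returns the repaired chunk indices.
    """
    repaired = []
    for idx, (p, s) in enumerate(zip(primary, secondary)):
        if zlib.crc32(p) != zlib.crc32(s):
            secondary[idx] = p  # primary wins, as described above
            repaired.append(idx)
    return repaired
```

Overwriting the secondary rather than the primary follows from reads being served only from the primary: the application has only ever seen the primary's data.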
14) Where are logs generated?
- MDM events: /opt/emc/scaleio/mdm/bin/showevents.py
- SDS: /opt/emc/scaleio/sds/log/trc.x
- MDM: /opt/emc/scaleio/mdm/log/trc.x
- events.txt
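To collect background scanner entries from the SDS trc files listed above, a short script can scan the log directory. This sketch assumes that the "trc.x" files are plain text named trc followed by a suffix (matched here with `trc.*`), and it uses the `raidScan` token from the sample log lines as the filter; both are assumptions, and the function name is hypothetical.

```python
import glob
import os
from typing import Iterator, Tuple

SDS_LOG_DIR = "/opt/emc/scaleio/sds/log"  # path given in this FAQ


def scanner_lines(log_dir: str = SDS_LOG_DIR,
                  marker: str = "raidScan") -> Iterator[Tuple[str, str]]:
    """Yield (file name, line) for every trc line containing `marker`."""
    for path in sorted(glob.glob(os.path.join(log_dir, "trc.*"))):
        # errors="replace" keeps going if a rotated file contains binary debris
        with open(path, "r", encoding="utf-8", errors="replace") as fh:
            for line in fh:
                if marker in line:
                    yield os.path.basename(path), line.rstrip("\n")
```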
15) Is there an option to schedule the background scanner?
Yes, there is a REST API through which you can schedule it via the gateway.