PowerFlex enable_background_device_scanner, API, and log changes
Summary: This article describes the changes to the scli and REST API commands that enable the background scanner in VxFlexOS, following the introduction of the "conservative" mode feature in v3.0.0.2 HF1. The changes give better control over the reporting capabilities and over which modes are activated for the customer. In all previous versions (up to and including v3.0.1), the background scanner had two options: device_only mode and comparison_mode (which also included all functionality of device_only mode). ...
Instructions
As of ScaleIO v2.5 HF1, comparison mode included a new option to skip fixing comparison errors and instead report them in the /opt/emc/scaleio/logs/comparison.0 file. Selecting this option also disabled the read repair and reporting functionality of device_only mode, which is normally part of comparison mode.
Along with the scli and API changes, additional logging has been added to the SDS nodes for when background scanner comparison mode is enabled with v3.0.0.2 HF.
Changes
Log changes: /opt/emc/scaleio/sds/logs/comparison.x
19/11 10:50:12.446570 Scanning combs for device be4c66a400000001, loop #1945 ended, #scanned combs=137, #sec combs (skipped)=69, #primary combs skipped=0, #scanned teeth=19418
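When many of these loop summaries accumulate, it can help to pull the counters out of each line. The following is a minimal sketch that assumes the line layout is exactly the one in the sample above; the function name `parse_scan_summary` is hypothetical and not part of the product.

```shell
#!/bin/sh
# parse_scan_summary LINE
# Prints: device loop scanned_combs sec_combs_skipped primary_combs_skipped scanned_teeth
# Assumes the "Scanning combs ... ended" layout shown in the sample line above.
# Prints nothing if the line does not match that layout.
parse_scan_summary() {
    printf '%s\n' "$1" | sed -n 's/.*Scanning combs for device \([0-9a-f]*\), loop #\([0-9]*\) ended, #scanned combs=\([0-9]*\), #sec combs (skipped)=\([0-9]*\), #primary combs skipped=\([0-9]*\), #scanned teeth=\([0-9]*\).*/\1 \2 \3 \4 \5 \6/p'
}
```

For the sample line above, the function prints `be4c66a400000001 1945 137 69 0 19418`.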
Steps to reduce the number of comparison.x files kept from the default of 10 to a smaller number:
echo "tgt_comparison_file_history =2" >> /opt/emc/scaleio/sds/cfg/conf.txt ; pkill sds
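Because the command above appends with `>>`, running it more than once leaves several copies of the key in conf.txt. The sketch below shows one way to keep a single entry. It assumes a single line per key is preferable (how the SDS treats duplicates is not stated here), the function name `set_comparison_history` is hypothetical, and it deliberately does not restart the SDS.

```shell
#!/bin/sh
# set_comparison_history CONF_FILE COUNT
# Rewrites CONF_FILE so it holds exactly one tgt_comparison_file_history line.
# Does NOT restart the SDS; that step stays manual.
set_comparison_history() {
    conf=$1
    count=$2
    case $count in
        ''|*[!0-9]*) echo "count must be a non-negative integer" >&2; return 1 ;;
    esac
    tmp=$(mktemp) || return 1
    # Drop earlier settings of the key and keep every other line unchanged.
    if [ -f "$conf" ]; then
        grep -v '^[[:space:]]*tgt_comparison_file_history[[:space:]]*=' "$conf" > "$tmp" || true
    fi
    # Same line format as the echo command above (space before "=").
    echo "tgt_comparison_file_history =$count" >> "$tmp"
    # cat into the original file so its ownership and permissions are kept.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

A call such as `set_comparison_history /opt/emc/scaleio/sds/cfg/conf.txt 2` would then replace the echo portion of the command above.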
Perform this on all SDSs where the number of files should be reduced.
New CRC32 output log files (containing the hash output for the 1MB of user data from either PRI or SEC), used to validate between primary and secondary:
/opt/emc/scaleio/sds/logs/{randomID}_compare_combID_offset_[pri/sec].log
EXAMPLE:
/opt/emc/scaleio/sds/logs/1259357004_compare_71f2800001b5_18432_pri.log
Trace entry correlating in comparison.x:
13/11 14:03:54.470982 vtreeId a227e28500000000 Comb 71f2800001b5, offset 18432 - compare error - succeeded to fix the secondary by the primary, randId 1259357004.
Corresponding sds/logs/trc.x logs:
trc.5:13/11 14:03:54.459658 0x7f598e35ddb8:mgStorageRegion_VerifyDataCRC:02769: primary and secondary checksums are different (pri=1025730551, sec=3956700141). will dump the error. randId 1259357004. combId:71f2800001b5,vTree:0xa227e28500000000,offsetVol:0x12da000,offsetInComb:18432,sizeInLbs:2048,phyToothIdx:2608,srcToothIdx:inv,dstToothIdx:inv New:(0,0) Requested:(3,1) volId:0
trc.5:13/11 14:03:54.470972 0x7f598e35ddb8:raidScan_FixSecondaryByPrimary:00295: comb:71f2800001b5,vTree:0xa227e28500000000,offsetVol:0x12da000,offsetTooth:0x0, Comb 71f2800001b5, offset 18432 - fix the secondary by the primary, randId 1259357004, rc = SUCCESS
trc.5:13/11 14:03:54.470988 0x7f598e35ddb8:raidScan_HandleComparisonError:01006: compare error - succeeded to fix the secondary by the primary. randId 1259357004. combId:71f2800001b5,vTree:0xa227e28500000000,offsetVol:0x12da000,offsetInComb:18432,sizeInLbs:2048,phyToothIdx:2608,srcToothIdx:inv,dstToothIdx:inv New:(0,0) Requested:(3,1) volId:0
trc.5:13/11 14:03:54.472246 0x7f598e35ddb8:raidScan_HandleComparisonError:01046: Sent a message to the MDM on compare error (randId 1259357004). combId:71f2800001b5,vTree:0xa227e28500000000,offsetVol:0x12da000,offsetInComb:18432,sizeInLbs:2048,phyToothIdx:2608,srcToothIdx:inv,dstToothIdx:inv New:(0,0) Requested:(3,1) volId:0
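The randId is what ties a dump file to its comparison.x and trc.x entries. As a rough sketch of that correlation, the helpers below take the randId from a dump file name and list every matching log line. The function names are hypothetical, and the file naming and the "randId <number>" wording are assumed to match the examples above.

```shell
#!/bin/sh
# rand_id_of FILE
# Prints the leading {randomID} of a *_compare_*_pri.log or *_sec.log name.
# Returns 1 for any other file name.
rand_id_of() {
    name=$(basename "$1")
    case $name in
        *_compare_*_pri.log|*_compare_*_sec.log) printf '%s\n' "${name%%_compare_*}" ;;
        *) return 1 ;;
    esac
}

# trace_for_dump LOGDIR FILE
# Prints every comparison.* and trc.* line in LOGDIR that carries the same
# randId as the dump file, prefixed with the log file it came from.
trace_for_dump() {
    id=$(rand_id_of "$2") || return 1
    # The trailing group stops "randId 12" from also matching "randId 123".
    grep -E "randId ${id}([^0-9]|\$)" "$1"/comparison.* "$1"/trc.* 2>/dev/null
}
```

For example, `trace_for_dump /opt/emc/scaleio/sds/logs 1259357004_compare_71f2800001b5_18432_pri.log` searches that directory for lines containing randId 1259357004.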
Checking the two output files:
-rw-rw-rw-. 1 root root 4096 Nov 13 14:03 1259357004_compare_71f2800001b5_18432_pri.log
-rw-r--r--. 1 root root 4096 Nov 19 11:10 1259357004_compare_71f2800001b5_18432_sec.log
diff 1259357004_compare_71f2800001b5_18432_pri.log 1259357004_compare_71f2800001b5_18432_sec.log
Binary files 1259357004_compare_71f2800001b5_18432_pri.log and 1259357004_compare_71f2800001b5_18432_sec.log differ
[root@localhost logs]# md5sum 1259357004_compare_71f2800001b5_18432_pri.log
d11924aadfe72dd1117f260f8b092caf 1259357004_compare_71f2800001b5_18432_pri.log
[root@localhost logs]# md5sum 1259357004_compare_71f2800001b5_18432_sec.log
620f0b67a91f7f74151bc5be745b7110 1259357004_compare_71f2800001b5_18432_sec.log
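The manual diff and md5sum check above can be wrapped in a small helper when several dump pairs need checking. This is only a sketch: the name `compare_dump_pair` is hypothetical, and it assumes the secondary dump sits next to the primary with the same name apart from the `_pri`/`_sec` suffix.

```shell
#!/bin/sh
# compare_dump_pair PRI_FILE
# Finds the *_sec.log that matches a *_pri.log dump and prints either
# "identical" or "differ: N bytes". Returns 2 if either file is missing.
compare_dump_pair() {
    pri=$1
    sec=${pri%_pri.log}_sec.log
    [ -f "$pri" ] && [ -f "$sec" ] || { echo "missing pri or sec dump" >&2; return 2; }
    if cmp -s "$pri" "$sec"; then
        echo "identical"
    else
        # cmp -l prints one line per differing byte; if the files differ in
        # length, only the overlapping part is counted.
        n=$(cmp -l "$pri" "$sec" 2>/dev/null | wc -l | tr -d ' ')
        echo "differ: $n bytes"
    fi
}
```

The byte count only indicates how much of the two dumps disagrees; it does not say which copy is correct.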
Old Command
scli --enable_background_device_scanner
Error: Either a valid Storage Pool ID or name must be specified
Usage: scli --enable_background_device_scanner (((--protection_domain_id | --protection_domain_name ) --storage_pool_name ) | --storage_pool_id ) --scanner_mode [--scanner_bandwidth_limit ] [--report_and_fix | --report_only]
Description: Enable background device scanner
Parameters:
--protection_domain_id Protection Domain ID
--protection_domain_name Protection Domain name
--storage_pool_name Storage Pool name
--storage_pool_id Storage Pool ID
--scanner_mode Scanner mode of operation where MODE can be one of the following:
device_only - Perform read operations. Fix from peer on errors
data_comparison - Perform the device_only test and compare the data content with peer. Not supported for fine granularity Storage Pools
--scanner_bandwidth_limit Bandwidth limit in KB per second per device. The given value should be between 10KB and 10MB. The default value is 3MB
--report_and_fix Report errors and automatically fix them (default)
--report_only Report errors without fixing them
| Scanner Mode | Action | Outcome |
|---|---|---|
| device_only | report_and_fix | Only read errors are detected, reported, and fixed. |
| device_only | report_only | Only read errors are detected and reported (not fixed) |
| device_only | Not specified | Same as report_and_fix (option 1) |
| data_comparison | report_and_fix | Both read and compare errors are detected and fixed. |
| data_comparison | report_only | Both read and compare errors are detected and reported (not fixed) |
| data_comparison | Not specified | Same as report_and_fix (option 3) |
New Command
Usage: scli --enable_background_device_scanner (((--protection_domain_id | --protection_domain_name ) --storage_pool_name ) | --storage_pool_id ) --read_error_action --compare_error_action [--scanner_bandwidth_limit ]
Description: Enable background device scanner
Parameters:
--protection_domain_id Protection Domain ID
--protection_domain_name Protection Domain name
--storage_pool_name Storage Pool name
--storage_pool_id Storage Pool ID
--read_error_action Read error handling. Can be one of the following:
report_and_fix - Perform read operations and automatically fix from peer on errors
report_only - Perform read operations and report errors without fixing them
--compare_error_action Compare error handling. Can be one of the following:
no_compare - Perform read operations without comparing data content with peer
report_and_fix - Perform read operations, compare data content with peer, and automatically fix
Not supported for fine granularity Storage Pools
report_only - Perform read operations, compare data content with peer, and report mismatches without fixing them
Not supported for fine granularity Storage Pools
--scanner_bandwidth_limit Bandwidth limit in KB per second per device. The given value should be between 10KB and 10MB. The default value is 3MB
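As an illustration of how the new parameters combine, the sketch below assembles a command line and rejects values outside the ranges listed above. Several points are assumptions: the usage text has lost its value placeholders, so passing each value as the next argument after its flag is a guess; the pool ID in the example is made up; 10MB is taken to mean 10240 KB; and the function name `build_scanner_cmd` is hypothetical. The function only prints the command and never runs scli.

```shell
#!/bin/sh
# build_scanner_cmd POOL_ID READ_ACTION COMPARE_ACTION [BANDWIDTH_KB]
# Dry run: prints the scli command line instead of executing it.
build_scanner_cmd() {
    pool=$1
    read_action=$2
    compare_action=$3
    bw=${4:-}
    case $read_action in
        report_and_fix|report_only) ;;
        *) echo "invalid --read_error_action: $read_action" >&2; return 1 ;;
    esac
    case $compare_action in
        no_compare|report_and_fix|report_only) ;;
        *) echo "invalid --compare_error_action: $compare_action" >&2; return 1 ;;
    esac
    cmd="scli --enable_background_device_scanner --storage_pool_id $pool"
    cmd="$cmd --read_error_action $read_action --compare_error_action $compare_action"
    if [ -n "$bw" ]; then
        case $bw in
            *[!0-9]*) echo "bandwidth must be an integer number of KB" >&2; return 1 ;;
        esac
        # 10KB to 10MB, taking 1MB = 1024KB (an assumption).
        if [ "$bw" -lt 10 ] || [ "$bw" -gt 10240 ]; then
            echo "bandwidth must be between 10 and 10240 KB" >&2
            return 1
        fi
        cmd="$cmd --scanner_bandwidth_limit $bw"
    fi
    printf '%s\n' "$cmd"
}
```

For instance, `build_scanner_cmd abc123 report_only report_and_fix` prints a command that reports read errors without fixing them while reporting and fixing compare errors.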
| Read Error Mode | Compare Error Mode | Outcome | Mapping | Mapped values |
|---|---|---|---|---|
| no_scan | no_scan | Invalid combination - there is the disable command for this. | Invalid 1 | scannerMode = NULL (not supported) |
| no_scan | report_only | Invalid combination - cannot compare without reading. | Invalid 2 | |
| no_scan | report_and_fix | Invalid combination - cannot compare without reading. | Invalid 3 | |
| report_only | no_scan | Only read errors are detected and reported (not fixed) | 2 | scannerMode = device_only, compareErrorAction = NULL, readErrorAction = report_only |
| report_only | report_only | Both read and compare errors are detected and reported (not fixed) | 4 | scannerMode = data_comparison, compareErrorAction = report_only, readErrorAction = report_only |
| report_only | report_and_fix | Both read and compare errors are detected and reported, but only compare errors are fixed. | New 1 | scannerMode = data_comparison, compareErrorAction = report_and_fix, readErrorAction = report_only |
| report_and_fix | no_scan | Only read errors are detected, reported, and fixed. | 1 | scannerMode = device_only, compareErrorAction = NULL, readErrorAction = report_and_fix |
| report_and_fix | report_only | Both read and compare errors are detected and reported, but only read errors are fixed. | New 2 | scannerMode = data_comparison, compareErrorAction = report_only, readErrorAction = report_and_fix |
| report_and_fix | report_and_fix | Both read and compare errors are detected, reported, and fixed. | 3 | scannerMode = data_comparison, compareErrorAction = report_and_fix, readErrorAction = report_and_fix |
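The rows mapped as 1 through 4 correspond to the four combinations that the old command could express. The sketch below turns an old-style scanner mode and action into the equivalent new-style flags. The function name `map_old_to_new` is hypothetical, and it uses the `no_compare` keyword from the new usage text where the table says no_scan, which is an assumption about which spelling the CLI accepts.

```shell
#!/bin/sh
# map_old_to_new SCANNER_MODE [ACTION]
# Prints the new-style flags equivalent to an old --scanner_mode plus
# --report_and_fix / --report_only. ACTION defaults to report_and_fix,
# matching the "Not specified" rows of the old command.
map_old_to_new() {
    mode=$1
    action=${2:-report_and_fix}
    case $action in
        report_and_fix|report_only) ;;
        *) echo "unknown action: $action" >&2; return 1 ;;
    esac
    case $mode in
        # device_only never compared data, so only the read action carries over.
        device_only)     compare=no_compare ;;
        # data_comparison applied the same action to read and compare errors.
        data_comparison) compare=$action ;;
        *) echo "unknown scanner mode: $mode" >&2; return 1 ;;
    esac
    printf '%s\n' "--read_error_action $action --compare_error_action $compare"
}
```

The two combinations marked New 1 and New 2 have no old-style equivalent, so this helper never produces them.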