Avamar: Gen4T Hardware: Symptom Code: 52764 - Bad Block discovered
Summary: This knowledge article describes media errors on the physical disks in Avamar Data Store Gen4T nodes and how to address them.
Symptoms
The following errors are seen in the MC UI, AUI, and /var/log/messages file on the node:
Symptom Code: 52764, Desc: Adaptec Event Monitor: [13016] :WRN: Bad Block discovered: controller:
Cause
A bad block was detected on a physical disk.
This is a medium error. Depending on the number of errors, the physical disk may require replacement.
Even if the bad block is repaired, this event is a warning that additional bad blocks may be found in the future.
Resolution
1. Log in to the Avamar server with a PuTTY session and load the admin keys. See Avamar: How to Log in to an Avamar Server and Load Various Keys for instructions on loading keys.
a. Using the information from the MC UI, AUI event, or the DialHome Service Request, determine the node that produced the error message.
b. Connect to the node:
ssn 0.#
(Where 0.# is the physical node number).
2. Review the /var/log/messages file on the node producing the errors:
grep -i "Bad Block" /var/log/messages
See APPENDIX A for sample output.
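When a node has logged many events, it can help to count how many bad blocks were discovered on each disk. The following is a minimal shell sketch, assuming the message format shown in APPENDIX A (the "enclosure ID: N, slot ID: N" fields). The function name bad_block_summary is hypothetical and is not part of Avamar or the Adaptec tools; it uses only grep, sed, sort, uniq, and awk.

```shell
# bad_block_summary is a hypothetical helper, not an Avamar or Adaptec tool.
# It counts "Bad Block discovered" events ([13016]) per enclosure/slot in a
# messages file. "Bad Block repaired" events ([13020]) are not counted.
# Usage: bad_block_summary [messages_file]   (default: /var/log/messages)
bad_block_summary() {
    log_file="${1:-/var/log/messages}"
    grep -i 'Bad Block discovered' "$log_file" |
        # Reduce each event to "<enclosure>:<slot>", for example "0:6".
        sed -n 's/.*enclosure ID: \([0-9][0-9]*\), slot ID: \([0-9][0-9]*\).*/\1:\2/p' |
        # Group identical enclosure/slot pairs and count them.
        sort | uniq -c |
        # uniq -c prints "<count> <enclosure>:<slot>"; reformat each line.
        awk '{ split($2, loc, ":"); printf "enclosure %s, slot %s: %s discovered\n", loc[1], loc[2], $1 }'
}
```

After sourcing the function on the node, run bad_block_summary to read /var/log/messages, or pass the path of a saved copy of the file. For the APPENDIX A sample it prints "enclosure 0, slot 6: 2 discovered".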
3. Confirm the status of the disks:
a. For the complete output showing all physical disk attributes:
arcconf getconfig 1 pd
See APPENDIX B for sample output.
-- Or --
b. For a condensed view of each disk's state and runtime error counters:
arcconf getconfig 1 pd | grep -E 'Device|^[[:space:]]+State|Error' | grep -v 'Device Phy Information'
See APPENDIX C for sample output.
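To review several disks at once, the device number, State, and Medium Error Count can be pulled out of the arcconf output. This is a sketch, assuming the field labels shown in APPENDIX B and APPENDIX C; pd_summary is a hypothetical function name. It reads either the full or the condensed output on standard input and prints one line per device in the form "<device> <state> <medium error count>", with ? for any field that was not found (for example, on a non-disk device entry).

```shell
# pd_summary is a hypothetical helper, not part of arcconf.
# Reads "arcconf getconfig 1 pd" output (full or condensed) on stdin.
pd_summary() {
    awk '
        # Return the value part of a "Label : value" line, without trailing blanks.
        function val(line) {
            sub(/^[^:]*:[ \t]*/, "", line)
            sub(/[ \t]+$/, "", line)
            return line
        }
        # Print the record collected for the previous device, if any.
        function emit() {
            if (dev != "") print dev, state, medium
        }
        # "Device #N" starts a new physical device record.
        /^[ \t]*Device #[0-9]+/ {
            emit()
            dev = $2
            sub(/#/, "", dev)
            state = "?"
            medium = "?"
        }
        # Only the top-level "State" line matches; "Power State" does not.
        /^[ \t]*State[ \t]*:/ { state = val($0) }
        /^[ \t]*Medium Error Count[ \t]*:/ { medium = val($0) }
        END { emit() }
    '
}
```

Example: arcconf getconfig 1 pd | pd_summary. For the APPENDIX C sample, the output would be "5 Online 11".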
- If the disk state is reported as Failed, Critical, or Missing: Replace the physical disk
- If the disk state is reported as Online, and the Medium Error Count is below 100: No action is necessary
- If the disk state is reported as Online, and the Medium Error Count is between 100 and 199: There are potential signs of deterioration; monitor the disk closely
- If the disk state is reported as Online, and the Medium Error Count is 200 or more: Replace the physical disk
If a disk replacement is necessary, contact Dell Avamar Support.
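The disk-state criteria listed above can be written as a small shell function, for example to apply them consistently when checking many disks. This is a sketch only: classify_disk is a hypothetical name, and any state other than Failed, Critical, Missing, or Online is reported as not covered, because this article does not define an action for other states.

```shell
# classify_disk is a hypothetical helper that encodes the criteria in this article.
# Usage: classify_disk <State> <Medium Error Count>
classify_disk() {
    state="$1"
    count="$2"
    case "$state" in
        Failed|Critical|Missing)
            echo "replace the physical disk" ;;
        Online)
            # The thresholds below require a whole number of medium errors.
            case "$count" in
                ''|*[!0-9]*) echo "unknown: invalid count '$count'"; return 1 ;;
            esac
            if [ "$count" -ge 200 ]; then
                echo "replace the physical disk"
            elif [ "$count" -ge 100 ]; then
                echo "monitor closely"
            else
                echo "no action necessary"
            fi ;;
        *)
            echo "unknown: state '$state' is not covered by this article"
            return 1 ;;
    esac
}
```

For example, classify_disk Online 102 prints "monitor closely", which matches the Medium Error Count of 102 in APPENDIX B. Any "replace" result still goes through Dell Avamar Support.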
Additional Information
APPENDIX A:
Sample output of: grep -i "bad block" /var/log/messages:
Sep 17 07:11:10 Adaptec Event Monitor: [13016] :WRN: Bad Block discovered: controller: 1 ( PM8060-RAID #FFFFFF00 Physical Slot: 0 ), channel: 0, deviceID: 14, enclosure ID: 0, slot ID: 6,
Sep 17 07:11:12 Adaptec Event Monitor: [13020] :INF: Bad Block repaired: controller: 1 ( PM8060-RAID #FFFFFF00 Physical Slot: 0 ), channel: 0, deviceID: 14, enclosure ID: 0, slot ID: 6.
Sep 17 07:11:16 Adaptec Event Monitor: [13016] :WRN: Bad Block discovered: controller: 1 ( PM8060-RAID #FFFFFF00 Physical Slot: 0 ), channel: 0, deviceID: 14, enclosure ID: 0, slot ID: 6,
Sep 17 07:11:17 Adaptec Event Monitor: [13020] :INF: Bad Block repaired: controller: 1 ( PM8060-RAID #FFFFFF00 Physical Slot: 0 ), channel: 0, deviceID: 14, enclosure ID: 0, slot ID: 6.
APPENDIX B:
Sample output of: arcconf getconfig 1 pd
Controllers found: 1
--------------------------------------------------------------------
Physical Device information
----------------------------------------------------------------------
Device #0
Device is a Hard drive
State : Online
Block Size : Unknown.
Programmed Max Speed : SAS 12.0 Gb/s
Transfer Speed : SAS 12.0 Gb/s
Reported Channel,Device(T:L) : 0,8(8:0)
Reported Location : Enclosure 0, Slot 0(Connector 0, Connector 1)
Reported ESD(T:L) : 2,0(0:0)
Vendor : HITACHI
Model : HUS72602CLAR2000
Firmware : N9C0
Serial number : K5H66BRA
World-wide name : 5000CCA25E43A6B3
Reserved Size : 3163160 KB
Used Size : 1904640 MB
Unused Size : 64 KB
Total Size : 1907729 MB
Write Cache : Disabled (write-through)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
Power State : Full rpm
Supported Power States : Full rpm,Powered off
SSD : No
Temperature : 27 C/ 80 F
----------------------------------------------------------------
Device Phy Information
----------------------------------------------------------------
Phy #0
PHY Identifier : 0
SAS Address : 5000CCA25E43A6B1
Attached PHY Identifier : 17
Attached SAS Address : 50060481618A167F
Phy #1
PHY Identifier : 1
SAS Address : 5000CCA25E43A6B2
----------------------------------------------------------------
Runtime Error Counters
---------------------------------------------------------------
Hardware Error Count : 0
Medium Error Count : 102
Parity Error Count : 0
Link Failure Count : 0
Aborted Command Count : 0
SMART Warning Count : 0
APPENDIX C:
Sample output of: arcconf getconfig 1 pd | grep -E 'Device|^[[:space:]]+State|Error' | grep -v 'Device Phy Information'
Device #5
Device is a Hard drive
State : Online
Reported Channel,Device(T:L) : 0,13(13:0)
Runtime Error Counters
Hardware Error Count : 0
Medium Error Count : 11
Parity Error Count : 0