Avamar: HFScheck is running for longer than expected
Summary: HFScheck runs for longer than expected because the Avamar checkpoint was not created on all storage nodes due to a suspended disk.
Symptoms
The "status.dpn" output shows that the checkpoint validation (aka hfscheck) is running for a long time.
The "avmaint hfscheckstatus" that is running as reduced:
avmaint hfscheckstatus
<hfscheckstatus
nodes-queried="13"
nodes-replied="13"
nodes-total="12"
checkpoint="cp.20190512172440"
status="hfscheck"
phase="datasweep"
type="reduced"
checks="rolling+metadata:10:2"
elapsed-time="153057"
start-time="1557682651"
end-time="0"
check-start-time="1557683842"
check-end-time="0"
generation-time="1557835708"
stripes-checking="274039"
stripes-completed="240254"
offline-stripes="0"
minutes-to-completion="1190"
percent-complete="68.01">
<hfscheckerrors/>
</hfscheckstatus>
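The fields that matter can be pulled out of this output without reading the full XML. A minimal sketch, assuming the attribute-per-line layout shown above; a saved copy of the sample stands in for the live command here, so on a real grid replace the here-string with the output of avmaint hfscheckstatus:

```shell
# Sketch: extract the hfscheck type and progress from hfscheckstatus output.
# The sample text below stands in for:  avmaint hfscheckstatus
status='checkpoint="cp.20190512172440"
status="hfscheck"
type="reduced"
percent-complete="68.01">'

summary=$(printf '%s\n' "$status" | awk -F'"' '
  /type=/             { t = $2 }   # hfscheck type (full, rolling, reduced)
  /percent-complete=/ { p = $2 }   # progress percentage
  END { printf "type=%s percent-complete=%s", t, p }')

echo "$summary"
# A "reduced" type means the check is running against an incomplete checkpoint.
```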
One of the nodes is missing the hfscheck error log (this identifies which node has the issue).
With the keys loaded (see Avamar: How to Log in to an Avamar Server and Load Various Keys if required), run the following command:
mapall --noerror 'tail -4 /data01/hfscheck/err.log'
In this sample output, node 0.10 does not have the hfscheck/err.log:
Using /usr/local/avamar/var/probe.xml
(0.0) ssh -q -x -o GSSAPIAuthentication=no admin@192.168.255.2 'tail -4 /data01/hfscheck/err.log'
2019/05/12-13:01:56.40927 {P0.0} [gsan] <1306> sysconfig info: Valid NICs=8 NICs up=3
2019/05/12-13:01:56.40930 {P0.0} [gsan] <1306> sysconfig info: valid NIC eth0 [speedMb=100, duplex=FULL] is not at maximum speed [speedMb=1000, duplex=FULL]
2019/05/12-13:01:56.48368 {P0.0} [gsan] <1291> FIPS mode enabled
2019/05/12-13:02:02.76275 {0.0} [nodebeat:116] <0016> node 0.1 was offline, changing
(0.1) ssh -q -x -o GSSAPIAuthentication=no admin@192.168.255.4 'tail -4 /data01/hfscheck/err.log'
2019/05/12-13:01:56.42559 {P0.1} [gsan] <1306> sysconfig info: Valid NICs=8 NICs up=3
2019/05/12-13:01:56.42568 {P0.1} [gsan] <1306> sysconfig info: All NICs are at maximum speed [speedMb=1000, duplex=FULL]
2019/05/12-13:01:56.49923 {P0.1} [gsan] <1291> FIPS mode enabled
2019/05/12-13:02:02.78169 {0.1} [nodebeat:116] <0016> node 0.0 was offline, changing
...
(0.10) ssh -q -x -o GSSAPIAuthentication=no admin@192.168.255.12 'tail -4 /data01/hfscheck/err.log'
tail: cannot open /data01/hfscheck/err.log: No such file or directory
(0.11) ssh -q -x -o GSSAPIAuthentication=no admin@192.168.255.13 'tail -4 /data01/hfscheck/err.log'
2019/05/12-13:01:56.41444 {P0.2} [gsan] <1306> sysconfig info: All NICs are at maximum speed [speedMb=1000, duplex=FULL]
2019/05/12-13:01:56.48873 {P0.2} [gsan] <1291> FIPS mode enabled
2019/05/12-13:02:01.74107 {0.11} [nodebeat:107] <0016> node 0.1 was offline, changing
2019/05/12-13:02:01.76281 {0.11} [nodebeat:107] <0016> node 0.0 was offline, changing
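The affected node can also be picked out of the mapall output directly, rather than reading every node's tail. A sketch, using a shortened, partly hypothetical fragment of the output above (node numbers and IPs are illustrative); on a real grid the input would be a saved copy of the mapall run:

```shell
# Sketch: find which node is missing err.log from saved mapall output.
# On a grid:  mapall --noerror 'tail -4 /data01/hfscheck/err.log' > /tmp/errlog.out
cat > /tmp/errlog.out <<'EOF'
(0.9) ssh -q -x admin@192.168.255.11 'tail -4 /data01/hfscheck/err.log'
2019/05/12-13:01:56 {P0.9} [gsan] <1291> FIPS mode enabled
(0.10) ssh -q -x admin@192.168.255.12 'tail -4 /data01/hfscheck/err.log'
tail: cannot open /data01/hfscheck/err.log: No such file or directory
EOF

# The "(0.x)" tag on the line before each error names the affected node.
missing=$(grep -B1 'No such file or directory' /tmp/errlog.out |
          grep -o '^(0\.[0-9]*)')
echo "node(s) missing err.log: $missing"
```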
Warning (WARN) messages about suspended disks are reported in the GSAN log (/data01/cur/gsan.log*).
For example:
2019/05/12-17:27:13.29974 {0.A} [manage:3858] WARN: <1084> changing disk 2 on node 0.A to suspended state
The checkpoint was not created on the affected storage node, and the checkpoint is reduced:
(From the GSAN logs)
2019/05/12-17:27:13.29974 {0.A} [manage:3858] WARN: <1084> changing disk 2 on node 0.A to suspended state
2019/05/12-17:28:27.28818 {0.A} [manage:3148] WARN: <1040> cannot create checkpoint cp.20190512172440 on node 0.A because a disk is suspended.
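Both warnings carry fixed event codes (<1084> for the disk suspension, <1040> for the skipped checkpoint), so they can be searched for in one pass. A sketch; sample lines, including one unrelated info line to show it is filtered out, stand in for the real logs:

```shell
# Sketch: search GSAN logs for the suspended-disk warnings by event code.
# On a grid:  grep -h -E '<1084>|<1040>' /data01/cur/gsan.log*
cat > /tmp/gsan.log <<'EOF'
2019/05/12-17:27:13.29974 {0.A} [manage:3858] WARN: <1084> changing disk 2 on node 0.A to suspended state
2019/05/12-17:27:45.00000 {0.A} [gsan] <1306> sysconfig info: Valid NICs=8 NICs up=3
2019/05/12-17:28:27.28818 {0.A} [manage:3148] WARN: <1040> cannot create checkpoint cp.20190512172440 on node 0.A because a disk is suspended.
EOF

hits=$(grep -c -E '<1084>|<1040>' /tmp/gsan.log)
grep -E '<1084>|<1040>' /tmp/gsan.log
```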
The checkpoint is reduced:
cplist --lscp
In this sample output, the checkpoint is on 12 of the 13 nodes:
cp.20190512172440 Sun May 12 13:24:40 2019 valid --- --- nodes 12/13 stripes 342457
Cause
The suspended disk on the storage node caused the checkpoint to be incomplete (12/13 nodes); as a result, the checkpoint validation (hfscheck) runs as reduced and takes longer to complete.
Resolution
1. Verify that all disks on the affected storage node are online before proceeding.
2. Verify that all stripes are online per Avamar: Suspended Partitions, Stripes, and Hfscheck Failures on Avamar.
3. Terminate the running hfscheck:
avmaint hfscheckstop --ava
4. Put the grid into a controlled state (see Avamar: How to Set the Avamar Server into a Known Controlled State).
5. Create a checkpoint:
avmaint checkpoint --ava --wait
6. Verify that the checkpoint was created successfully by using one of the following commands:
status.dpn | grep "Last checkpoint"
Last checkpoint: cp.20190514140904 finished Tue May 14 07:09:26 2019 after 00m 22s (OK)
-- Or --
avmaint cpstatus
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cpstatus
generation-time="1777352670"
tag="cp.20190514140904"
status="completed"
stripes-completed="393"
stripes-total="393"
start-time="1777298944"
end-time="1777298966"
result="OK"
refcount="13"/>
7. Verify that the checkpoint is not reduced:
cplist --lscp
In this sample output, the checkpoint is on all 13 nodes:
cp.20190514140904 Tue May 14 14:09:04 2019 valid --- --- nodes 13/13 stripes 342457
8. Run an hfscheck on the newly created checkpoint:
avmaint hfscheck --ava
(The prompt only returns once the initial phase of the hfscheck has completed.)
9. Monitor the hfscheck to completion:
watch -n 60 avmaint hfscheckstatus
10. Acknowledge any data integrity alerts:
mccli event clear-data-integrity-alerts --reset-code=AVAMARDATAOK
11. Return the grid to production state (See Avamar: How to Set the Avamar Server into a Known Controlled State).
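The checkpoint verification in steps 6 and 7 can be scripted as one small check. A sketch, assuming the cplist line format shown above and that the newest checkpoint is listed last; a sample line stands in for the live output:

```shell
# Sketch: verify the newest checkpoint is present on every node.
# On a grid, feed the line from:  cplist --lscp | tail -1
line='cp.20190512172440 Tue May 14 13:24:40 2019 valid --- --- nodes 13/13 stripes 342457'

verdict=$(printf '%s\n' "$line" | awk '
  {
    # Locate the "nodes" keyword, then split the following "X/Y" count.
    for (i = 1; i <= NF; i++)
      if ($i == "nodes") { split($(i + 1), n, "/"); break }
    if (n[1] == n[2])
      printf "checkpoint %s is complete (%s/%s nodes)", $1, n[1], n[2]
    else
      printf "checkpoint %s is REDUCED (%s/%s nodes)", $1, n[1], n[2]
  }')
echo "$verdict"
```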