This post is more than 5 years old
Talking about Avamar monitor
This article shows some simple methods for monitoring the status of Avamar in real time.
Detailed Information
Required equipment:
- Avamar Gen4s-M1200, version 7.0.2, with three storage nodes and an attached DD860;
- Avamar Gen4-3.9TB, version 6.0.2, with three storage nodes.
1. Use the Avamar GUI to monitor the system
We can use the Avamar GUI to monitor the status of Avamar, as shown in Figures 1 and 2. Figure 1 shows the Avamar Gen4s-M1200 (version 7.0.2), and Figure 2 shows the Avamar Gen4-3.9TB (version 6.0.2).
Figure 1
Figure 2
Clicking “All Failures” in Figure 1 lists all failed Avamar backups, as shown in Figure 3.
Figure 3
Clicking “Critical Events” in Figure 1 shows other Avamar activities, such as CheckPoint, HFScheck, Garbage Collection, and hardware-related errors, as shown in Figure 4.
Figure 4
Here are some common errors and solutions:
1) A CheckPoint of server data is overdue.
This means that no new checkpoint has been created within the last 24 hours. In general, we need to check whether the CheckPoint completes within the next few hours. If it does, this error can be ignored, because Avamar's daily workflow is
Backup -> Garbage Collection -> CheckPoint -> HFScheck -> CheckPoint -> Backup
As can be seen, each task is carried out sequentially. Each task also has a preset time window, but a task is not always completed on time. If Avamar's work is delayed by some special circumstance, a CheckPoint may not be generated within 24 hours. Often, the CheckPoint completes within the next 2-3 hours, so this error can be ignored.
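The rule of thumb above can be sketched as a small shell function. This is only an illustration: the function name and the status labels are made up for this example, the 24-hour limit comes from the alert text, and the 3-hour grace period is an assumption based on the "2-3 hours" remark. How the last checkpoint time is obtained is left to the caller.

```shell
# Classify the age of the last checkpoint.
# $1 = time of the last checkpoint, $2 = current time (both Unix epoch seconds).
# checkpoint_status is a hypothetical helper, not an Avamar command.
checkpoint_status() {
    last=$1
    now=$2
    # Whole hours elapsed since the last checkpoint
    age_hours=$(( (now - last) / 3600 ))
    if [ "$age_hours" -lt 24 ]; then
        echo "ok"
    elif [ "$age_hours" -lt 27 ]; then
        # Overdue, but still inside the assumed 3-hour grace period
        echo "overdue-wait"
    else
        echo "overdue-investigate"
    fi
}
```

For example, `checkpoint_status "$last_cp" "$(date +%s)"` could be run from a monitoring script, where `last_cp` is a variable you fill in yourself.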
2) Data Integrity Alerts
This error relates to HFScheck. We often encounter errors such as MSG_ERR_HFSCHECKERRORS, MSG_ERR_DDR_ERROR, MSG_ERR_CGSAN_FAILED, and MSG_ERR_TIMEOUT.
- MSG_ERR_HFSCHECKERRORS is caused by stripe problems on the storage node. Details can be found in KB 127269 (https://support.emc.com/kb/127269). Choose a solution based on the detailed errors in the GSAN error log (/data01/cur/err.log), the HFScheck error log (/data01/hfscheck/err.log), and the CheckPoint log (/data01/checklogs/cp.xxxxxxxxxxxxxx/err.log).
- MSG_ERR_DDR_ERROR is caused by DD connection problems. Details can be found in KB 120996 (https://support.emc.com/kb/120996). Choose a resolution based on the HFScheck error analysis described above and the DDR log (/usr/local/avamar/var/ddrmaintlogs/ddrmaint.log).
- MSG_ERR_CGSAN_FAILED is caused by GSAN process issues. Details can be found in KB 165409 (https://support.emc.com/kb/165409). Things to check include hardware, the ASCD process, time synchronization between nodes, whether licenses are properly configured, and whether an RMCP process is present on Gen3.3. Any of these issues can cause HFScheck to fail and display the MSG_ERR_CGSAN_FAILED error.
- MSG_ERR_TIMEOUT is similar to the MSG_ERR_CGSAN_FAILED error. It is caused by issues such as hardware problems, an RMCP process being present on Gen3.3, or thread exhaustion on a single node. Details can be found in KB 172518 (https://support.emc.com/kb/172518).
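When working through the logs listed above, a quick count of how often each alert code appears can help decide where to look first. The following is a minimal sketch; the function name is hypothetical, and it assumes the codes appear verbatim in whichever log file you pass in, which may not hold for every log.

```shell
# Print "CODE count" for each of the four alert codes named above.
# $1 = path to a log file chosen by the caller.
# count_alert_codes is a hypothetical helper, not an Avamar command.
count_alert_codes() {
    logfile=$1
    # Stop early if the file cannot be read
    [ -r "$logfile" ] || return 1
    for code in MSG_ERR_HFSCHECKERRORS MSG_ERR_DDR_ERROR \
                MSG_ERR_CGSAN_FAILED MSG_ERR_TIMEOUT; do
        # grep -c prints the number of matching lines; it exits non-zero
        # when there are none, so "|| true" keeps "set -e" scripts alive
        n=$(grep -c -F "$code" "$logfile" || true)
        echo "$code $n"
    done
}
```

For example, `count_alert_codes /data01/hfscheck/err.log` would print four lines, one per code, with the number of matching lines in that file.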
Once the above Data Integrity Alerts have been resolved, whether automatically by the server or by technical support engineers, they can be cleared in either of the following ways:
1) Clear by command line:
mccli event clear-data-integrity-alerts --reset-code=AVAMARDATAOK
2) Clear by GUI
a. Log in to the GUI and click Administration
b. Click Event Management
c. Click Unacknowledged Events
d. Click Actions > Event Management > Clear Data Integrity Alert
e. Enter the code: AVAMARDATAOK
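Because clearing alerts should only happen after the underlying problem is fixed, it can be useful to wrap the command-line method in a function that prints the command by default and only runs it when asked. This is a sketch under that assumption; the function name and the `--run` switch are made up for this example and are not part of mccli.

```shell
# Dry-run wrapper around the clear command shown above.
# With no argument it only prints the command; with --run it executes it.
# clear_integrity_alerts and --run are hypothetical, not part of mccli.
clear_integrity_alerts() {
    cmd="mccli event clear-data-integrity-alerts --reset-code=AVAMARDATAOK"
    if [ "$1" = "--run" ]; then
        # Unquoted on purpose so the string is split into command and arguments
        $cmd
    else
        echo "$cmd"
    fi
}
```

Calling `clear_integrity_alerts` lets you review the exact command first; `clear_integrity_alerts --run` then executes it on a host where mccli is available.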
We have now covered how to use the GUI to monitor the Avamar server, especially its maintenance jobs. After reading this article, you should know how to use the GUI to monitor Avamar maintenance and be familiar with some of the common error messages.
BenPin
April 17th, 2021 03:00
The KB article link for MSG_ERR_DDR_ERROR leads to the wrong page.