Unsolved
21 Posts
0
13034
1 Stripe Error
Hello everybody,
When I checked the Server Information page, I saw the following picture. I checked the status and saw that the checkpoint could not be created, and hfscheck could not be completed. One of the stripes has a problem.
How can I fix the stripe?
Thanks a lot.
rpervan
266 Posts
0
August 15th, 2011 01:00
Another option is to run "avmaint testintegrity --ava" to check the consistency of a particular stripe.
rpervan
266 Posts
0
August 15th, 2011 01:00
I would suggest performing a FULL HFS data consistency check on the latest CP. You can do this from the GUI.
It will perform a data integrity check on the Avamar stripes, and Avamar should repair them automatically...
Otherwise, please open an SR with the EMC Avamar team and they will be glad to help you.
regards,
.r
rpervan
266 Posts
0
August 15th, 2011 05:00
hello,
could you supply us with the outputs of:
# dpnctl status
# cplist
switch to the "admin" account, load the keys... and
# mapall 'ps -ef | grep gsan'
Please also check the gsan log on all storage nodes; there must be something relevant in the gsan.log output:
# mapall 'grep ERR /data01/cur/gsan.log'
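That grep just filters the error lines out of each node's log; if you want to see what the filter does before running it across the grid, here is a minimal local sketch (the sample log lines below are made up for illustration only; the real gsan.log format may differ):

```shell
# Build a tiny sample log (contents invented for illustration;
# the real gsan.log format may differ).
cat > /tmp/gsan.log <<'EOF'
2011/08/15-12:00:01 INFO  checkpoint started
2011/08/15-12:00:05 ERR   stripe 1330 media error
2011/08/15-12:00:06 INFO  checkpoint aborted
EOF

# The same filter mapall runs on each node, applied to the local file:
grep ERR /tmp/gsan.log

# A quick count of error lines, useful for comparing nodes:
grep -c ERR /tmp/gsan.log
```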
1 stripe OFFLINE is, let's say, not so critical (up to 8 we can deal with), but if you are facing such things for the first time, maybe you should contact EMC...
In the worst case you can roll back to the latest validated checkpoint...
Cheers,
.r
faltindal
21 Posts
0
August 15th, 2011 05:00
Hi Rej,
When I look at the Avamar status, I see the hfscheck process has been stuck in the terminating state since Friday. When I try to create a new checkpoint, it fails.
You can see the result of avmaint testintegrity below.
root@origin1:~/#: avmaint testintegrity 0.2-B2A --ava
root@origin1:~/#:
Here is the status.dpn result;
root@origin1:~/#: status.dpn
Mon Aug 15 15:37:20 EEST 2011 Mon Aug 15 12:37:19 2011 UTC (Initialized Tue Jan 25 12:15:25 2011 UTC)
Node IP Address Version State Runlevel SrvrRootUser Dis Suspend Load UsedMB Errlen %Full Percent Full and Stripe Status by Disk
0.0 10.83.55.163 5.0.3-29 ONLINE fullaccess mhpu0hpu0hpu 5 false 2.37 16657472 1166380 34.3% 35%(onl:1301) 34%(onl:1314) 34%(onl:1326)
0.1 10.83.55.164 5.0.3-29 ONLINE fullaccess mhpu0hpu0hpu 4 false 5.60 16572136 1817362 34.4% 35%(onl:1315) 34%(onl:1316) 34%(onl:1311)
0.2 10.83.55.165 5.0.3-29 ONLINE fullaccess mhpu0hpu0hpu 2 false 0.39 18102840 1033958 34.1% 34%(onl:1304) 33%(onl:1330,ERR:1) 33%(onl:1301)
SrvrRootUser Modes = migrate + hfswriteable + persistwriteable + useraccntwriteable
All reported states=(ONLINE), runlevels=(fullaccess), modes=(mhpu0hpu0hpu)
System-Status: ok
Access-Status: full
ERROR 1 stripes OFFLINE_MEDIA_ERROR
Checkpoint failed with result MSG_ERR_OFFLINE : cp.20110815123632 started Mon Aug 15 15:36:32 2011 ended Mon Aug 15 15:37:12 2011, completed 1009 of 11819 stripes
Last GC: finished Fri Aug 12 08:14:53 2011 after 02m 53s >> recovered 5.07 MB (OK)
Hfscheck in progress: started Fri Aug 12 17:05:26 2011 (terminating)
Maintenance windows scheduler capacity profile is active.
The maintenance window is currently running.
Next backup window start time: Mon Aug 15 23:00:00 2011 EEST
Next blackout window start time: Tue Aug 16 08:00:00 2011 EEST
Next maintenance window start time: Tue Aug 16 11:00:00 2011 EEST
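For reference, the disk with the failed stripe can be picked out of output like the above mechanically, by filtering the per-node lines for the ERR marker; a small sketch over the node 0.2 line shown above (field layout assumed from that status.dpn output):

```shell
# One per-node line copied from the status.dpn output above.
line='0.2 10.83.55.165 5.0.3-29 ONLINE fullaccess mhpu0hpu0hpu 2 false 0.39 18102840 1033958 34.1% 34%(onl:1304) 33%(onl:1330,ERR:1) 33%(onl:1301)'

# Print the node ID if any of its disks reports an ERR stripe count.
echo "$line" | awk '/ERR:/ { print "node " $1 " has a stripe error" }'
```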
Do you know what I should do?
Thanks
rpervan
266 Posts
0
August 15th, 2011 06:00
Thanks for the update.
Yes, the latest successful CP was made on 14.06.2011! Strange! There should be 2 CPs created per day...
OK, gsan is in fullaccess and that is fine.
Please switch to the "admin" account, load the keys, and create a CP with:
# cp_cron --duplog
then use MCS and perform a "FULL" HFS check on this CP...
or you can perform both operations from the GUI if you want.
Please try this and let us know the status.
faltindal
21 Posts
0
August 15th, 2011 06:00
Hi Rej,
Here are the results:
root@origin1:~/#: dpnctl status
Identity added: /home/dpn/.ssh/dpnid (/home/dpn/.ssh/dpnid)
dpnctl: INFO: gsan status: ready
dpnctl: INFO: MCS status: up.
dpnctl: INFO: EMS status: up.
dpnctl: INFO: Backup scheduler status: up.
dpnctl: INFO: dtlt status: up.
dpnctl: INFO: axionfs status: up.
dpnctl: INFO: Maintenance windows scheduler status: enabled.
dpnctl: INFO: Maintenance cron jobs status: enabled.
dpnctl: INFO: Unattended startup status: disabled.
root@origin1:~/#:
root@origin1:~/#: cplist
cp.20110614050327 Tue Jun 14 08:03:27 2011 valid rol --- nodes 3/3 stripes 9518
root@origin1:~/#:
admin@origin1:~/>: mapall 'ps -ef | grep gsan'
Using /usr/local/avamar/var/probe.xml
(0.0) ssh -x admin@10.83.55.163 'ps -ef | grep gsan'
admin 5801 5800 0 13:02 ? 00:00:00 bash -c ps -ef | grep gsan
admin 5817 5801 0 13:02 ? 00:00:00 grep gsan
admin 29274 1 0 Mar29 ? 00:00:00 ./gsan restart --runlevel=fullaccess --ramfsroot= --clientssl=false --altlogdir= --mainhost=10.83.55.163 --mainport=20000 --gatewayaddr=10.83.55.161
admin 29275 29274 3 Mar29 ? 4-11:40:31 ./gsan restart --runlevel=fullaccess --ramfsroot= --clientssl=false --altlogdir= --mainhost=10.83.55.163 --mainport=20000 --gatewayaddr=10.83.55.161
(0.1) ssh -x admin@10.83.55.164 'ps -ef | grep gsan'
admin 17575 1 0 Mar29 ? 00:00:00 ./gsan restart --runlevel=fullaccess --ramfsroot= --clientssl=false --altlogdir= --mainhost=10.83.55.163 --mainport=20000 --gatewayaddr=10.83.55.161
admin 17576 17575 3 Mar29 ? 4-09:38:49 ./gsan restart --runlevel=fullaccess --ramfsroot= --clientssl=false --altlogdir= --mainhost=10.83.55.163 --mainport=20000 --gatewayaddr=10.83.55.161
admin 24433 24432 0 13:02 ? 00:00:00 bash -c ps -ef | grep gsan
admin 24449 24433 0 13:02 ? 00:00:00 grep gsan
(0.2) ssh -x admin@10.83.55.165 'ps -ef | grep gsan'
admin 16905 1 0 Mar29 ? 00:00:00 ./gsan restart --runlevel=fullaccess --ramfsroot= --clientssl=false --altlogdir= --mainhost=10.83.55.163 --mainport=20000 --gatewayaddr=10.83.55.161
admin 16906 16905 3 Mar29 ? 4-04:56:17 ./gsan restart --runlevel=fullaccess --ramfsroot= --clientssl=false --altlogdir= --mainhost=10.83.55.163 --mainport=20000 --gatewayaddr=10.83.55.161
admin 23780 23779 0 13:02 ? 00:00:00 bash -c ps -ef | grep gsan
admin 23796 23780 0 13:02 ? 00:00:00 grep gsan
admin@origin1:~/>:
You can find the result of the # mapall 'grep ERR /data01/cur/gsan.log' command in the attached log file.
I thought about rolling back to a checkpoint, but the latest successful CP is dated 06.14.2011, so if possible I would prefer not to roll back.
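For reference, the age of the last usable checkpoint can be read straight off the cplist output shown earlier; a minimal sketch over that line (field positions assumed from the output format above):

```shell
# The single cplist line from the output above.
cplist_out='cp.20110614050327 Tue Jun 14 08:03:27 2011 valid rol --- nodes 3/3 stripes 9518'

# Keep only checkpoints marked valid, and print the newest tag seen.
echo "$cplist_out" | awk '$7 == "valid" { tag = $1 } END { print tag }'
```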
Thanks a lot
rpervan
266 Posts
0
August 17th, 2011 00:00
> Yesterday morning, second storage node has been offline.
> We can ping the node but gsan doesn't work. When I look to status.dpn I see the hfscheck process still terminating since friday.
You can try "restart.dpn --nodes=0.X" to restart the offline storage node... and sync it back...
> Hfscheck_kill command could not stop the process. Actualy I couldn't understand clearly what is happening on the system
Probably some process hung, and it is hard to decide exactly what without remote assistance.
Please open SR with EMC team !
rgds,
.r
faltindal
21 Posts
0
August 17th, 2011 00:00
Hi Rej
I tried the commands but wasn't successful. Yesterday morning, the second storage node went offline. We can ping the node, but gsan doesn't work. When I look at status.dpn, I see the hfscheck process has still been terminating since Friday. The hfscheck_kill command could not stop the process. Actually, I can't clearly understand what is happening on the system.
Today, I will create a service request.
Thanks for your help
nielecn
9 Posts
0
October 22nd, 2015 21:00
Did you resolve the problem?
ionthegeek
2K Posts
0
October 23rd, 2015 13:00
I'm not sure why you're commenting on a four-year-old thread, but if you're experiencing an issue with an offline stripe, you should open a service request.