I have not received an e-mail yet, but I have tried to use nwttylog.nlm to extract the PERC's log. I was able to see entries in the log starting from my last reboot on 08/08, but unfortunately the tool did not extract any of the recent log entries. For whatever reason, the log looks fine until it reaches the PR log entries on 08/16; it then prints a "€" character and repeats from the beginning multiple times.
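For anyone who ends up with a similar dump, here is a rough Python sketch of how the repeated part could be separated from the usable part on a workstation. It assumes the dump has been saved as plain text and that the "€" character marks the point where the log starts over; the function names and the sample lines are made up for illustration and are not part of any Dell or Novell tool.

```python
def split_log_at_marker(text, marker="\u20ac"):
    """Split a log dump at the first marker character.

    Returns (head, tail). If the marker is absent, head is the whole
    text and tail is an empty string.
    """
    head, _sep, tail = text.partition(marker)
    return head, tail


def tail_restarts_log(head, tail, probe_lines=3):
    """Return True if the text after the marker begins with the same
    non-empty lines as the start of the log (i.e. the dump wrapped around)."""
    head_lines = [ln.strip() for ln in head.splitlines() if ln.strip()]
    tail_lines = [ln.strip() for ln in tail.splitlines() if ln.strip()]
    probe = head_lines[:probe_lines]
    return bool(probe) and tail_lines[:len(probe)] == probe


if __name__ == "__main__":
    # Invented sample text; a real PERC log will look different.
    sample = "08/08 boot\n08/16 PR start\n\u20ac08/08 boot\n08/16 PR start\n"
    head, tail = split_log_at_marker(sample)
    print(head)
    print("repeats from the start:", tail_restarts_log(head, tail, probe_lines=2))
```

This only trims the repetition; it cannot recover the entries that the tool never extracted.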
If you look a little further down in this forum, you will see that I posted the identical error message. I wrangled with Dell for a couple of days, lost a ton of data, and then they replaced the card. It's amazing how the error went away after the card was replaced. Note that the software is the same, but Dell always blames the software.
I recommend that you replace the controllers immediately. The problem is bad cache on the controller. It looks like Dell might have had a bad production run. When you get your new controllers, double-check that the firmware is the latest version (the ones shipped to me were a few revisions behind). If you continue running your servers with these HAM errors, you are likely to experience loss or corruption of data.
Replacing the PERC controller did resolve my issues, although I had to argue with support for hours to convince them to send me a replacement. The only reason I found the error was that one drive in my array went offline; in the course of working on it, I found this error, as well as another post here with the same issue. Novell also has a TID on this error.
Luckily, Dell did send me a replacement controller, and just a few days later, I did have a drive failure that could not be corrected without replacement of the drive.
Gman05
7 Posts
0
August 29th, 2005 12:00
Thanks,
I've been into Dellmgr and disabled the alarm. The system ID is B4JT931. The server has had all the recent Novell patches installed as well as firmware updates from Dell. The server had been abending and/or hanging until the installation of NW6.5 SP3 and related patches about 30 days ago, so there is a lengthy abend.log.
I'm not sure about Array Manager. I only recently began working with this server and am unfamiliar with Dell-specific tools, as I have primarily used HP in the past. I do see a dellmon.nlm. What is Array Manager, and can I download it if it's not installed?
Thanks
Carrie_1
2 Intern
•
188 Posts
0
August 29th, 2005 12:00
If you loaded the Dell PEDGE3.HAM, it should have copied DELLMGR.NLM into SYS:SYSTEM. When you run this module from the console, it looks like the CTRL+M PERC BIOS. You can disable the alarm there.
As for the mega_HAM_Timeout error, this is going to take some troubleshooting. It is possible that it is a PERC failure, but I think at this point it is unlikely.
I would start by making sure all software has the latest patches installed. Check to see whether an ABEND.LOG exists in SYS:SYSTEM.
Also, you can pull a log from the PERC card itself. It can be pulled using Array Manager, if you have it installed and configured, or Dell technical support can provide you with a tool called TTY that will allow you to pull the PERC log from DOS.
Please post your service tag number and let me know if your server is abending and if you have Array Manager installed.
Carrie
Carrie_1
2 Intern
•
188 Posts
0
August 29th, 2005 12:00
Looks like your server has a PERC4dc. Check to see what firmware you have installed. You should be able to find it in DELLMGR in Object - Adapter - Other Adapter information.
Also, all third-party software needs to be patched, including your backup software.
As for Open Manage, take a look in the autoexec.ncf towards the bottom and see if you see any comments for Dell software. If you do, copy the lines here and I'll tell you what you have installed. I would also like to know if you have Symantec Antivirus installed.
Carrie
Message Edited by DELL-Carrie on 08-29-2005 09:14 AM
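If you would rather not read through autoexec.ncf by eye, a small script run on a workstation against a copy of the file can pull out the relevant lines. This is only a sketch: it assumes the file has been copied off the server as plain text, it simply looks for the word "dell" anywhere on a line (case-insensitive) rather than parsing NCF syntax, and the function name and sample contents are invented for illustration.

```python
def find_vendor_lines(ncf_text, keywords=("dell",)):
    """Return (line_number, line) pairs for lines mentioning any keyword.

    Matching is a plain case-insensitive substring test, so it picks up
    both comment lines and LOAD lines that contain the keyword.
    """
    hits = []
    for number, line in enumerate(ncf_text.splitlines(), start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((number, line.rstrip()))
    return hits


if __name__ == "__main__":
    # Hypothetical file contents, not taken from a real server.
    sample = (
        "MOUNT ALL\n"
        "# Dell management agents (hypothetical comment)\n"
        "LOAD DELLMON.NLM\n"
    )
    for number, line in find_vendor_lines(sample):
        print(number, line)
```

Passing extra keywords, for example `keywords=("dell", "symantec")`, would also flag antivirus-related lines in the same pass.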
Gman05
7 Posts
0
August 29th, 2005 13:00
Carrie_1
2 Intern
•
188 Posts
0
August 29th, 2005 17:00
I am about to email you a file to pull the PERC log. You will need to down the server to DOS.
Carrie
Gman05
7 Posts
0
August 30th, 2005 13:00
Carrie_1
2 Intern
•
188 Posts
0
August 30th, 2005 14:00
The tty log looks fine. I hadn't seen the Novell TID you posted.
Please call Dell technical support and ask them to replace the RAID card. Give the Dell tech a link to this forum post. Please let me know if replacing the PERC resolves the error.
Carrie
Carrie_1
2 Intern
•
188 Posts
0
August 30th, 2005 15:00
You should get the replacement PERC card today. Let me know if this fixes the error.
Carrie
Gman05
7 Posts
0
August 30th, 2005 15:00
I have. After 3 hrs of telephone time, they tell me this error is a software error, not a hardware problem. After much wrangling, they agreed to send me a card, which I just received. I'll let you know the outcome.
Any idea why the log didn't have any of the more recent entries in it?
Thanks again for your assistance. I really do appreciate it.
Kayak64
12 Posts
0
September 21st, 2005 11:00
cac4
10 Posts
0
September 21st, 2005 20:00
I've had it on 2 servers myself, and I posted it here before. The tech I got (lucky me) knew what he was doing and, from those logs, was able to determine "without a doubt", according to him, that this error indicates bad RAM on the RAID controller, not a fault in the controller itself. But both times, they just replaced the controller.
There must have been a bad production run, as these 2 servers were identical and purchased at the same time. Both had the problem.
Steven Goh
1 Message
0
September 22nd, 2005 07:00
Hi Gman05, was your issue resolved after the PERC controller was replaced? I currently have a customer who is facing the same issue as yours.
Kayak64
12 Posts
0
September 22nd, 2005 08:00
Gman05
7 Posts
0
September 22nd, 2005 12:00