
Unsolved



May 18th, 2013 12:00

Strange issue with Perc 6i

I recently acquired a used PowerEdge 2950 with 3 x 500 GB hard drives.

It worked great for a day, but after that the server kept crashing.

After investigating for several days, I found that the Perc 6i RAID controller does not seem to like RAID 1 or RAID 5 configurations.

What I mean is that every time I attempt to put data on it, Windows Server 2012 (it happens with other OSes as well) stalls and blue screens with a "CRITICAL_PROCESS_KILLED" message.

In addition, when I attempt to run a "Check Consistency" on the Perc 6i, the system stalls and the process never completes. It will also stall on restart after the Firmware Initialization.

Also, whenever I attempt to install onto a RAID 1 or RAID 5 configuration, it boots into the Windows Server setup just fine. When I select the RAID 1 or RAID 5 virtual disk as the install location, almost immediately afterwards it gives me an error message similar to "Installation cannot continue because installation files are missing or corrupt" with error code 0x8007002, or something close to that.

After I click "OK", it goes to a black screen for about 30 seconds, and the server then restarts with a critical error message.

Alternatively, it sometimes returns to the start of Setup, and when I continue back to the screen where you select where to install, the drives are gone.

The Perc 6i firmware is up to date, as is the hard drive firmware.

Is it a bad card or something more? 

EDIT: I should add that I can install successfully on a RAID 0 with a single drive.

Moderator • 6.2K Posts

May 18th, 2013 13:00

Hello xbamaris,

It is going to be difficult to know what the problem is without reviewing a controller log. I would suggest using Open Manage Server Administrator to extract a log from the controller. Upload that log to your forum account or somewhere else on the web and provide a link. I'll take a look at the log and see what is going on.

If you have Windows installed then you can download OMSA and install it. If you don't have an OS installed then you can follow these instructions for booting to OMSA live and running the export: http://de.community.dell.com/techcenter/support-services/w/wiki/369.perc-raid-controller-log-exportieren-mit-dell-omsa-livecd-6-4-englisch.aspx

Here is the Windows installer version if you still have Windows installed: http://www.dell.com/support/drivers/us/en/19/DriverDetails/Product/poweredge-2950?driverId=G2WT6&osCode=WNET&fileId=2883471492&languageCode=en&categoryId=SM
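If you end up working from a command line instead of the OMSA web interface, the same export can be run from the omconfig CLI. Off the top of my head the commands are roughly the following (assuming the PERC shows up as controller 0; check the controller ID first):

omreport storage controller
omconfig storage controller action=exportlog controller=0

If I remember right, the exported file shows up as lsi_MMDD.log in the OMSA log directory (C:\Windows on a Windows install, /var/log on Linux).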

Thanks

7 Technologist • 16.3K Posts

May 18th, 2013 13:00

What make/model are the hard drives you are using?

4 Posts

May 18th, 2013 14:00

Here is what I think is the log that was exported:

I tried uploading it to my profile, but it wouldn't show up, so this is on our external server.

69.195.136.86/lsi_0518.log

Hope that helps.

Hard Drive Models: ST3500320NS

If the log does not contain what you're looking for, I can attempt to recreate the problem and export it again afterwards.

Moderator • 6.2K Posts

May 18th, 2013 14:00

It looks like the drive in slot 1 is the problem. I show that you have drives in slots 0, 1, and 3. I have found multiple locations in the log where a parity error occurred on a virtual disk, the controller firmware then errored, and a drive disappeared from the list. The drive was redetected when the controller restarted. The drive that is consistently causing these errors and disappearing from the drive list is the drive in slot 1.

I would suggest trying to create a RAID 1 with drives 0 and 3. Try creating a RAID 0 with the drive in slot 1. You can also run diagnostics on the drive in slot 1. From what I see in the logs the drive in slot 1 is bad and causing the controller firmware to crash and restart. Let me know how it goes without that drive.
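If you would rather set those up through OMSA than through the controller BIOS, the createvdisk commands should look roughly like this. The 0:0:x physical disk IDs are my guess for a single backplane; run omreport storage pdisk controller=0 to confirm the real IDs before using them:

omconfig storage controller action=createvdisk controller=0 raid=r1 size=max pdisk=0:0:0,0:0:3
omconfig storage controller action=createvdisk controller=0 raid=r0 size=max pdisk=0:0:1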

Thanks

Moderator • 6.2K Posts

May 18th, 2013 15:00

Before you pull the logs again, try removing that drive from the server completely. Remove the drive in slot 1 and create a RAID 1 with the other two drives. If it still fails then post another controller log and I'll see what it is doing.

Thanks

4 Posts

May 18th, 2013 15:00

I created the array and attempted to install; about 5% in, the installation failed with a message similar to the one stated before.

After the error popped up, both drive activity lights remained solid green, and the system restarted after 30 seconds.

I am burning the disc for the live CD so I can get the controller logs.

4 Posts

May 18th, 2013 16:00

Sounds good. Additionally, there are a few things that I think might be important as well.

After the server restarts following the failed installation, two error codes appear:

E1715 I/O Fatal Error

E1422 CPU Mach Chk

Those errors only occur after the failed installation attempt.

Additionally, regarding when I said it worked fine for a day: after I started having problems, I was still able to extract all the data that was on it (90 GB worth) while it was in a RAID 5 array.

EDIT: New log file: http://69.195.136.86/lsi_0518new.log

Additionally, after doing a Fast Initialization on the drive that I had working, the problem still sometimes occurs during the install.

After I receive the error, it seems the entire firmware just crashes. 

I should add that occasionally, after a restart where the firmware fails, I get a "Memory/Battery" message about the cache being lost.

I just highly doubt that all the hard drives would have failed nearly simultaneously without any warnings or errors during startup.

Tomorrow I'll run hard drive diagnostics. I should be able to do that from the LiveCD you linked, right?

An interesting note though:

When I ran from the LiveCD and tinkered with the controller settings in the Server Administrator web app, I was able to create an array. I then ran "Check Consistency" on it; the moment I did, the controller appeared to fail, although the LiveCD itself was still working. After that, I could not get back into the Server Administrator pages because they kept timing out, and removing and re-inserting a disk would not cause the drives to spin up.
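For what it's worth, I was starting the consistency check from the web interface; I assume the CLI equivalent would be something along the lines of omconfig storage vdisk action=checkconsistency controller=0 vdisk=0, but I have not tried it that way.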

Not sure if that helps at all; I'm just trying to provide as much information as I can.

The important question I have is: based on the behavior, do you think it is most likely the RAID controller?

Moderator • 6.2K Posts

May 30th, 2013 18:00

Sorry about the late response. I don't get updates on post edits.

"The important question I have is: based on the behavior, do you think it is most likely the RAID controller?"

Yes, if I had to make a guess based on the information, I would suspect the controller. After removing PD 1 the firmware crashed again with the same error. When it came back up, the DDF (the RAID metadata stored on the disk) was out of sync on PD 3, so it marked that drive as foreign. It is possible that one of the remaining HDDs, the backplane, or the cabling is causing issues, but the controller is probably the most likely cause.

Thanks
