November 27th, 2016 17:00

Unusual problem with PE2950III with 2x Xeon 5450, 64GB RAM, PERC 6i

I have a PE 2950III located at a datacenter that has a peculiar problem. I know it is running in an "unsupported" configuration, so all bets are off with regard to "typical" issues. It is running Windows Server 2013 Datacenter with Hyper-V and a few child VMs. The only role on the host is the Hyper-V role, and the VMs are all Windows Server 2016. Storage Spaces is set up on the server, and all 6 disks are configured as RAID 0, which should be the same as presenting them as individual disks. The Storage Spaces are working correctly. The odd thing is that the server appears to spontaneously reboot, and *only on the weekends.* When I use the DRAC to access the server, the PERC 6/i shows that a virtual disk is missing. This is not really a problem for the data, since disk redundancy is taken care of by Storage Spaces. This is a storage repository, so performance is not at the top of the list for the server. When the server restarts, it states that the PERC 6/i is missing Virtual Disk 0, which is actually the OS/boot disk, so the server will not boot. It does not show any foreign disks installed either.

After the server is power cycled, the PERC 6/i says that a foreign disk has been detected, and to press "F" to add it to the configuration. Once "F" is pressed, it boots normally, the VMs start, and all is well. Does anyone happen to know what could cause the server to fail only on Saturday or Sunday, when it has the least load on it? When it fails, it is hung at the PERC BIOS screen. Until the server is power cycled, it does not show any foreign disks present.

The unsupported part - all of the disks are not Dell certified, and three of the disks are 3TB disks, which the PERC 6/i reads as 2TB. It appears to work fine in this configuration and does not complain about non-Dell disks used in the server. I would like to know what event can occur to cause a crash and reboot the server. The PERC driver for the 6/i is reporting a predictive failure for disk 0:0:0, which I am replacing.

I know I am venturing into uncharted and unsupported territory. I am only hoping this will limp along until January 1, 2017. The server appears to be shutting down gracefully based on the event log, as all services are attempting to stop before the shutdown occurs; however, after it shuts down, the RAID controller halts the restart process because it does not recognize Disk 0:0:0. The event log does note that unsupported disks are installed, but not during boot. I know it is imperative that I get off this unsupported configuration ASAP, and that could not be more understood. Luckily, this server is for backups of backups and not for LOB or important servers.

All firmware is up to date. Is it possible that the 3TB disks are causing an issue (though I personally don't see how they could), or is a disk patrol operation running over the weekend causing the crash?

If anyone could toss any ideas into the ring, I would certainly appreciate it. Thanks!

5 Practitioner

 • 

274.2K Posts

November 28th, 2016 10:00

Hello.

The Storage Spaces are working correctly.  The odd thing is that the server appears to spontaneously reboot, and *only on the weekends.*

Do you have any server or application jobs scheduled for the weekend?

The unsupported part - all of the disks are not Dell certified, and three of the disks are 3TB disks, which the PERC 6/i reads as 2TB. It appears to work fine in this configuration and does not complain about non-Dell disks used in the server. I would like to know what event can occur to cause a crash and reboot the server. The PERC driver for the 6/i is reporting a predictive failure for disk 0:0:0, which I am replacing.

The PERC 6/i supports at most 2 TB per hard drive, no matter what capacity the drive actually has. Using HDs not certified by Dell can lead to unpredictable issues, as you have observed. I would suggest that you use Dell certified drives.
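A 2 TB ceiling is consistent with a controller that addresses drives using 32-bit logical block addresses and 512-byte sectors. That cause is an assumption here, not something confirmed in this thread, but the arithmetic behind it can be sketched as follows (the function names are illustrative only):

```python
# Assumption (not confirmed in this thread): the controller addresses
# drives with 32-bit LBAs and 512-byte sectors.
SECTOR_BYTES = 512
LBA_BITS = 32


def max_addressable_bytes(lba_bits: int = LBA_BITS,
                          sector_bytes: int = SECTOR_BYTES) -> int:
    """Largest capacity reachable with the given LBA width and sector size."""
    return (2 ** lba_bits) * sector_bytes


def usable_bytes(drive_bytes: int) -> int:
    """Capacity such a controller could expose for a drive of the given size."""
    return min(drive_bytes, max_addressable_bytes())


limit = max_addressable_bytes()
three_tb = 3 * 10**12  # a "3 TB" drive as marketed (decimal terabytes)

print(f"Addressable limit: {limit} bytes "
      f"({limit / 2**40:.0f} TiB, about {limit / 10**12:.1f} TB)")
print(f"Unreachable on a 3 TB drive: "
      f"{(three_tb - usable_bytes(three_tb)) / 10**12:.1f} TB")
```

Under that assumption, roughly 0.8 TB of each 3 TB drive would be out of the controller's reach, which matches the drives showing up as about 2 TB.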

5 Practitioner

 • 

274.2K Posts

November 28th, 2016 11:00

No upper limit is provided. However, the supported capacity depends on the controller firmware, so it is advisable to keep the firmware up to date when using high-capacity hard drives.

5 Practitioner

 • 

274.2K Posts

November 28th, 2016 11:00

The boot process will usually proceed as normal but you will have unpredictable issues with the RAID array such as undetected drives, HDs randomly going offline and overall false status of the RAID. I suggest you use Dell certified drives for smooth operations.

14 Posts

November 28th, 2016 10:00

I certainly will do.  Do you happen to know the upper limit on the H710 single spindle capacity-wise?

Thank you

14 Posts

November 28th, 2016 10:00

Thank you sir for your response.  I had the same thing in mind, and the PERC 6/i is going to be replaced by some of the H710s I have on hand to use until January when on a new budget.

I am not sure if a disk patrol is happening or what the problem is, but I find it odd that it occurs on the weekends only. That is what befuddles me. I agree it is completely unsupported and I know it is uncharted territory, and I am fortunate it is working at all. The PERC 6/i driver is reporting the disks as unsupported, but it just logs it and goes on about its business. Windows Update is currently disabled from even downloading updates, so it is not requesting a host reboot, but according to the Event Viewer, the server is actually rebooting gracefully. I am not getting the typical "An unexpected Windows shutdown occurred" when the OS does boot, and it is always the same disk that is not found after the reboot, which is, ironically, a Dell certified 1TB SATA disk in slot 0. After the reboot, the PERC 6/i does not see this disk, which is configured as the boot disk, so it hangs on the BIOS/POST/pre-boot screen. If the server is powered off and cold booted, I let it run the complete memory test and no errors are found, and when the PERC 6/i BIOS loads, it states that the configuration has changed and asks me to press C to continue or F to import the foreign disk. Before the power cycle, the "F" option does not appear, nor do any foreign disks appear in the RAID BIOS menus, nor the option to import them.

I appreciate your help knowing this is unsupported; you didn't have to assist at all, but you did anyway, and I thank you. If I can determine what is rebooting it (it doesn't seem to be a dirty shutdown), it may work until January when it can be replaced with a new server. In the meantime, I am going to replace the PERC 6/i with an H710 and Mini-SAS to SAS cables to the backplane. It will be interesting to see whether the H710 sees the 3TB disks as what the 6/i labeled them (2TB) or as 3TB. If it does see them as 3TB, I am curious whether it will allow me to extend the single-disk RAID 0 array to 3TB and have it appear in Disk Management as 3TB. For anyone following this post, I will post my results.

Thank you again.

***EDIT - I missed your initial question.  No, there are no jobs that are scheduled for the weekend, or any interactive sessions, and this server has virtually no load on it.  I do think I will download the Event Logs and see if I can find a pattern.

5 Practitioner

 • 

274.2K Posts

November 28th, 2016 10:00

In the meantime, I am going to replace the PERC 6/i with an H710 and Mini-SAS to SAS cables to the backplane. It will be interesting to see whether the H710 sees the 3TB disks as what the 6/i labeled them (2TB) or as 3TB. If it does see them as 3TB, I am curious whether it will allow me to extend the single-disk RAID 0 array to 3TB and have it appear in Disk Management as 3TB. For anyone following this post, I will post my results.

Yes, the PERC H710 supports HDs of greater than 2 TB capacity and should be able to see all 3 TB.

Review the logs and let me know of any unusual pattern.

14 Posts

November 28th, 2016 11:00

I did notice "Optimized drives" was scheduled weekly, so I suppose that could be crashing it.

Do you know what the largest Dell certified SATA drive is for archival purposes (and not performance)?

And one last question, as I know you are busy. Can the PERC 6/i be placed in HBA mode, disabling RAID functionality? I think this was a function of the older cards, but I don't recall seeing it on the 6/i.

Thank you.

14 Posts

November 28th, 2016 11:00

Will the H710 reject/hang the boot process with non-Dell drives?

14 Posts

November 28th, 2016 11:00

Although the Windows GUI did not present the Unexpected Shutdown dialog (I anticipate this because the Shutdown Event Tracker was disabled on this server), the event log did store this:

The previous system shutdown at 6:46:48 AM on 11/26/2016 was unexpected.     Event Source: EventLog     EventID: 6008

The log entry immediately before this was: The Volume Shadow Copy service entered the stopped state.     Event Source: Service Control Manager     EventID: 7036

There are several of these when the server boots, probably as a result of the 3TB drives on a 2TB controller.

I am seeing several of these Event 2335 from Server Administrator in the Windows Event Log:

Controller event log: PD 111(e0x00/s17) Path 1221000000000000  reset (Type 03):  Controller 0 (PERC 6/i Integrated) 

There are a few of these appearing after the server boots up.  Probably because of the same reason above.
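Counting these reset events per physical disk can show whether one drive or slot accounts for most of them. Here is a rough sketch, assuming the exported lines keep the format quoted above, and assuming that the `e…`/`s…` pair in parentheses denotes enclosure and slot (this thread does not define those fields). The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

# Pattern modelled on the single reset line quoted in this thread.
RESET_LINE = re.compile(
    r"PD (?P<pd>[0-9a-fA-F]+)\(e(?P<enclosure>0x[0-9a-fA-F]+)/s(?P<slot>\d+)\)"
    r"\s+Path\s+(?P<path>[0-9a-fA-F]+)\s+reset\s+\(Type\s+(?P<type>[0-9a-fA-F]+)\)"
)


def parse_reset(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of a PD reset line, or None if the line does not match."""
    match = RESET_LINE.search(line)
    return match.groupdict() if match else None


def count_resets(lines: Iterable[str]) -> Counter:
    """Count reset events per (enclosure, slot) pair."""
    parsed = (parse_reset(line) for line in lines)
    return Counter((p["enclosure"], p["slot"]) for p in parsed if p is not None)


sample = ("Controller event log: PD 111(e0x00/s17) Path 1221000000000000  "
          "reset (Type 03):  Controller 0 (PERC 6/i Integrated)")
print(parse_reset(sample))
print(count_resets([sample]))
```

If most resets point at the same enclosure/slot pair, that drive, its cable, or its backplane slot would be the first thing to look at.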

So I was clearly wrong about it being a graceful shutdown as EventID 6008 shows.
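To check whether the unexpected shutdowns really cluster on weekends, the timestamps in the EventID 6008 messages can be tallied by weekday. This is a minimal sketch, assuming the messages have been exported as plain text with the wording quoted above and US-style month/day/year dates. The function names are hypothetical, and the only sample is the message from this thread.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Iterable, Optional

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Matches the wording of the EventID 6008 message quoted in this thread.
PATTERN = re.compile(
    r"previous system shutdown at (\d{1,2}:\d{2}:\d{2} [AP]M) "
    r"on (\d{1,2}/\d{1,2}/\d{4})"
)


def shutdown_time(message: str) -> Optional[datetime]:
    """Extract the shutdown timestamp from a 6008 message, or None if absent."""
    # Exported text can carry invisible left-to-right marks (U+200E) in the date.
    cleaned = message.replace("\u200e", "")
    match = PATTERN.search(cleaned)
    if match is None:
        return None
    time_part, date_part = match.groups()
    # Assumes month/day/year ordering, as in the quoted message.
    return datetime.strptime(f"{date_part} {time_part}", "%m/%d/%Y %I:%M:%S %p")


def weekday_counts(messages: Iterable[str]) -> Counter:
    """Count unexpected shutdowns per weekday name."""
    times = (shutdown_time(m) for m in messages)
    return Counter(DAYS[t.weekday()] for t in times if t is not None)


messages = [
    "The previous system shutdown at 6:46:48 AM on 11/26/2016 was unexpected.",
]
print(weekday_counts(messages))  # Counter({'Saturday': 1})
```

With several weeks of 6008 entries fed in, a tally dominated by Saturday and Sunday would back up the weekend-only observation, and the clock times could then be compared against weekly scheduled tasks.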

I have enabled additional logging to show why this may be happening. I don't see a particular error right before the shutdown occurs in the standard event logs or the Microsoft event logs. I am starting to think it is one or a combination of: 1) a predicted failure of the Dell certified 1TB SATA disk in 0:0:0, or 2) the PERC accessing a disk that it does not have the logic to access properly, causing a shutdown. I could disable automatic restart on STOP errors to see if that shows anything. I believe I need to 1) replace disk 1, which has a predicted failure, and 2) install the H710 and a new battery.

You are probably not in a position to comment, but if you can, could you suggest a disk that may be most compatible with a Dell Certified Disk?  Obviously an Enterprise disk and not a Desktop drive.

Again, Thank you for your help.

5 Practitioner

 • 

274.2K Posts

November 28th, 2016 11:00

I have enabled additional logging to show why this may be happening. I don't see a particular error right before the shutdown occurs in the standard event logs or the Microsoft event logs. I am starting to think it is one or a combination of: 1) a predicted failure of the Dell certified 1TB SATA disk in 0:0:0, or 2) the PERC accessing a disk that it does not have the logic to access properly, causing a shutdown. I could disable automatic restart on STOP errors to see if that shows anything. I believe I need to 1) replace disk 1, which has a predicted failure, and 2) install the H710 and a new battery.

Note that although the PERC H700/H710 controllers support hard drives with capacities greater than 2 TB, they are not supported controllers on the 2950 server. Using hard drives of 3 TB or more means upgrading to a server that supports these controllers: 11th generation for the PERC H700, or a later generation.

14 Posts

November 28th, 2016 11:00

So far it has seemed to work without issues. Only the replacement cables are needed to connect to the backplane. No unexpected errors like the ones I am seeing here have happened, and so far it is running nicely. I know the 2950III is well beyond its supported lifespan, so I don't mind a little experimenting for lab environments, but under no circumstances would I place a server like this in a production environment.

Thank you

14 Posts

November 28th, 2016 12:00

"Note that whereas the PERC H700/H710 controllers support hard drives of capacity greater than 2 TB, they are not supported controllers on the 2950 server. The desire to use 3 TB or more capacity hard drives will be accompanied by server upgrade to 11th Gen for PERC H700 or higher servers."

Are you referring to UEFI in this statement, or that the 2950III will just have a hard time understanding the disk size presented to it from the controller?

Thank you.

5 Practitioner

 • 

274.2K Posts

November 28th, 2016 13:00

The PERC H700/H710 are simply not supported on the 2950 server, regardless of whether the boot mode is traditional BIOS or UEFI.

14 Posts

November 28th, 2016 13:00

Gotcha.  Thank you again!

9 Legend

 • 

16.3K Posts

November 28th, 2016 19:00

The PERC H700/H710 are simply not supported on the 2950 server, regardless of whether the boot mode is traditional BIOS or UEFI.

Note that the 2950 has NO UEFI support.
