5 Posts

March 13th, 2008 18:00

Good question about CPU utilization. I just went through some history and saw that the CPU was never below 97% idle from March 3 to March 13.

5 Posts

March 13th, 2008 18:00

I agree. That's why I'm so frustrated. This is not a reasonable solution.
"At This time there is no Cpu error.the fix for the error reported is to clear the log and power cycle the system"
That is the response I got at the end of an email exchange.

Here is the rest of the exchange. When I placed the service call, I was asked to download the Dell server software for RHEL 4 so I can create a DSET_Report and send that in. I was asked to run the tool on the server, clear the hardware logs, power off the machine, unplug the power cords, press the power button to "clear stored power", plug everything back in and email back when those tasks were completed.

I held off on doing this because I can't just reboot a server on a whim. So here is the exchange.

Email string from after the initial phone call:
Dell:
Download Dset for linux.

Me:
done. Can not reboot server but I have generated dset report.

Dell:
Checking for reboot status

Me:
Still have Not rebooted.

Dell:
It is important that you reboot the server after clearing the logs to ascertain that the failure is gone.
Otherwise you run the risk of a processor failure
Please let me know how you want to handle that

Me:

Will clearing the logs and powering off the machine solve the fact that there actually might be a hardware problem?
I am hesitant to reboot because I will have to power off the machine another time if the CPU has to be replaced.
Are you not able to see the error code in the zip file that I emailed?


Dell:
At This time there is no Cpu error.the fix for the error reported is to clear the log and power cycle the system.


Email exchange timeline
Phone call was placed March 6th
Email exchange from March 10th to March 13.

The amber LCD is gone and it is now back to blue.
In my mind, nothing has been fixed.
This is how Dell describes the E1420:

E1420 CPU Bus PERR The system BIOS has reported a processor bus parity error. See Getting Help.

March 17th, 2008 13:00

Guys, thanks so much for posting this information. I am glad I am not the only one having this issue as well. Our server crashed early Saturday afternoon and was not noticed until early evening (since we have our BES virtualized on this server via Microsoft Virtual Server 2005 SP1 x64). Pretty much the same issue as what everyone has described in this forum (server locks up, orange LED on front panel, CPU Bus PERR on front panel LED).

 

Here are the specifications of the server:

Dell PowerEdge 2950 III
Dual Intel Xeon L5335 @ 2.00GHz (Model 15, Stepping 11, Revision G0)
Microsoft Windows Server 2003 R2 Standard x64 Edition SP2
Microsoft Virtual Server 2005 R2 SP1 Enterprise Edition (1.1.603.0 EE)

This is the first occurrence of the issue and this server was newly built as of January. I will be logging a support call with Dell as well. Maybe the more Dell is made aware of how many servers are affected by this, the faster we may get to a resolution.

Please keep us up to date on anything you may hear from Dell regarding this. Thanks.

14 Posts

March 19th, 2008 15:00

I've heard back from Dell and they recommend enabling the CPU "Virtualization Technology" setting in the BIOS. It is located in "CPU Information". I found the forum post below when doing a Google search. Credit goes to a Mr. Ned Slider for this response.

You really want to enable virtualization in your bios.

There are two ways you can do virtualization - in software or in hardware. Before recent CPUs that support hardware virtualization, this was always done in software with programs like VMware or Virtual PC. These software solutions "virtualize" a second processor to run a second OS, and obviously there is some overhead involved in this (which is why a virtual OS never quite performs as fast as a native OS).

However, with the introduction of hardware virtualization, a single CPU can act as if it were several CPUs running in parallel, allowing the system to run several operating systems at the same time. In theory hardware virtualization should be more efficient than software virtualization.

Software such as Xen, VMware and Virtual PC support hardware virtualization. Xen is a little different from programs such as VMware and Virtual PC, which run on top of a conventional OS, in that it uses a thin software layer known as a hypervisor (basically a virtualization-enabled kernel) that is inserted between the server’s hardware and the virtualized operating system(s), so there's not really an underlying OS as such, like Windows. Xen is very popular with hosting companies that want to host multiple virtual servers on a single hardware server.

Also, I believe if you don't enable hardware virtualization in your bios, on a C2D system you would only see a single core processor on your virtual machine as the virtualization would be performed in software.
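For anyone on Linux (like the RHEL boxes mentioned in this thread) who wants to see whether the CPU even advertises hardware virtualization before going into the BIOS, here is a minimal sketch. It assumes the usual /proc/cpuinfo layout, where Intel VT-x shows up as the "vmx" flag and AMD-V as "svm". The function name virtualization_flags is just something made up for this example. As far as I know the flag only says the CPU supports the feature, so the "Virtualization Technology" setting itself still has to be checked in BIOS setup.

```python
def virtualization_flags(cpuinfo_text):
    """Return the hardware-virtualization flags found in /proc/cpuinfo-style text.

    Assumption: each CPU has a "flags : ..." line, with "vmx" meaning
    Intel VT-x and "svm" meaning AMD-V.
    """
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            found.update(flag for flag in value.split() if flag in ("vmx", "svm"))
    return found


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            flags = virtualization_flags(f.read())
    except OSError:
        # Not Linux, or /proc is not readable: nothing to report.
        flags = set()
    print("hardware virtualization flags:", ", ".join(sorted(flags)) or "none found")
```

This does not apply to the Windows hosts in this thread, and it says nothing about the CPU Bus PERR itself. It is only a quick way to check the CPU side of the recommendation above.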

1 Message

March 20th, 2008 07:00

We're seeing the same behaviour here (intermittent crashes with CPU BUS PERR) on a PowerEdge 2900, running Virtual Server 2005 and Windows Server 2003 Enterprise x64 SP2.

The CPUs are E5310s @ 1.6GHz, with 8GB of RAM, and we've had the virtualization option in the BIOS turned on for a couple of months and still get the problem.

At this point we've already had Dell swap the machine out - entirely, even disks - and had the same behaviour on the new machine.

Has anyone seen this error when using VMware Server instead of Microsoft Virtual Server?

23 Posts

March 21st, 2008 13:00

Guys, gals.

Just got off the phone with a team lead of my level 3 support.

Basically here's the situation. They have a number of servers that they are testing with that are having this error. There are A LOT of customers experiencing this.

It's NOT Virtual Server, as we thought at first. They have guys with other OSes, such as the Red Hat users on here, experiencing this problem.

So, I'll keep you guys up to date on what's going on, but I'm pretty satisfied, after a few calls from Dell today about the situation, that they will get it resolved. No time frame, but they will.

9 Posts

March 21st, 2008 14:00

Thanks for the update shankshank!

23 Posts

March 24th, 2008 11:00

My server went down again on Saturday morning.

12 Posts

March 24th, 2008 12:00

shankshank,

How many times has your server gone down so far? How much time passes between crashes?

Mine has only crashed once and that was about 3 weeks ago. I'm super nervous it will happen again.

Thanks

23 Posts

March 24th, 2008 12:00

My first server crashed on 2/20/2008 after about 2 weeks of being in production.

The server was replaced and it crashed a few days later, on 3/22/2008.

The same server crashed on Sat the 22nd.

946 Posts

March 31st, 2008 19:00

Z_Z,

I am waiting to hear back from engineering about what is causing this error in general and from an OS standpoint. If you will PM me your service tag, I will check the case notes and check with the tech to see if he has heard anything as well. As soon as I hear something I will post back here.

946 Posts

March 31st, 2008 19:00

For users running RedHat, the updates outlined here may help to resolve this issue. Still awaiting a resolution on the MS side.

12 Posts

March 31st, 2008 19:00


@DELL-Dennis S wrote:
For users running RedHat, the updates outlined here may help to resolve this issue. Still awaiting a resolution on the MS side.

Dennis,

Can you please provide some insight into the problem? (I started this thread a few weeks ago and am still waiting to hear back from my Dell tech on a resolution.)

-What conditions cause the server to throw that error?
-What are you guys currently thinking causes this error on the MS side?
-etc

Thanks!

23 Posts

April 7th, 2008 17:00

Guys,

For those experiencing this issue with Red Hat, my Dell tech just called me and said a fix has been released in their updated kernel.

As far as Windows goes, they are still heavily stress testing this to recreate the issue.

Will keep you guys posted.

6 Posts

April 24th, 2008 19:00

Interesting, any idea if the 54xx series (also quad-core) is also affected?  Heck, I'd drop in a couple 51xx series dual core CPUs if it would prevent this from happening again.  I'd rather not wait for MS on this one.