R510 Fatal IO error !!
Hi,
We've recently deployed a Dell R510 server that is crashing randomly, and iDRAC shows little evidence of the cause. Here are the server's specs:
2 x L5640
RAM 64GB
12 x 3TB SATA
HBA LSI-9211 Controller
OS: FreeBSD 10.2
The crash event logged in iDRAC only says 'A Fatal IO error' and nothing else. Here is a screenshot of the logged event:
http://prnt.sc/ats4ny
DELL-Josh Cr
Moderator
8.4K Posts
April 18th, 2016 14:00
Hi,
Was the server doing this prior to this deployment? It looks like the error is coming from CPU2. You may want to try to swap the processors and see if the error follows the CPU.
shahzaibcb
11 Posts
April 19th, 2016 02:00
Thanks for the response. Let me give you a bit of history on this deployment. Before the Dell R510, we had deployed Supermicro servers, and those were also crashing intermittently with the error "Internal Timer Error" under FreeBSD. We went through a long list of debugging steps, including replacing FreeBSD with Debian, but the crashes remained quite frequent. So we concluded that the Supermicro hardware might be the culprit and should be replaced with branded hardware, and we went with the Dell R510. It has now been 10 days since the Dell deployment, and we have hit the crash issue on it as well. We do not know how to fix this. We've invested a lot of budget, but this crash issue has been with us for 6 months. The OS also showed 'Internal Timer Error' in the crash dump. For now, we're swapping the CPUs as per your suggestion, but we really have no idea where to go from here.
DELL-Josh Cr
Moderator
8.4K Posts
April 19th, 2016 09:00
What version of Debian are you using? We have not validated Debian, so it is possible that something in it is causing the issues.
shahzaibcb
11 Posts
May 3rd, 2016 05:00
Thanks for the response. It is Debian 8 (jessie), and the FreeBSD version is 10.2. We noticed one more thing: the 2 servers with logical cores disabled have not crashed, while the rest were crashing. So we disabled logical cores on the rest of the servers as well, and so far they have not crashed. We'll monitor them for a few more days and update here.
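For anyone who wants to confirm from the OS side that logical cores are really off after changing the BIOS setting, here is a minimal sketch. It is an illustration only, not something from our actual setup: it assumes a Linux host (such as the Debian installs mentioned above) where `lscpu` is available, and the function names are hypothetical. It reads the "Thread(s) per core" line from `lscpu` output; a value of 1 suggests logical cores are disabled.

```python
import re
import subprocess
from typing import Optional


def threads_per_core(lscpu_text: str) -> Optional[int]:
    """Return the 'Thread(s) per core' value from lscpu output, or None if the line is missing."""
    match = re.search(r"^Thread\(s\) per core:\s*(\d+)\s*$", lscpu_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def logical_cores_enabled(lscpu_text: str) -> Optional[bool]:
    """True if more than one thread per core is reported, False if exactly one, None if unknown."""
    count = threads_per_core(lscpu_text)
    return None if count is None else count > 1


def read_lscpu() -> str:
    """Run lscpu on the local machine and return its text output (assumes lscpu is installed)."""
    return subprocess.run(
        ["lscpu"], capture_output=True, text=True, check=True
    ).stdout


# Example use on a live host (not executed here):
#   print(logical_cores_enabled(read_lscpu()))
```

This only tells you what the OS sees. It does not prove that logical cores are the root cause of the crashes; it just helps verify that every server in the group is configured the same way while monitoring continues.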