Vanadiel
6 Professor
•
7.1K Posts
2
March 11th, 2021 21:00
Does your Infinity Fabric run at speeds higher than 1700 MHz?
If I am not mistaken, the closest Infinity Fabric clock to 1700 MHz that you can set is 1733 MHz, which results in your memory running at 3466 MHz.
If you use 3400 MHz memory, it will likely cause issues. I don't think there is RAM available that supports 3466 MHz. It typically goes in increments of 200 MHz, so 3400 or 3600 MHz. I think what they put in the R10 is 3400 MHz RAM, which as far as I am aware is the highest frequency Dell supports on the R10 board.
You would either need to install 3600 MHz memory, which I believe is not supported on these systems, or lower the Infinity Fabric clock to 1600 MHz to run the memory at 3200 MHz.
You can check your fabric speed with CPU-Z, a free utility. It will be under the Memory tab. If you multiply the fabric frequency by 2, you should get the required memory frequency:
2 x 1600 MHz = 3200 MHz
2 x 1733 MHz = 3466 MHz
It is normal to see slight variations on that, for example 1596.8 MHz instead of exactly 1600 MHz.
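The arithmetic above can be sketched in a few lines of Python. This is only an illustration, assuming the 1:1 case described here, where the memory clock equals the fabric clock and the quoted memory "MHz" figure is twice that clock. The function name is made up for the example.

```python
def memory_rate_from_fabric(fclk_mhz):
    """Memory frequency (as quoted on RAM kits) for a fabric clock at a 1:1 ratio.

    Assumption: memory clock == fabric clock, and the quoted figure is 2x the clock.
    """
    return 2 * fclk_mhz


# Values taken from the post: the two nominal clocks and a slightly-off CPU-Z reading.
for fclk in (1600, 1733, 1596.8):
    print(f"fabric {fclk} MHz -> memory {memory_rate_from_fabric(fclk):.1f} MHz")
```

A reading such as 1596.8 MHz simply maps to a memory figure a few MHz under 3200, which is the normal variation mentioned above.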
I think someone else had a similar issue with their config. It sounds to me like one of those cases where Dell did not read the hardware manual before releasing a product. Otherwise they would have realized the issue with the RAM frequency.
koshyjohn
18 Posts
2
March 11th, 2021 21:00
The fabric clock in CPU-Z and Ryzen Master (just downloaded) says 1,733 MHz. The memory clock is also the same value. I cannot edit either of these in Ryzen Master. The CPU is running in Precision Boost Overdrive OC mode, and I don't know if that has something to do with not being able to edit those values.
The Alienware Command Center has only two options for RAM: overclock "enabled", which runs the RAM at 3466 MHz, or disabled, which runs it at 2666 MHz. The build sheet for the system says 3400 MHz RAM (which is a higher-priced option).
When memory OC is disabled, the frequency drops to 2666 MHz (which is nowhere near what I paid more money for), but all the Event Viewer errors/warnings stop appearing. The fabric clock at this point is 1,333 MHz.
If what you're suggesting about the clock discrepancies is true, Dell sold me a $4400 machine that was never going to work correctly out of the box. I'm guessing that if they swap out the RAM for 3200 MHz modules, this will go away because the Infinity Fabric will run at 1,600 MHz?
koshyjohn
18 Posts
1
March 11th, 2021 21:00
Leaving a link for reference to the other thread in this community reporting the same underlying issue:
https://www.dell.com/community/Alienware-Desktops/Aurora-R10-a-corrected-hardware-error-has-occurred/
koshyjohn
18 Posts
1
March 12th, 2021 11:00
So I was able to get Ryzen Master to reduce the Infinity Fabric clock speed, and this is where it gets interesting.
When Infinity Fabric is set to 1,700 MHz (down from Alienware's default of 1,733 MHz), it should theoretically work with the 3,400 MHz RAM, right? But no, the warnings keep flooding the event logs.
However, when I take Infinity Fabric down to 1,600 MHz, the 3,400 MHz RAM operates at 3,200 MHz and no more issues crop up. I don't know if this resolves the random reboot issue, but I will monitor.
At this point I can only imagine that the processor, the RAM, or both are not capable of what Alienware sold them as being specced for. Thoughts?
Vanadiel
6 Professor
•
7.1K Posts
2
March 12th, 2021 12:00
I don't have a 5000 series Ryzen, so I cannot test this out properly. But my understanding is that with the 5000 series, AMD improved the stability of running the fabric at speeds higher than 1600 MHz, up to 2000 MHz.
However, there's no guarantee a Ryzen 5000 will run past 1600 MHz fabric speed. It depends on the quality of the processor and motherboard components, and the variance between processors and motherboards makes it a bit of a trial-and-error situation right now to figure out the highest fabric speed a given system can run.
The reason they want to run past 1600 MHz fabric speed is to support higher RAM frequencies at a 1:1 ratio.
1600 = 3200 MHz
1700 = 3400 MHz
1800 = 3600 MHz
1900 = 3800 MHz
2000 = 4000 MHz
Based on what I have read about this, it seems there's no such speed as 1700 MHz. It's actually 1733 MHz.
1733 = 3466 MHz
So if you use 3400 MHz RAM, it will actually run at 3466 MHz. This is above the rated speed, and might or might not work.
Going by your experience, it looks to me like you cannot get stability at anything higher than 1600 MHz fabric speed. This could be because of the processor or the motherboard. As the frequencies get higher, so do noise and crosstalk. I think the R10 board is only certified to run at 1600 MHz fabric speed, and I suspect that is where the issue lies.
What I have also picked up from various articles is that the motherboard BIOS can contain an option to run the RAM at a ratio other than 1:1. However, that would have to be supported by the BIOS and the manufacturer of the board.
With a different ratio you could run the fabric at 1733 MHz and the memory at a frequency other than 3466 MHz. This would add latency, but it would make it stable. It does not look like the R10 BIOS has that option.
I wish I had an R10 with your config to test this all out.
koshyjohn
18 Posts
1
March 12th, 2021 12:00
Ok, so Ryzen Master lets me adjust fabric clock in increments of 33 MHz (33.333... to be exact).
So I'm able to set it to exactly match the RAM clock (and the Alienware Command Center also detects the clock correctly - it cannot set it like Ryzen Master can but it can read it).
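For reference, a small Python sketch of the fabric clock values that fall on that grid. It assumes the step is exactly 100/3 MHz (the "33.333..." above) and that the grid starts from zero; both are assumptions for illustration, and the function name is made up.

```python
from fractions import Fraction

# Assumed step size: 33.333... MHz, as observed in Ryzen Master in this thread.
STEP_MHZ = Fraction(100, 3)


def fabric_steps(low_mhz, high_mhz):
    """Fabric clock values in [low_mhz, high_mhz] that land on the assumed step grid."""
    first = -(-Fraction(low_mhz) // STEP_MHZ)  # ceiling division
    last = Fraction(high_mhz) // STEP_MHZ      # floor division
    return [float(n * STEP_MHZ) for n in range(int(first), int(last) + 1)]


for fclk in fabric_steps(1600, 1734):
    # At a 1:1 ratio the quoted memory frequency is twice the fabric clock.
    print(f"fabric {fclk:8.2f} MHz -> memory {2 * fclk:8.2f} MHz")
```

Under these assumptions both 1,700 MHz and 1,733.33 MHz are valid steps, which fits with being able to match the 3,400 MHz RAM exactly.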
Running at 1,700 MHz, the event logs get flooded with warnings at a rate of 4 events per second. It's one every 30 seconds or so at 1,733 MHz (the Alienware default, which is strange). And it's 0 failures at 1,600 MHz.
Dell said they've raised this with engineering (private messages on this forum). I don't have any failures in Dell's health assessment (pre-boot or SupportAssist), so it's possible they are not catching this at all in pre-shipment health checks at the factory either. I've not started a formal support case, but let's see if engineering gets back in some form.
I've not tried anything between 1,600 and 1,700 MHz, but I will, to see if anything notable happens there. If there's anything else you want me to try out (without voiding the warranty), I can.
See screenshot for fabric running at 1,700 MHz:
koshyjohn
18 Posts
1
March 12th, 2021 12:00
As a further update to this, the highest fabric speed I can run at without issues is 1,633 MHz, which runs the RAM at 3,266 MHz. At 1,666 MHz, the warnings come in at about one every 2 to 3 seconds.
Thanks for your pointers in getting to the root of the issue. I'll still be available to try different things.
If I don't hear back from Dell within a reasonable time, I'm going to get an official support case started, as this is not what I paid extra for on top of an already very expensive machine. I don't know if they never noticed this in engineering or if this issue just didn't happen in whatever engineering sample(s) they validated with before heading to production.
Vanadiel
6 Professor
•
7.1K Posts
2
March 12th, 2021 15:00
Based on your testing, it seems the cutoff point is 1633 MHz. That gives you 3266 MHz on the RAM, which is below its rated 3400 MHz frequency.
I think we can rule out the RAM as the issue at this point.
I am going to lean towards the motherboard being the issue. The R10 is a new revision, but according to other threads on this forum it is the same board.
Since the board was certified for a fabric speed of 1600 MHz before Ryzen 5000, I am thinking that higher fabric speeds are not consistently possible on this board. Normally it becomes increasingly expensive to design a board so that it can handle higher frequencies.
You could also be extremely unlucky and have a processor that can only reach 1633 MHz fabric speed. I doubt that would be the case.
Hopefully they get back to you with a solution, because that's not a cheap system.
If the board is the limit, no BIOS or software update can fix this, as it would be an issue with the hardware design of the board.
It would be interesting to see the fabric clock of someone who has an R10 with a Ryzen 3000 processor and RAM running at 3400 MHz. I have a feeling it is 1600 MHz, with a ratio so the memory can run at 3400 MHz. Hopefully someone with that combination can post a screenshot of the CPU-Z Memory tab.
koshyjohn
18 Posts
1
March 13th, 2021 20:00
Another update on this:
- I was mistaken about 1,633 MHz being okay; the machine is only able to take a 1,600 MHz fabric clock. Anything higher results in the System event logs being flooded with warnings at varying rates. This may be a limitation of the motherboard, as you've hypothesized.
- So the good news, if it can be considered good, is that the RAM can be run separately at 1,700 MHz or 1,733 MHz (both without issues) so long as the fabric clock is no higher than 1,600 MHz. I'm not entirely sure what this means from a performance standpoint.
In any case, I've reverted everything to the Alienware default (1,733 MHz everything) so that Dell/Alienware addresses this properly. Warnings are strewn across my event logs. I'll have a support case started next week, and hopefully it'll lead down a path that doesn't involve mindlessly swapping out components.
Vanadiel
6 Professor
•
7.1K Posts
1
March 14th, 2021 12:00
Running the RAM with a divider other than 1:1, like the fabric clock at 1600 and the RAM at 1700, causes extra latency.
You might be better off latency-wise running it at 1600 fabric speed and 1:1 on the RAM than running it at 1600 fabric speed and 3400 on the RAM.
It's best to try each option and see what the RAM latency is. I use AIDA64 to see various memory latencies under Windows.
koshyjohn
18 Posts
1
March 14th, 2021 13:00
Good tip on AIDA64 to get hard numbers. These are the results:
RAM / Fabric = Latency
1733 / 1733 = 71 ns (Alienware default, has issues)
1733 / 1600 = 91 ns (No issues)
1600 / 1600 = 102 ns (No issues)
Vanadiel
6 Professor
•
7.1K Posts
1
March 14th, 2021 15:00
Looks like the best-case working scenario is 1733 / 1600.
Imagine if 1733 / 1733 worked properly. It would shave off almost 30% in latency compared to 1600 / 1600.
Now you can see why AMD wants to increase the fabric speed.
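To put numbers on that comparison, here is a small Python sketch using only the three AIDA64 latencies reported above. The dictionary layout and function name are just for illustration.

```python
# AIDA64 latencies reported in this thread, keyed by (RAM clock, fabric clock) in MHz.
latency_ns = {
    (1733, 1733): 71,   # Alienware default, has issues
    (1733, 1600): 91,   # no issues
    (1600, 1600): 102,  # no issues
}


def reduction_pct(baseline_ns, candidate_ns):
    """Percentage by which candidate latency is lower than the baseline."""
    return 100 * (baseline_ns - candidate_ns) / baseline_ns


baseline = latency_ns[(1600, 1600)]
for (ram, fclk), ns in latency_ns.items():
    print(f"{ram}/{fclk}: {ns} ns, {reduction_pct(baseline, ns):.1f}% lower than 1600/1600")
```

By this measure 1733 / 1733 comes out roughly 30% lower than 1600 / 1600, and the stable 1733 / 1600 setting roughly 11% lower.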
Nzumbe
13 Posts
1
March 15th, 2021 06:00
Heya!
Just letting you know that we have the same issue. I'm also in contact with support about it and will keep you updated if anything comes up!
Good luck!
phpfreak
44 Posts
0
March 15th, 2021 18:00
Hi, you are not the only one having these random reboots. I am also having this issue. However, I assume it has something to do with heat, at least in my case. Two days ago I swapped the stock 500 GB NVMe drive in the slot for a cheaper 1 TB one. My CPU is a 5950X and my GPU is a 6800 XT.
Last night I decided to install Windows on it, planning to install Linux later (I lack a spare drive, and the third-party PCIe NVMe/SSD card is not detected).
During the super fast fresh installation, which took less than 10 minutes to complete, there was no reboot. However, after around 5 minutes of use, while I was trying to install the GPU driver, it rebooted itself. One minute in, it rebooted again. So I turned it off, left it for 30 minutes, and tried again. It worked, but a few minutes later it rebooted again. The reboots kept happening, both under no load and during stress tests with ClockTuner and Cinebench. I am using Core Temp to monitor the temperature. Every time there is a spike to nearly 85 degrees it reboots itself, which I guess is the system's automatic safety.
I removed the PCIe card (the motherboard couldn't detect it) and the 16 TB HDD. I decided to leave the case open while using it; it worked for almost 2 hours and then rebooted. The second time it lasted longer, around 4 hours of use before another reboot. I am giving up at this point.
The Alienware diagnostic test says everything is running fine; it was run with the chassis open. So that suggests there is nothing wrong with any of the components. It has to be the heat, and the liquid cooler not providing adequate cooling.
GPU temperature and wattage are low while in use. CPU temperatures, however:
When the case is closed:
Reboots after 1 to 50 minutes
49 to 55 degrees on average from a cold boot
65 degrees and up after prolonged use, before the 50-minute mark
Reboots at ~80 to 85 degrees
When the case is open:
~4 hours of use
37 to 40 degrees from a cold boot
Reboots after around 4 hours of use. I didn't manage to see the last CPU temperature reading.
The BIOS was updated to 2.1.1 shortly before the last reboot.
My only options are to:
1. Replace the cooler with a new dual-fan liquid cooler.
2. Move the GPU and CPU to a new motherboard and chassis.
3. Return it and get a refund, but that means I wouldn't be able to get a new GPU for I don't know how long. So I don't feel like letting it go.
I am also in the middle of looking for parts to build a Ryzen server using a server-class motherboard with IPMI, or a cheaper EPYC CPU/motherboard. It is not good to spend that much money on a computer that does not work out of the box.
phpfreak
44 Posts
0
March 15th, 2021 19:00
This is after disabling the overclock, with the house AC on right now.
The CPU has been staying between the high 40s and 56 degrees since my last post (I am using my other system to type this). My conclusion is that it is definitely the heat that is causing the reboots. Of course, removing the overclock also cools down the CPU. I will let it run the whole night and update here. Does anybody know which liquid cooler will work?