This post was submitted on 06 Sep 2024.

techsupport


CPU: 3700X

Motherboard: Aorus B550 Elite

RAM: 8GBx4 Corsair Vengeance LPX 3200

GPU: PowerColor 5700XT

PSU: Cooler Master MWE 1050 V2

Built in 2020.

Since last month, my PC has been rebooting at random and throwing a 'Machine Check Exception' error, similar to these:

https://old.reddit.com/r/AMDHelp/comments/190mkn0/5950x_whea_error_18_machine_check_exception/

https://old.reddit.com/r/AMDHelp/comments/qia2e7/whea_18_critical_error_computer_goes_black_restart/

https://old.reddit.com/r/buildapc/comments/150m14n/pc_randomly_restarts_whealogger_id_18/

And for the last 3 days the system hasn't booted at all. When I power on the computer, all the fans start spinning but the keyboard and mouse LEDs don't light up. Pressing CTRL+ALT+DEL doesn't reboot the system, and neither does holding the power button for a few seconds.

I suspect the motherboard has gone kaput and isn't completing, or even starting, the boot process. That would explain why the keyboard and mouse aren't getting any signal or power from it, and why the restart and power-off functions aren't working.

Before the system stopped booting, I was trying to fix the Machine Check Exception error by updating the BIOS, updating the chipset drivers, changing BIOS settings, etc. But now I'm thinking none of it could've helped, because the board itself was deteriorating.

During that time I would also randomly get display glitches (pic below) that could only be cleared by restarting the machine, so I suspected the GPU might be causing the problems.

Sometimes it would show a chessboard-like pattern. I guess this was also caused by some issue with the mobo-GPU connection?

Anyway, before replacing the board, is there anything else I can try? Swapping it is a pain, so I'm trying to avoid that. 😂

vikingtons, 2 months ago (edited):

For whatever it's worth, my PowerColor Red Devil 5700 died in a similar way several years ago. It got to the point where it failed to output any display signal at all.

PowerColor had an insane return rate for their Navi 10 GPUs, at least in Europe.

Are you getting WHEA logs here? Do they implicate a specific component?

TheBat, 2 months ago:

> for whatever it's worth, my powercolor red devil 5700 died in a similar way to this several years ago. Got to a point where it failed to output any display signal at all.

That's what I thought too, but as I mentioned, the keyboard and mouse aren't getting any power from the motherboard.

If it were a GPU issue, their lights would still come on and only the monitor would be missing a signal from the GPU.

> Are you getting WHEA logs here? Do they implicate a specific component?

Back when the system still turned on properly, I'd get these in Windows' Event Viewer:

Reported by component: Processor Core

Error Source: Machine Check Exception

Error Type: Cache Hierarchy Error

Processor APIC ID: (this kept changing)

Provider Name: Microsoft-Windows-WHEA-Logger

Event ID: 18

When I started looking into this error, I found out it could be caused by almost anything: a faulty CPU, board, memory, or PSU. For some people, setting the CPU voltage to around 1.30-1.35 V helped. In one case, the guy replaced his custom power cables with the stock ones and that solved the issue.

vikingtons, 2 months ago (edited):

Oh, cache hierarchy errors were decently common with Matisse too. I think this one may be your CPU; my colleague hit this the other day with their ol' 3700X. I don't suppose you could RMA it?

TheBat, 2 months ago:

That's going to be my last resort.

vikingtons, 2 months ago:

I wouldn't trust that CPU at this point if you're reliably hitting that error code. Really sorry you're experiencing this. I hope you get a swift replacement.

TheBat, 2 months ago:

Well, I'd actually be baffled if it turns out to be the CPU. Short of physical or electrical damage, I've never heard of a chip going bad after 4 years.

vikingtons, 2 months ago:

It happens often enough. Hell, we've had the whole Raptor Lake episode on Intel's side, which wasn't really confirmed until well after release.

It could be that the CPU was always affected in some way and degraded over time until that error code suddenly started to manifest.