this post was submitted on 29 Aug 2023

Linux


SOLUTION BELOW

The actual bug


I have never been in a more confusing situation regarding Linux.

I have a Dell XPS 15 9560, which had a dual boot Windows 10 / EndeavourOS setup. It was running fine for months. 10 days ago I updated Linux and after restart it couldn't boot anymore. It got stuck at "A start job is running for /dev/disk/by-uuid/..." (which is the root partition).

First, with the help of a friend of mine who is quite knowledgeable about Linux (he runs vanilla Arch, etc), we spent 5 hours trying to fix it but had no luck.

Then I decided to back up everything and do a fresh install. Aaaand the same error happened again on the first boot. Then I thought "ok, probably some problem with Arch, let's try Fedora". Nope. A similar error about not finding the root partition. (Here I must say that the kernel shipped with the ISO was working fine, but after updating to the latest one, it failed.) Here I thought "ok, then it might be a problem with the latest kernel, let's install EndeavourOS with the LTS kernel." Nope, the LTS kernel didn't boot either. Then I tried Ubuntu and it worked, but that doesn't solve the problem. Then I decided to put another NVMe drive in the laptop and try there. The same error again.

Now the greatest part: if I put the NVMe drive into an external USB case, EndeavourOS installs, updates and boots without any problem, with no sign of the error.

So now I don't know how to proceed... Maybe there is something wrong with the PCIe port in my laptop, but apart from the booting problem everything looks fine: Windows is working, and I can mount and access every partition on the SSD from a live USB. So there are no other signs of a problem with the port.

I would be grateful for any advice as I've lost several days trying to solve this and I am out of ideas...


Solution: The last working kernels are the ones from 11 August 2023, for both linux and linux-lts: linux-6.4.10.arch1-1 and linux-lts-6.1.45-1. You can download them from here: linux / linux-lts and install them with:

sudo pacman -U the_path_to_the_package

Thank you all for the help!

[–] [email protected] -1 points 1 year ago* (last edited 1 year ago) (1 children)

What are the kernel parameters? cat /proc/cmdline

EDIT: Actually, that will show the live system's config. Assuming you're using grub, what is the content of /etc/default/grub?

Or /boot/loader/loaders/something.conf if you're on systemd-boot

[–] [email protected] 0 points 1 year ago (1 children)

I'm on systemd-boot. There isn't a directory loaders under loader, but I found the parameters under /etc/kernel/cmdline:

nvme_load=YES nowatchdog rw root=UUID=9ae3c50f-be08-4594-ac30-2d094375868d
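Since the boot hangs while waiting for /dev/disk/by-uuid/..., one quick check from a live USB is whether the UUID in root= actually shows up as a device there. A minimal sketch in Python, assuming the root=UUID=... form shown above; the function names `root_uuid` and `uuid_device` are made up for this illustration:

```python
import os


def root_uuid(cmdline):
    """Return the UUID from a root=UUID=... kernel parameter, or None."""
    prefix = "root=UUID="
    for param in cmdline.split():
        if param.startswith(prefix):
            return param[len(prefix):]
    return None


def uuid_device(uuid, by_uuid_dir="/dev/disk/by-uuid"):
    """Resolve the by-uuid symlink to its device node, or None if it is missing."""
    path = os.path.join(by_uuid_dir, uuid)
    if not os.path.exists(path):
        return None
    return os.path.realpath(path)


cmdline = "nvme_load=YES nowatchdog rw root=UUID=9ae3c50f-be08-4594-ac30-2d094375868d"
uuid = root_uuid(cmdline)
print(uuid, "->", uuid_device(uuid))
```

If the UUID resolves in the live environment but the installed kernel still waits for it, the parameters themselves are probably not the problem.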

[–] [email protected] 0 points 1 year ago (1 children)

My bad, I think in your case it's in /efi/loader/entries/something.conf

Since / is not mounted yet, the bootloader will not be able to read anything under /etc/, unless that file is used to automatically populate the loader.conf.

Also check /efi/loader/loader.conf.

[–] [email protected] 1 points 1 year ago (1 children)

I found it!

[[email protected] ~]$ cat /mnt/efi/loader/entries/02ef85f9edc146d598502c1b296ff64a-6.4.12-arch1-1.conf 
# Boot Loader Specification type#1 entry
# File created by /etc/kernel/install.d/90-loaderentry.install (systemd 254.1-1-arch)
title      EndeavourOS
version    6.4.12-arch1-1
machine-id 02ef85f9edc146d598502c1b296ff64a
sort-key   endeavouros-6.4.12-arch1-1
options    nvme_load=YES nowatchdog rw root=UUID=9ae3c50f-be08-4594-ac30-2d094375868d systemd.machine_id=02ef85f9edc146d598502c1b296ff64a
linux      /02ef85f9edc146d598502c1b296ff64a/6.4.12-arch1-1/linux
initrd     /02ef85f9edc146d598502c1b296ff64a/6.4.12-arch1-1/initrd
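To compare entries for different kernels without eyeballing them, the file can be parsed into key/value pairs. A minimal sketch in Python, assuming the simple "key, whitespace, value" layout seen above; `parse_bls_entry` and `kernel_options` are hypothetical helper names, and quoted parameters containing spaces are not handled:

```python
SAMPLE_ENTRY = """\
# Boot Loader Specification type#1 entry
title      EndeavourOS
version    6.4.12-arch1-1
options    nvme_load=YES nowatchdog rw root=UUID=9ae3c50f-be08-4594-ac30-2d094375868d
linux      /02ef85f9edc146d598502c1b296ff64a/6.4.12-arch1-1/linux
initrd     /02ef85f9edc146d598502c1b296ff64a/6.4.12-arch1-1/initrd
"""


def parse_bls_entry(text):
    """Parse an entry into a dict mapping each key to a list of its values."""
    entry = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # key, then the rest of the line
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        # Keys may repeat, so keep every occurrence in order.
        entry.setdefault(key, []).append(value)
    return entry


def kernel_options(entry):
    """Flatten all 'options' lines into a list of kernel parameters."""
    params = []
    for line in entry.get("options", []):
        params.extend(line.split())
    return params


entry = parse_bls_entry(SAMPLE_ENTRY)
print(entry["linux"], kernel_options(entry))
```

Running it on a working and a broken entry and diffing the results would show whether anything other than the version string changed.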

[–] [email protected] 0 points 1 year ago (1 children)

I've never used machine-id with systemd-boot, but everything appears to be correct. Presumably, /boot contains a directory named 6.4.12-arch1-1, which contains files linux and initrd, correct?

You could try rebuilding the initramfs with mkinitcpio --allpresets while chrooted.

[–] [email protected] 0 points 1 year ago* (last edited 1 year ago) (1 children)

They are under /02ef85f9edc146d598502c1b296ff64a/6.4.12-arch1-1/, but yes.

EndeavourOS is using dracut by default.

Edit: we tried rebuilding initramfs before, but it didn't help

[–] [email protected] 0 points 1 year ago (1 children)

OK, I see nothing wrong. Let's try building a new config that's as minimal as possible. Copy linux and initrd files to /boot/.

/efi/loader/entries/test.conf

title      Test
options    root=/dev/nvme0n1p2
linux      /linux
initrd     /initrd
[–] [email protected] 0 points 1 year ago (1 children)
[–] [email protected] 0 points 1 year ago (2 children)

I think failure to change power states is a big issue, but this is out of my depth now. Sorry :(

[–] [email protected] 1 points 1 year ago

It matches the observation with the external USB enclosure though. I think the ASPM / ACPI path would be the most promising.

If you know a last working and a first broken kernel version, maybe do a bisection.
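The idea behind a bisection, sketched in Python: given an ordered list of versions where the first is known to boot and the last is known to fail, test the middle one and halve the range each time. This is only an illustration of the search. `first_bad` and `is_good` are hypothetical names, and in practice "testing" means installing that kernel and rebooting by hand. It also assumes that once a version is broken, every later one is broken too.

```python
def first_bad(versions, is_good):
    """Return the first version for which is_good() is False.

    Assumes versions[0] is good and versions[-1] is bad.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(versions[mid]):
            lo = mid  # breakage happened after mid
        else:
            hi = mid  # breakage happened at or before mid
    return versions[hi]


# Placeholder labels, not real package versions.
candidates = ["k0", "k1", "k2", "k3", "k4", "k5", "k6", "k7"]
broken_from = candidates.index("k5")  # pretend k5 introduced the bug
print(first_bad(candidates, lambda v: candidates.index(v) < broken_from))
```

With n candidates this needs roughly log2(n) test boots. `git bisect` on the kernel source tree applies the same search at commit granularity.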

[–] [email protected] 0 points 1 year ago

Thank you for your time and patience!