this post was submitted on 14 Jun 2023

Recently my kernel started to panic every time I awoke my monitors from sleep. This seemed to be a regression; it worked one day, then I received a kernel upgrade from upstream, and the next time I was operating my machine it would crash when I came back to it.

After being annoyed for a bit, I realized this was a great time to learn how to bisect the kernel with git, find the problem, and either report it upstream or patch it out of my kernel! I thought this would be useful to someone else in the future, so here we are.

Step #1: Clone the kernel. I grabbed Linus' tree from https://github.com/torvalds/linux with git clone git@github.com:torvalds/linux.git

Step #2: Start a bisect.

If you're not familiar with a bisect, it's a process by which you tell git, "this commit was fine", and "this commit was broken", and it will help you test the commits in-between to find the one that introduced the problem.
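To see the idea on a small scale before pointing it at the kernel, here is a minimal sketch on a throwaway repository. Everything in it is made up for illustration (the file names, the "works"/"panics" marker, the `first-good` tag), and it uses `git bisect run` to automate the good/bad answers, which I could not do for my real problem because each test needed a reboot.

```shell
#!/bin/sh
# Toy demonstration of git bisect on a throwaway repository.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Commit identity is passed inline so the sketch does not depend on global git config.
commit() { git -c user.name=demo -c user.email=demo@example.invalid commit -q -m "$1"; }

# Create 8 commits; the pretend regression lands in commit 6.
i=1
while [ "$i" -le 8 ]; do
  if [ "$i" -ge 6 ]; then echo panics > status; else echo works > status; fi
  echo "$i" > counter
  git add status counter
  commit "commit $i"
  if [ "$i" -eq 1 ]; then git tag first-good; fi
  i=$((i + 1))
done

git bisect start >/dev/null
git bisect bad HEAD >/dev/null
git bisect good first-good >/dev/null

# `git bisect run` treats exit status 0 as good and 1 as bad.
git bisect run grep -q works status >/dev/null

# When the bisect finishes, refs/bisect/bad points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad: $first_bad"
```

With a test that needs a reboot, you answer `git bisect good` or `git bisect bad` by hand instead of using `git bisect run`, but the search is the same.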

You start this by running git bisect start, and then you provide a tag or commit ID for the good and the bad kernel with git bisect good ... and git bisect bad ....

I knew my issue didn't occur on the 5.15 kernel series, but did start with my NixOS upgrade to 6.1. But I didn't know precisely where, so I aimed a little broader... I figured an extra test or two would be better than missing the problem. 😬

git bisect start
git bisect good v5.15
git bisect bad master 
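Why aiming broader only costs "an extra test or two": bisect halves the candidate range on every step, so the number of builds grows with the logarithm of the commit count. A rough sketch of that arithmetic follows; `bisect_steps` is a hypothetical helper, and the commit counts are illustrative round numbers rather than measurements from the kernel tree (something like `git rev-list --count v5.15..master` would give the real figure).

```shell
#!/bin/sh
# Rough upper bound on the builds needed to bisect a range of N commits: ceil(log2(N)).
bisect_steps() {
  n=$1
  steps=0
  while [ "$n" -gt 1 ]; do
    n=$(( (n + 1) / 2 ))   # each test discards about half of the remaining commits
    steps=$((steps + 1))
  done
  echo "$steps"
}

bisect_steps 15000   # prints 14
bisect_steps 60000   # prints 16: four times the commits, two more builds
```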

Step #3: Replace your kernel with that version

In an ideal world, I would have been able to test this in a VM. But it was a graphics problem with my video card and connected monitors, so I went straight for testing this on my desktop to ensure it was easy to reproduce and accurate.

Testing a mid-release kernel with NixOS is pretty easy! All you have to do is override your kernel package, and NixOS will handle building it for you... here's an example from my bisect:

boot.kernelPackages = pkgs.linuxPackagesFor (pkgs.linux_6_2.override { # (#4) make sure this matches the major version of the kernel as well
  argsOverride = rec {
    src = pkgs.fetchFromGitHub {
      owner = "torvalds";
      repo = "linux";
      # (#1) -> put the bisect revision here
      rev = "7484a5bc153e81a1740c06ce037fd55b7638335c";
      # (#2) -> clear the sha; run a build, get the sha, populate the sha
      sha256 = "sha256-nr7CbJO6kQiJHJIh7vypDjmUJ5LA9v9VDz6ayzBh7nI=";
    };
    dontStrip = true;
    # (#3) `head Makefile` from the kernel and put the right version numbers here
    version = "6.2.0";
    modDirVersion = "6.2.0-rc2";
    # (#5) `nixos-rebuild boot`, reboot, test.
  };
});

Getting this defined requires a few intermediate steps:

Step #3.1 -- put the revision that git bisect asked me to test in at (#1)
Step #3.2 -- clear out sha256
Step #3.3 -- run a nixos-rebuild boot; the build fails with a hash mismatch and reports the sha256 it actually got
Step #3.4 -- grab that sha256 and put it into the sha256 field (#2)
Step #3.5 -- make sure the version numbers match at (#3) and (#4)

Then run nixos-rebuild boot.
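The version numbers for (#3) come from the first few lines of the kernel's Makefile. As a convenience, here is a small sketch that assembles the modDirVersion-style string from those lines. `kernel_version` is a hypothetical helper, not something from nixpkgs, and the sample Makefile is a stand-in so the example runs without a kernel checkout.

```shell
#!/bin/sh
# Build "VERSION.PATCHLEVEL.SUBLEVEL" + EXTRAVERSION from the top of a kernel Makefile.
kernel_version() {
  head -n 10 "$1" | awk -F ' *= *' '
    $1 == "VERSION"      { v = $2 }
    $1 == "PATCHLEVEL"   { p = $2 }
    $1 == "SUBLEVEL"     { s = $2 }
    $1 == "EXTRAVERSION" { e = $2 }
    END { print v "." p "." s e }
  '
}

# Stand-in for the top of the kernel Makefile at the revision being tested.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# SPDX-License-Identifier: GPL-2.0
VERSION = 6
PATCHLEVEL = 2
SUBLEVEL = 0
EXTRAVERSION = -rc2
EOF

kernel_version "$sample"   # prints 6.2.0-rc2
```

In the kernel tree itself this would be `kernel_version Makefile`, run after git bisect has checked out the next revision; `git rev-parse HEAD` gives the revision for (#1).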

Step #4: Test!

Reboot into the new kernel, and test whatever is broken. I was able to set up a simple test protocol: run xset dpms force off to blank my screens, wait 30 seconds, and then wake them. If my kernel panicked, the test was a fail.
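That protocol can be wrapped in a small function so every round of the bisect runs the same test. This is only a sketch: `dpms_cycle` is a hypothetical name, waking the monitors with `xset dpms force on` is my assumption (moving the mouse works too), and the `XSET` override exists purely so the function can be dry-run without an X session.

```shell
#!/bin/sh
# Blank the monitors, wait, then wake them. If the kernel panics on wake-up,
# the final message is never printed.
dpms_cycle() {
  delay=${1:-30}                    # seconds to leave the screens blanked
  "${XSET:-xset}" dpms force off
  sleep "$delay"
  "${XSET:-xset}" dpms force on     # assumption: wake via xset rather than input
  echo "survived wake-up"
}

# Real test:  dpms_cycle 30
# Dry run:    XSET=echo dpms_cycle 0
```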

Step #5: Repeat the bisect

Go into the linux source tree and run git bisect good or git bisect bad depending on whether the test succeeded. Return to step #3.

Step #6: Revert it!

In my case, I eventually found a single commit that introduced the problem, and I was able to revert it in my local kernel. This involves leaving a kernel patch in my NixOS config like this:

  boot.kernelPatches = [
    { patch = ./revert-bb2ff6c27b.patch; name = "revert-bb2ff6c27b"; }
  ];
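One way to produce a revert patch like that is to diff the bad commit against its parent, which gives the reverse of the commit. The sketch below shows this on a throwaway repository so it can be run anywhere; the file name and contents are made up, and this is just one option (`git revert` followed by `git format-patch -1` would be another), not necessarily the only way to build the patch.

```shell
#!/bin/sh
# Generate a patch that undoes a single commit, demonstrated on a toy repository.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
commit() { git -c user.name=demo -c user.email=demo@example.invalid commit -q -m "$1"; }

echo "stable" > driver.c
git add driver.c
commit "baseline"

echo "regression" >> driver.c
git add driver.c
commit "introduce regression"
bad=$(git rev-parse --short HEAD)

# Diffing from the bad commit back to its parent yields the reverse change.
git diff "$bad" "$bad^" > "revert-$bad.patch"

# Sanity check that the patch applies, then apply it to the working tree.
git apply --check "revert-$bad.patch"
git apply "revert-$bad.patch"
cat driver.c   # prints: stable
```

In the kernel tree the same `git diff` line, pointed at the commit the bisect identified, writes the file that boot.kernelPatches then references.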

This probably isn't the greatest long-term solution, but it gets my desktop stable and I'm happy with that for now.

Profit!

top 8 comments
[–] piranhaphish 3 points 2 years ago

Awesome writeup! Thank you.

I've always been intrigued by the concept of Nix but haven't had a chance to try it. And this is despite the fact that I had professionally built and maintained a custom LFS distro (later Gentoo, and later still Debian) and toyed with Yocto.

I'll try to find some time to give it a go. Also, even though it's probably not kernel or code related, you've inspired me to dig deeper into why my monitors continually wake for no reason.

[–] [email protected] 2 points 2 years ago

Unexpected advantage of NixOS here, I didn't realize you could bisect right with your package manager this way! I would have just downgraded back to the 5.15 releases.

Did you report the issue upstream? I'm sure they'd be happy with the faulty commit already being identified.

[–] [email protected] 2 points 2 years ago

I didn't know about git bisect. It will be useful for debugging in my repos

[–] [email protected] 2 points 2 years ago (1 children)

Very never knew about bisec

[–] [email protected] 2 points 2 years ago

Sorry very cool

[–] [email protected] 1 points 2 years ago

Really good information, saving this for the future.

[–] [email protected] 0 points 2 years ago (1 children)

@mfenniak regarding your "coworkers" comment. We could force squashing on all our repos; that should, at least most of the time, ensure all the commits in the mainline are compilable.

[–] [email protected] 1 points 2 years ago

Yeah... yeah, we could. Maybe we should. A couple hesitations -- (a) it makes merges and branches-of-branches difficult, and, (b) on a big PR you'd lose the ability to bisect into it. (b) is probably not a blocker because you'd have to have universally good commit hygiene to get the ability to do a rare thing -- cost vs. value doesn't align well. But (a) is a bit more of a headache.