this post was submitted on 26 Mar 2024

638 points (96.4% liked)

What is the most difficult problem that you have fixed in linux? (lemmy.world)

submitted 11 months ago by Waffelson to c/linuxmemes

164 comments

[–] [email protected] 181 points 11 months ago (5 children)

I have two, one is actually complicated and one was so obtuse that I never would have figured it out in a million years:

Actually complicated: I still don't know how it happened, but somehow an update on Arch filled the boot partition with junk files, which then caused the kernel update to fail because of no disk space, which then kind of tanked the whole system. It took ages, but with a boot disk and chroot-ing back into the boot partition I eventually managed to untangle it all. I was determined to see it through and not reinstall.

Ridiculous: One day when using Ubuntu, the entire system went upside-down. As in, everything was working perfectly fine, but literally the screen was upside-down. After much Googling I had no luck figuring it out, then I accidentally found the solution - I'd plugged a PS4 controller into the USB on the laptop to charge it, and for some reason Ubuntu interpreted the gyroscope on the controller as "rotate the screen display" so when I moved it, the screen spun round. I only figured it out by accident when I plugged it back in and it spun back to normal lol.

[–] 0110010001100010 105 points 11 months ago (2 children)

> Ridiculous: One day when using Ubuntu, the entire system went upside-down. As in, everything was working perfectly fine, but literally the screen was upside-down. After much Googling I had no luck figuring it out, then I accidentally found the solution - I'd plugged a PS4 controller into the USB on the laptop to charge it, and for some reason Ubuntu interpreted the gyroscope on the controller as "rotate the screen display" so when I moved it, the screen spun round. I only figured it out by accident when I plugged it back in and it spun back to normal lol.

LMAO what the fuck?

[–] mojo_raisin 51 points 11 months ago

This deserves some sort of funniest Linux problem award.

[–] [email protected] 38 points 11 months ago

The controller thing is goddam hilarious

[–] [email protected] 20 points 11 months ago (2 children)

> Ridiculous

I had a similar one. I had a USB-powered fan cooling pad that my laptop was sitting on. My laptop would randomly go into boot loops when I turned it on. I thought it was a grub issue so I always had my USB stick ready to re-install grub. Did some dusting one day and forgot to plug in the cooling fan, then the boot loop never happened again. Turns out it was the fan plugged into the USB port that was causing it.

[–] foggy 13 points 11 months ago (1 children)

I think this is likely related to USB cables as power cables and USB ports/voltages.

I have seen a lamp completely fry a MacBook. I wouldn't be surprised to see something similar cause a boot loop.

[–] [email protected] 9 points 11 months ago* (last edited 11 months ago) (3 children)

This is up there with the ~~redacted~~ (just looked it up it's called the 500-mile email)

[–] [email protected] 74 points 11 months ago* (last edited 11 months ago) (6 children)

I manage a machine that runs both media transcodes and some video game servers.

The video game servers have to run in real-time, or very close to it. Otherwise players using them suffer noticeable lag.

Achieving this while an ffmpeg process was running was completely impossible, no matter what I did to limit ffmpeg's use of CPU time. Even at the lowest priority, it impacted the game server processes running at top priority. Even limited to one thread, it was affecting things.

I couldn't understand the problem. There was enough CPU time to go around to do both things, and the transcode wasn't even time sensitive, while the game server was, so why couldn't the Linux kernel just figure it out and schedule things in a way that made sense?

So, for the first time I read up on how computers actually handle processes, multi-tasking and CPU scheduling.

As FFMPEG is an application that uses ALL available CPU time until a task is done, I came to the conclusion that due to how context switching works (CPU cores can only do one thing, they just switch out what they do really fast, but this too takes time) it was causing the system to fall behind on the video game processes when the system was operating with zero processing headroom. The scheduler wasn't smart enough to maintain a real-time process in the face of FFMPEG, which would occupy ALL available cycles.

I learned the solution was core pinning. Manually setting processes to run on certain cores of the CPU. I set FFMPEG to use only one core, since it doesn't matter how fast it completes. And I set the game processes to use all but that one core, so they don't accidentally end up queueing for CPU time on a core that doesn't have the headroom to allow the task to run within a reasonable time range.

This has completely solved the problem, as the game processes and FFMPEG no longer wait for CPU cycles in the same queue.
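
The pinning described above can be sketched in Python, which exposes the Linux affinity call as `os.sched_setaffinity`. This is only an illustration under assumptions: the comment does not say which core was reserved, so `split_cores` (a hypothetical helper) simply hands the lowest-numbered core to the transcode and everything else to the game servers. The `taskset` command does the same job from a shell.

```python
import os


def split_cores(cores, batch_count=1):
    """Split CPU ids into (batch, realtime) groups.

    The lowest-numbered `batch_count` cores go to the batch job
    (the transcode); the latency-sensitive processes get the rest.
    Which cores to reserve is an assumption, not from the comment.
    """
    ordered = sorted(cores)
    if not 0 < batch_count < len(ordered):
        raise ValueError("need at least one core in each group")
    return set(ordered[:batch_count]), set(ordered[batch_count:])


def pin(pid, cores):
    """Restrict `pid` to `cores` (Linux only; pid 0 means this process)."""
    os.sched_setaffinity(pid, cores)


if __name__ == "__main__" and hasattr(os, "sched_getaffinity"):
    available = os.sched_getaffinity(0)
    if len(available) > 1:
        batch, realtime = split_cores(available)
        print("transcode cores:", sorted(batch))
        print("game server cores:", sorted(realtime))
        # pin(ffmpeg_pid, batch) and pin(game_pid, realtime) would apply
        # the split; both pids are placeholders for real process ids.
```

Because the two groups never overlap, the game processes never queue behind the transcode on the same core, which is the effect the comment describes.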

[–] [email protected] 58 points 11 months ago* (last edited 11 months ago) (6 children)

I've found that the silliest desktop problems are usually the hardest to solve, and the "serious" linux system errors are the easiest.

System doesn't boot? Look at error message, boot from a rescue disk, mount root filesystem and fix what you did wrong.

Wrong mouse cursor theme in some Plasma applications, ignoring your settings? Some weird font rendering issue? Bang your head against a wall exploring various dotfiles and rc files in your home directory for two weeks, and eventually give up and nuke your profile and reconfigure your whole desktop from scratch.

[–] Vanshaj 16 points 11 months ago

I laughed so hard reading your comment. I totally agree.

[–] ccunix 12 points 11 months ago (4 children)

A couple of weeks ago I moved Firefox to one side. The window disappeared, but Firefox was still running "somewhere" on my desktop without actually being rendered to the screen. Killing the process and relaunching just resulted in it being rendered to this weird black hole. Log out of gnome and log back in? Same! Reboot? Same!

Ended up deleting its config folder and re-attaching to Firefox sync in order to have it working again. No idea what went wrong, nor will I ever most likely.

[–] Hyrulian 38 points 11 months ago (2 children)

Around 2017 I spent three days on and off trying to diagnose why my laptop running elementary OS had no wifi support. I reinstalled the wifi drivers and everything countless times. It worked for many days initially then just didn't one day when I got on the laptop. Turns out I had accidentally flipped the wifi toggle switch while it was in my bag. I forgot the laptop had one. Womp womp.

[–] Hawke 8 points 11 months ago

> Womp womp.

I used to bullseye womp rats in my T-16 back home, they’re not much bigger than 2 meters.

[–] [email protected] 8 points 11 months ago

I had a friend come over to my place to fix her laptop's wifi. After about an hour searching for any setting in Windows that I could have missed, I coincidentally found a forum where someone pointed out this could be due to a hardware wifi switch...

[–] [email protected] 36 points 11 months ago (2 children)

Grub.

Seriously. That was some fat as shit because I didn't know what I was doing.

[–] nul9o9 9 points 11 months ago

I broke my bootloader fucking with uefi settings. I was in a panic for a few hours because I hadn't bothered to learn how that shit worked until then.

It sure was a relief when I got back into my system.

[–] teft 32 points 11 months ago (2 children)

I once exited vim without having to look up the commands.

[–] Treczoks 29 points 11 months ago (7 children)

My first Linux machine crashing. This was way before Redhat, Ubuntu, Arch, or OpenSUSE. This was installed from 60+ floppy disks on a 386-40 with 8MB of RAM.

This machine ran happily, but it crashed under heavy load. I tried generating the load with different applications, but could not pin it on any particular software. So the next thing I checked was the RAM. Memtest86 ran for a day without any problems. But the crashes still came. So I got the infrared camera from the lab to see if some hardware was overheating. Nope, this went nowhere, either.

Then I tested the harddisk. A read test of the whole HD went without problems. I copied the data to a backup medium and did a write and read test by dd'ing /dev/zero over the whole disk, and then dd'ing the disk to /dev/null. Nothing showed up.

I reinstalled the Linux, and it crashed again. But this time, I noticed that something was odd with the harddisk. I added a second swap partition, disabled the first, and the machine ran without problems. Strange...

So I wrote a small program that tested the part of the disk occupied by the old swap space: Write data, read data, and log everything with timestamps. And there was the culprit: There was an area on the HD where I could write any data, but when I read blocks from that area, a) It took a very long time for the read, b) the blocks I read were containing all zero, regardless of what I had written, and worst of all c) there was no error indication whatsoever from the controller or drive. Down at the kernel level, the zeroed blocks were happily served by the HD with an "OK". And the faulty area was right in the middle of the original swap partition.
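
A test program along those lines might look like the sketch below. It is not the original program: the block size, the fill pattern, and the names `write_blocks` and `verify_blocks` are all assumptions, and the demo runs against an in-memory stand-in rather than a real disk. Note that the earlier `/dev/zero` test could not have caught this fault, since zeroed reads were exactly what it expected; the sketch therefore writes a non-zero pattern. On real hardware the reads would also have to bypass the page cache, which this sketch does not attempt.

```python
import io
import time

BLOCK = 512  # assumed block size in bytes


def pattern(index, size=BLOCK):
    """Non-zero fill byte that varies per block, so zeroed reads stand out."""
    return bytes([index % 255 + 1]) * size


def write_blocks(dev, first, count):
    """Write the per-block pattern to blocks first .. first+count-1."""
    for i in range(first, first + count):
        dev.seek(i * BLOCK)
        dev.write(pattern(i))


def verify_blocks(dev, first, count, log=print):
    """Read each block back, log timing, and return [(block, problem)]."""
    bad = []
    for i in range(first, first + count):
        dev.seek(i * BLOCK)
        start = time.monotonic()
        data = dev.read(BLOCK)
        elapsed = time.monotonic() - start
        if data == pattern(i):
            problem = None
        elif data == bytes(BLOCK):
            problem = "all zero"
        else:
            problem = "mismatch"
        log(f"{time.time():.3f} block={i} read={elapsed:.6f}s "
            f"result={problem or 'ok'}")
        if problem:
            bad.append((i, problem))
    return bad


# Demo on an in-memory stand-in for the disk, with one block zeroed by
# hand to imitate the fault described above.
disk = io.BytesIO(bytes(BLOCK * 8))
write_blocks(disk, 0, 8)
disk.seek(3 * BLOCK)
disk.write(bytes(BLOCK))
print(verify_blocks(disk, 0, 8))
```

Logging the read time per block matters as much as the comparison: in the story, the slow reads were one of the two symptoms, and the drive itself reported no error.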

[–] rowinxavier 23 points 11 months ago (2 children)

Working for a VoIP company in the early 2010s I rm -rf'd the /bin/ directory. As root. On a production server. On site.

I ended up booting from my phone (android app for iso booting) then manually copied over the files from another machine. Chrooted and some stuff was broken but rebuilding from the package manager reinstalled everything that was missing. Got the system back up in around 40 mins after that colossal screw up. Good fun and a great learning experience. Honestly, my manager should not have had me doing anything on a root shell with no training.

[–] [email protected] 20 points 11 months ago (1 children)

Around 2003-2004. I was still a bit of a Linux noob, just getting to grips with Gentoo.

Had two no-name WiFi adapters that weren't directly supported under Linux. Found some obscure forum thread that mentioned them, along with which lines in which source code driver to change to make these adapters work.

[–] [email protected] 19 points 11 months ago (2 children)

Getting WiFi to work in 2003

[–] TooLazyDidntName 12 points 11 months ago

For me, it was getting WiFi to work in 2023

[–] [email protected] 10 points 11 months ago (2 children)

NDISWrapper: we're just gonna trick the Windows driver into thinking it's running on Windows and intercept the system calls.

That was certainly an era.

[–] [email protected] 19 points 11 months ago (1 children)

Maybe this goes a bit deeper than the question intended, but I’ve made and shared two patches that I had to apply locally for years before they were merged into the base packages.

The first was a patch in 2015 for SDL2 to prevent the Sixaxis and other misbehaving controllers from using uninitialized axes and overwriting initialized ones. Merged in 2018.

The second was a patch in the spring of 2021 for Xft to not assume all the glyphs in a monospaced font to be the same size. Some fonts have ligatures which are glyphs that represent multiple characters together, so they’re actually some multiple of the base glyph size. Merged in the fall of 2022.

[–] [email protected] 16 points 11 months ago (3 children)

Fixed a typo in my /etc/fstab that prevented the NAS from mounting. I am a bear of little brain. But I'm also proof that you don't have to be some master hacker to successfully run Linux.

[–] johannesvanderwhales 15 points 11 months ago* (last edited 11 months ago) (2 children)

Back in the day, I upgraded a Slackware install from kernel 1.3 to 2.0. That was a fucking adventure.

The fun part about back then was that if your machine wouldn't boot or if you couldn't get your modem or pppd working, you probably didn't have another internet connected device so you might have to drive somewhere with a computer...or try to figure it out through books.

[–] T4V0 15 points 11 months ago (3 children)

Not a Linux problem per se, but I had a 128GB disk image in an unknown .bin format belonging to a proprietary application. The application only ran on Windows.

I tried a few things, but nothing except Windows-based programs seemed able to identify the partitions, and while I could run the application in Wine, it hit unimplemented functions. After a bit of googling and probing the file, it turned out the format just had a 512-byte header, which some Windows-based software ignored. After including the single-block offset, all the tools used in Linux started working flawlessly.
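
Probing for that kind of offset can be sketched as follows. This assumes the image carries a classic MBR partition table, whose first sector ends with the bytes 0x55 0xAA; the comment does not say which partition scheme was involved, and `find_partition_table` is a hypothetical helper, not a tool the commenter used.

```python
SECTOR = 512


def has_mbr_signature(image, offset=0):
    """True if the sector starting at `offset` ends with the 0x55 0xAA boot signature."""
    return image[offset + 510: offset + 512] == b"\x55\xaa"


def find_partition_table(image, offsets=(0, SECTOR)):
    """Return the first candidate offset holding an MBR signature, or None."""
    for off in offsets:
        if has_mbr_signature(image, off):
            return off
    return None


# Demo: a fake image with a 512-byte vendor header in front of the MBR.
vendor_header = bytes(SECTOR)
mbr = bytes(510) + b"\x55\xaa"
print(find_partition_table(vendor_header + mbr))
```

Once the offset is known, standard tools can usually be pointed past the header, for example with the `--offset` option of `losetup`.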

[–] [email protected] 15 points 11 months ago

Not Linux, but Solaris, back in the day.

We had a system with a mirrored boot disk. One of the disks failed. And we were unable to boot from the other, because the boot device in OBP (~BIOS) pointed to a device-specific partition. When we manually booted from the live device, it was lacking the boot sector code, and wouldn't boot. When we booted from CDROM, the partitions wouldn't mount because the virtual device mapping pointed to the dead drive.

This was a gas futures trading system, and rebuild wasn't an option. Restoring from backup would have lost four hours of trades, which would be an extreme last resort.

A coworker and I spent all night on the box. We had a whiteboard covered with every stage of the boot sequence broken down, and every redirection we needed to (a) boot and (b) repair the system. The issue started mid-afternoon, and we finally got it back up by around 6:30 am.

[–] 33550336 13 points 11 months ago

quit vim

[–] PropaGandalf 13 points 11 months ago (1 children)

cool, now find another distro

[–] [email protected] 12 points 11 months ago (2 children)

I once broke my Ubuntu install by trying to convert it to KDE Neon, which reinstalled half my packages and left it in a basically unusable state. I then un-broke the install while upgrading multiple Ubuntu releases, which reinstalled the other half as well. It actually worked, and I'm still using that install.

[–] [email protected] 12 points 11 months ago

Nvidia driver fucking X in the arse without lube.

[–] cley_faye 11 points 11 months ago (2 children)

Removed the libc by hand, and restored the system to a usable state without turning it off and putting the file back on the FS from an external source.

[–] [email protected] 11 points 11 months ago (5 children)

Full kernel corruption after a botched sudo full-upgrade.

I got the wonderful "bailing out you are on your own" shit as well.

Read a guide online about a hail mary ext file system journal recovery protocol and ran it, like most things, without reading too deeply.

Kernel was successfully repaired, Kubuntu kept on truckin'

[–] [email protected] 11 points 11 months ago

Getting a Palm Pilot to make a live connection to the internet through an infrared connection (Red Hat Linux). That was circa 2004, and I spent 10 hours, all night on it.

[–] [email protected] 10 points 11 months ago* (last edited 11 months ago)

I did a partial system upgrade when installing nginx without upgrading the rest of my Arch system. One of the things it upgraded was libssl.

Turns out systemd depends on that.

Turns out programs won't start at all if one of their shared libraries is missing.

Turns out that if you write init=bash in the kernel command line, not even Ethernet connections work if systemd isn't running.

I had to boot off archiso, chroot into my / partition, and run the system upgrade from there.

[–] Limonene 10 points 11 months ago

A couple months ago, I made a Palworld server box out of a spare motherboard assembly (mobo, processor, ram) from a computer I had recently upgraded.

I didn't have any spare drives lying around, so I plugged in 7 USB flash drives and made them into a RAID array. Not a true RAID array, but a BTRFS filesystem with volumes spread onto each flash drive, with the data redundancy set to raid1, and the metadata redundancy set to raid1c3.

It worked... in the sense that I never lost any data. It certainly didn't work in the sense of having good uptime.

The first problem was getting it to boot right. The boot line in GRUB had "root=UUID=..." instead of a specific drive named. That is normal. However, in BTRFS multi-volume filesystems, all the volumes have the same UUID. So the initrd was only waiting for a single drive matching that UUID, then trying to mount it as the root filesystem. This failed, because the kernel had not yet set up the other 6 USB drives, and this BTRFS filesystem needs all 7 volumes present. Maybe 6, if you used the "degraded" mount option.

The workaround was to wait for this boot process to fail, at which point you get dropped into an initrd shell. Then, you look at all the drives and make sure they're all there. And then... I don't exactly remember what happened next. I think it was some black magic that erases your mind in the process. I somehow got it booted from the initrd shell.

Installing Steam and the Palworld server worked ok, and it even ran for a few hours before crashing overnight.

The next morning, I tried rebooting it. Unfortunately, the USB drives weren't all appearing. Turns out the motherboard had some bad USB ports, some sometimes-bad USB ports, and a maybe-bad PCIe bus, because the PCIe USB expansion card I plugged in had weird problems that it had never had before.

I found the most reliable ports and plugged the drives in there. But you can't just replug them in the initrd. It doesn't have USB hotplug support. So each time it tried to boot with not all the drives there, I restarted it again until one time I finally had all the drives.

I changed the GRUB boot line to "root=/dev/sdg1" . This made it wait for all the drives to load, in any order, and whichever one was last would be mounted as the root filesystem (but the kernel would automatically include all the others too, since they were successfully initialized).

The bad USB ports kept bringing down the server every day or two. I bought a cheap NVMe drive and added it to the BTRFS filesystem, and then removed all the USB drives except the largest. That fixed the reliability. It's been like that since.

Now, to boot the server, all I have to do is change the GRUB boot line to "root=/dev/sdb1" . Since the NVMe drive is much faster than the USB drive, it always initializes first. If the initrd waits for sdb2, then it will always have both drives initialized when it tries to mount the root filesystem.

I could add that to the grub.cfg, or come up with some other more permanent solution, but I'm not planning on rebooting this server ever again. My friends fell off Palworld, and I gave a shutdown date that's about a week away. And the electricity is pretty reliable here.

[–] JargonWagon 9 points 11 months ago (2 children)

Nothing. I've fixed nothing.

[–] [email protected] 9 points 11 months ago* (last edited 11 months ago) (1 children)

More than a decade ago a user came into #ubuntu-server on Freenode (now libera.chat ) and said that they had accidentally run "rm -rf /* something*" in a root shell.

Note the errant space that made that a fatal mistake. I don't remember how far it actually got in deleting files, but all of /bin/ /sbin/ and /usr/ were gone.

He had 1 active ssh connection, and couldn't start another one.

It was a server that was "in production", was thousands of miles away from him, and which had no possibility for IPMI / remote hands.

Everyone (but me) in the channel said that he was just SoL and should just give up.

I stayed up most of the night helping him. I like challenges and I like helping people.

This was in the sysv-init (maybe upstart) days, and so a decent number of shell scripts were running, and using basic *nix commands.

We recovered the bash binary by running something along the lines of

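# Read bash's own executable through /proc using only shell builtins, then write it back out
bash_binary_contents="$( </proc/self/exe)"
printf "%s" "$bash_binary_contents" > /tmp/bash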

(If you can access "lsof" then "sudo lsof | grep deleted" will show you any files that are open, but also "deleted". You may be surprised at how many there are!)

But bash needed too many shared libraries to make that practical.

Somehow we were able to recover curl and chmod, after which I had him download busybox-static. From there we downloaded an Ubuntu LiveCD iso, loop mounted it, loop mounted the squashfs image inside the iso, and copied all of /bin/ , /sbin/ , /etc , and so on from there onto his root FS.

Then we re-installed missing packages and fixed up /etc/ (a lot of important daemons, including the one that was production critical, kept their configuration files open, so we were able to use lsof to find the magic symlinks to them in /proc/$pid/fd/ and just cp them back into /etc/).
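
The `/proc/$pid/fd/` trick can be illustrated with a short Python sketch. It relies on the Linux behaviour that a descriptor's symlink target gains a " (deleted)" suffix once the file is unlinked, while the link itself still opens the live contents. The function names are hypothetical, and this is not what was run on that server, where `lsof` and `cp` did the work.

```python
import os
import shutil

DELETED_SUFFIX = " (deleted)"


def original_path(link_target):
    """Return the pre-deletion path if a /proc/<pid>/fd link target marks
    a deleted file, else None."""
    if link_target.endswith(DELETED_SUFFIX):
        return link_target[: -len(DELETED_SUFFIX)]
    return None


def deleted_open_files(proc="/proc"):
    """Yield (fd_link, original_path) for open-but-deleted files we can see."""
    for pid in filter(str.isdigit, os.listdir(proc)):
        fd_dir = os.path.join(proc, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            link = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(link)
            except OSError:
                continue  # descriptor was closed in the meantime
            path = original_path(target)
            if path is not None:
                yield link, path


def recover(fd_link, destination):
    """Copy the still-open contents behind a /proc/<pid>/fd link to a new file."""
    shutil.copyfile(fd_link, destination)


if __name__ == "__main__" and os.path.isdir("/proc/self/fd"):
    for link, path in deleted_open_files():
        print(link, "->", path)
```

Run as root, the listing covers every process; as an ordinary user it only covers processes whose descriptors that user is allowed to inspect.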

We were able to restart openssh-server, log in again, and I don't remember if we were brave enough to test rebooting.

But we fucking did it!

I am certainly getting a lot of details wrong from memory. It's all somewhere at irclogs.ubuntu.com though. My nick was / is Jordan_U.

I tried to find it once, and failed.

[–] [email protected] 9 points 11 months ago

So I mostly fried the SSD by using it to write and rewrite ML checkpoints and logs, which in turn made the device read-only. I somehow managed to migrate to a different SSD, probably using Clonezilla or something, but that messed up the bootloader, so I installed rEFInd in a new partition, configured it, and voila, it works. It's scary because you need to do everything without seeing your system even half alive anywhere along the process, but it's not actually hard, just copying data and installing/configuring a bootloader. But for a then-20-year-old at his more or less first job, my head was on fire for the 1.5 days this took.

By far the most difficult single thing that I've ever had to fix that actually had to do with the system.

I now don't flood my SSDs with data that is constantly rewritten.

[–] [email protected] 8 points 11 months ago

Upgrading the system I removed glibc from the system (Debian). apt wasn't working, etc. Had to manually fix dependencies and everything. Currently my working OS so all fixed.

[–] poopsmith 8 points 11 months ago* (last edited 11 months ago)

Learned how drivers worked and fixed a driver for a USB to I2C chip. It's still buggy but at least it sorta works now.

Some more details: I was using a CH347 (USB to UART/SPI/I2C) and there was an open source driver that used a previous chip version. The original dev had hardcoded the bulk IO endpoint indices. The only change I had to make was to iterate over the endpoints and search for the correct ones. But at first, I didn't understand anything about how the USB subsystem worked and how drivers were loaded. All I could tell was the USB device was correctly detected but the I2C driver wasn't being loaded, despite proper udev rules, correct vendor/product IDs, etc.
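
The idea of searching for the endpoints instead of hardcoding their indices can be modelled in a few lines. The real fix lives in a kernel driver written in C; this Python sketch only mirrors the logic, using the standard USB descriptor fields (bit 7 of `bEndpointAddress` is the direction, the low two bits of `bmAttributes` are the transfer type). The endpoint addresses in the demo are made up and are not the CH347's.

```python
from dataclasses import dataclass

USB_DIR_IN = 0x80          # bit 7 of bEndpointAddress: device-to-host
USB_XFERTYPE_MASK = 0x03   # bits 0-1 of bmAttributes
USB_XFER_BULK = 0x02       # transfer type value for bulk endpoints


@dataclass
class Endpoint:
    bEndpointAddress: int
    bmAttributes: int


def find_bulk_endpoints(endpoints):
    """Return (bulk_in, bulk_out) addresses, taking the first of each kind.

    Either element is None when no matching endpoint exists.
    """
    bulk_in = bulk_out = None
    for ep in endpoints:
        if (ep.bmAttributes & USB_XFERTYPE_MASK) != USB_XFER_BULK:
            continue  # control, isochronous, or interrupt endpoint
        if ep.bEndpointAddress & USB_DIR_IN:
            if bulk_in is None:
                bulk_in = ep.bEndpointAddress
        elif bulk_out is None:
            bulk_out = ep.bEndpointAddress
    return bulk_in, bulk_out


# Demo with made-up descriptors: one interrupt IN, one bulk OUT, one bulk IN.
demo = [Endpoint(0x81, 0x03), Endpoint(0x02, 0x02), Endpoint(0x86, 0x02)]
print(find_bulk_endpoints(demo))
```

Matching on type and direction keeps the driver working when a newer chip revision moves the endpoints to different positions in the descriptor list.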

[–] IzzyJ 8 points 11 months ago

This will feel extremely simple for some folks, but I was having a hell of a time getting Steam games that had previously worked through Proton running. I scoured the internet for solutions after trying to install proton-ge and testing multiple versions. Eventually someone had the galaxy brain idea to suggest installing WINE. For some reason, that fixed the problem real good.

[–] [email protected] 8 points 11 months ago* (last edited 11 months ago) (1 children)

This doesn't fit the question exactly but I feel it's in the same spirit, and a kind of interesting solution, I think.

Back in the early days of scryptcoin mining, I had a few gpu mining rigs running Linux. Occasionally they would hard lock and I'd have to power cycle them.

What I ended up doing was getting some USB to serial adapters and writing a python script that ran on startup and would send a character over serial at a set interval in a loop. That was hooked up, if I recall correctly, to an attiny85 using softwareserial and some ttl to rs232 conversion. It would listen over serial and if it didn't receive anything within a reasonable time frame it'd flip a relay that cut mains power to the pc, then flip it back. A deadman's switch, of a sort. It worked great!
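
Both halves of that deadman's switch can be sketched in Python. This is not the original script: the timeout, the heartbeat byte, and the names `Watchdog` and `heartbeat` are assumptions, and the microcontroller side is only simulated here. `port` can be any object with `write()` and `flush()`, such as an opened serial device.

```python
import io
import time


class Watchdog:
    """Listener side: tracks heartbeats and reports when the host goes quiet."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_beat = clock()

    def feed(self):
        """Record that a heartbeat byte just arrived."""
        self.last_beat = self.clock()

    def expired(self):
        """True once more than `timeout` seconds passed without a heartbeat."""
        return self.clock() - self.last_beat > self.timeout


def heartbeat(port, interval, beats, sleep=time.sleep):
    """Host side: write one byte to `port` every `interval` seconds."""
    for _ in range(beats):
        port.write(b".")
        port.flush()
        sleep(interval)


# Demo: send three beats into an in-memory stand-in for the serial port.
fake_port = io.BytesIO()
heartbeat(fake_port, interval=0, beats=3)
print(fake_port.getvalue())
```

When `expired()` turns true, the listener would cycle the relay and then call `feed()` again so the machine gets a full timeout window to boot before the next check.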

[–] [email protected] 10 points 11 months ago (1 children)

I remember a story about someone who did something similar with a server that kept hanging. They rigged up a second computer to ping it over the local network and if there was no response for a certain amount of time, the computer would eject its CD-ROM tray which had been lined up neatly with the reset button on the server.

Since it couldn't eject fully, it then retracted, having rebooted the server.

I assume that was a temporary fix... and it was probably a Windows server tbh.

The closest I've done is having a job run every 12 hours checking if a process was over a certain memory usage (memory leak) and restarting it if it was. That was also Windows, but the same thing on Linux wouldn't have been difficult... not that the Linux servers ever had that problem.
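
On Linux, the check at the heart of such a job could look like the sketch below, which reads the resident set size from the `VmRSS` line of `/proc/<pid>/status`. The threshold and the function names are assumptions, and the restart step is left out because it depends on how the process is managed.

```python
def rss_kib(status_text):
    """Extract VmRSS in KiB from the text of /proc/<pid>/status, or None.

    Kernel threads have no VmRSS line, hence the None case.
    """
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def needs_restart(status_text, limit_kib):
    """True when the process's resident memory exceeds `limit_kib`."""
    rss = rss_kib(status_text)
    return rss is not None and rss > limit_kib


# Demo on a trimmed, made-up status file.
sample = "Name:\tdemo\nVmRSS:\t  204800 kB\nThreads:\t4\n"
print(rss_kib(sample), needs_restart(sample, limit_kib=100000))
```

A scheduler such as cron or a systemd timer could run the check every 12 hours, matching the interval mentioned above.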

[–] bitchkat 8 points 11 months ago

Are you including back in the day when we had to use Windows device drivers via ndiswrapper?

I've managed to remove a critical library once but did manage to extract it from an RPM on another machine and manually install it. That was good enough to get me to the point where I could yum reinstall.

Pre-linux we had an HP workstation where the disc drive died and of course we had no backups. I managed to frankenstein the disc by connecting the platters on the broken disc to the circuit board of a working disc. This worked and I was able to back up the disk and reload on to a new drive.

And then we bought an 8mm tape drive for backups and I had to port some drivers to HP-UX to get it to work. But we had awesome backups after that!

[–] [email protected] 7 points 11 months ago

Can't think of the most difficult problem, but I have managed to solve a lot of problems with btrfs snapshots.

[–] [email protected] 7 points 11 months ago

It's not the biggest issue I managed to fix, but it was definitely the hardest to figure out a fix for:

Whenever I would boot up any game on my Linux machine I would have microstutters every so often, and they were frequent and lengthy enough to be very annoying, and thus started my 2 month long quest to figure out what was going wrong.

To cut a long story short, the compositor I was using had suddenly decided to do a breaking update and change the names of the backends they were using.
