this post was submitted on 25 Jul 2023
submitted 1 year ago* (last edited 1 year ago) by mmstick to c/pop_os
 

Driver status

To check whether you have a functioning driver, run nvidia-smi in a terminal. If the driver is working, it reports the GPU(s) it found on the system and the version of the loaded driver.

$ nvidia-smi
Tue Jul 25 22:14:24 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.05              Driver Version: 535.86.05    CUDA Version: 12.2     |
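For scripts, or just a shorter answer, nvidia-smi can print only the fields you ask for. This is a minimal bash sketch. The function name nvidia_driver_status is hypothetical. The --query-gpu and --format options are standard nvidia-smi options.

```bash
# Hypothetical helper: print "<GPU name>, <driver version>" for each GPU,
# or explain why the driver did not answer.
nvidia_driver_status() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: the NVIDIA driver utilities are not installed" >&2
        return 1
    fi
    # nvidia-smi exits non-zero when it cannot communicate with the kernel driver.
    if ! nvidia-smi --query-gpu=name,driver_version --format=csv,noheader; then
        echo "nvidia-smi failed: the kernel driver is probably not loaded" >&2
        return 1
    fi
}
```

Source the snippet and run nvidia_driver_status. On a working system it prints one line per GPU: the card name followed by the driver version.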

How to reinstall

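If the driver is not working, purge all NVIDIA packages and reinstall the driver. The ~nnvidia argument is an apt search pattern. ~n is short for ?name, so ~nnvidia matches every package whose name contains "nvidia" (see the apt-patterns(7) man page).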

sudo apt purge ~nnvidia
sudo apt install nvidia-driver-535
sudo reboot
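If you want to see what the purge will touch before running it, apt can list the matching packages and simulate the removal. This is a minimal bash sketch, and preview_nvidia_purge is a hypothetical name. Quoting '~nnvidia' stops bash from attempting tilde expansion. Without quotes, bash would expand it if a user named "nnvidia" happened to exist.

```bash
# Hypothetical helper: show what "sudo apt purge ~nnvidia" would do, without changing anything.
# ~n<regex> is apt's short form of ?name(<regex>), so ~nnvidia matches every
# package whose name contains "nvidia" (apt patterns need apt 2.0 or newer).
preview_nvidia_purge() {
    # Installed packages matching the pattern.
    apt list --installed '~nnvidia'
    # Dry run: prints the packages that would be removed, but removes nothing.
    apt purge --simulate '~nnvidia'
}
```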

System doesn't have display on boot

Follow the bootloader guide at https://support.system76.com/articles/bootloader/, then repeat the reinstall steps above.

Freezes and suspend/resume issues

These are most often related to power management. You can partially rule this out by disabling PCIe Active State Power Management (ASPM), either in the firmware settings or with the pcie_aspm=off kernel boot option. Ideally you would keep ASPM enabled, since it conserves energy and reduces heat, so treat this as a diagnostic step.

Use sudo kernelstub -a {{OPTION}} to add a boot option, and sudo kernelstub -d {{OPTION}} to remove one.
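kernelstub only edits the boot configuration, so an option takes effect after the next reboot. To confirm it is active, check /proc/cmdline, which holds the command line the running kernel booted with. This is a minimal bash sketch. The helper has_kernel_option is hypothetical, and its optional second argument exists only so it can be tested against a sample file.

```bash
# Hypothetical helper: succeed if a kernel boot option is active in the running kernel.
has_kernel_option() {
    local option="$1"
    local cmdline_file="${2:-/proc/cmdline}"   # overridable for testing
    # Split the command line into one option per line, then match a whole line exactly.
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$option"
}
```

For example, after sudo kernelstub -a pcie_aspm=off and a reboot, has_kernel_option pcie_aspm=off should succeed.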

Some systems hit fatal errors when the CPU enters a deep low-power state (C-state). You can cap the deepest C-state with the processor.max_cstate or intel_idle.max_cstate kernel boot parameters. A value of processor.max_cstate=0 disables deep C-states entirely, which, like disabling ASPM, increases energy drain and heat. If that resolves the problem, raise the value one step at a time until the issue recurs, and keep the last value that worked.
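Raising the value one step at a time means removing the old option and adding the new one each time. This is a minimal bash sketch that prints the kernelstub commands for the next step so you can review them before running them. The helper next_max_cstate_cmds is hypothetical, and the file argument exists only for testing.

```bash
# Hypothetical helper: print (not run) the kernelstub commands that raise a
# max_cstate boot parameter by one step, based on the value the system booted with.
next_max_cstate_cmds() {
    local param="${1:-processor.max_cstate}"
    local cmdline_file="${2:-/proc/cmdline}"   # overridable for testing
    local current
    # Extract N from "<param>=N"; if the option appears twice, the last one is used.
    current=$(awk -v p="$param=" '{
        for (i = 1; i <= NF; i++)
            if (index($i, p) == 1) v = substr($i, length(p) + 1)
    } END { print v }' "$cmdline_file")
    case "$current" in
        '')
            echo "# $param is not set yet; start with 0:"
            echo "sudo kernelstub -a $param=0"
            ;;
        *[!0-9]*)
            echo "# unexpected value '$current' for $param" >&2
            return 1
            ;;
        *)
            echo "sudo kernelstub -d $param=$current"
            echo "sudo kernelstub -a $param=$((current + 1))"
            ;;
    esac
}
```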

If you're certain that the issue is caused by the NVIDIA driver, you can try out different driver options by creating a file in /etc/modprobe.d/, such as a hypothetical /etc/modprobe.d/zz-nvidia.conf.

system76-power generates some of these options automatically when switching between graphics modes. If you set them manually, be aware that they can conflict with some modes, and that system76-power.conf will override your settings if your file's name sorts alphabetically before it.
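To see which file wins, list the NVIDIA-related lines in /etc/modprobe.d in the order the files are read. This is a minimal bash sketch. The helper show_nvidia_modprobe_lines is hypothetical, and the directory argument exists only for testing. Per the note above, when two files set the same parameter, the file listed last takes effect.

```bash
# Hypothetical helper: print every options/blacklist/alias line that targets an
# nvidia module, prefixed with its file name, in alphabetical file order.
show_nvidia_modprobe_lines() {
    local dir="${1:-/etc/modprobe.d}"   # overridable for testing
    local f
    for f in "$dir"/*.conf; do          # the glob expands in sorted order
        [ -e "$f" ] || continue         # no .conf files at all
        grep -H -E '^[[:space:]]*(options|blacklist|alias)[[:space:]]+[^[:space:]]*nvidia' "$f"
    done
    return 0                            # grep returns 1 for files without matches
}
```

As a cross-check, modprobe -c (--showconfig) prints the merged configuration that modprobe actually uses, for example modprobe -c | grep nvidia. Depending on your setup, you may also need to regenerate the initramfs with sudo update-initramfs -u before new options apply at boot.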

All systems should define at least the following, unless the NVIDIA dGPU is used only for compute:

options nvidia-drm modeset=1
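After a reboot, you can confirm that this option took effect. The nvidia_drm module exposes its modeset parameter under /sys/module, and it reads Y when kernel modesetting is enabled. This is a minimal bash sketch. The helper name is hypothetical, and the file argument exists only for testing.

```bash
# Hypothetical helper: succeed if nvidia-drm kernel modesetting is active.
nvidia_modeset_enabled() {
    local file="${1:-/sys/module/nvidia_drm/parameters/modeset}"  # overridable for testing
    # The file only exists while nvidia_drm is loaded; it contains Y or N.
    [ -r "$file" ] && [ "$(cat "$file")" = "Y" ]
}
```

For example: nvidia_modeset_enabled && echo "modeset=1 is active".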

Hybrid graphics laptops also need these:

blacklist i2c_nvidia_gpu
alias i2c_nvidia_gpu off
options nvidia NVreg_DynamicPowerManagement=0x02
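To check after a reboot that the blacklist worked, look for the module in /proc/modules. This file lists loaded modules with the module name in the first column, the same data lsmod shows. This is a minimal bash sketch, and module_loaded is a hypothetical helper.

```bash
# Hypothetical helper: succeed if a kernel module is currently loaded.
module_loaded() {
    local name="$1"
    local file="${2:-/proc/modules}"   # overridable for testing
    # Exit 0 if any line's first field equals the module name, 1 otherwise.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$file"
}
```

For example: module_loaded i2c_nvidia_gpu && echo "i2c_nvidia_gpu is still loaded".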

If the hardware has issues with GC6 (a deep GPU power-saving state), change NVreg_DynamicPowerManagement from 0x02 (fine-grained power control) to 0x01 (coarse-grained):

options nvidia NVreg_DynamicPowerManagement=0x01
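To see which value the loaded driver is actually using, read /proc/driver/nvidia/params. As far as I know, it lists the NVreg_ parameters as "Name: value" lines without the NVreg_ prefix, and the parsing below assumes that layout. For DynamicPowerManagement, 2 is fine-grained and 1 is coarse-grained. This is a minimal bash sketch, and nvidia_param is a hypothetical helper.

```bash
# Hypothetical helper: print the value the loaded NVIDIA driver uses for a
# registry parameter, e.g. nvidia_param DynamicPowerManagement.
nvidia_param() {
    local name="${1#NVreg_}"                        # accept names with or without NVreg_
    local file="${2:-/proc/driver/nvidia/params}"   # overridable for testing
    # Split each "Name: value" line on the colon and any following spaces.
    awk -F': *' -v n="$name" '$1 == n { print $2 }' "$file"
}
```

The same helper works for the other NVreg_ options in this section, such as PreserveVideoMemoryAllocations.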

Systems with issues after resuming from S3 suspend may also need the following, which makes the driver save and restore all video memory allocations across suspend:

options nvidia NVreg_PreserveVideoMemoryAllocations=1

In the worst case, where suspend is completely broken, you can try disabling NVIDIA's suspend, resume, and hibernate services:

sudo systemctl disable --now nvidia-hibernate.service nvidia-resume.service nvidia-suspend.service

Remember to undo these changes when a new driver update arrives, so you can check whether the new driver resolves the issue on your system.
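This is a minimal bash sketch for checking and undoing that change, and the helper names are hypothetical. It uses plain systemctl enable rather than enable --now. As far as I know, these units are meant to be started by systemd's suspend and hibernate targets, not by hand.

```bash
NVIDIA_SLEEP_UNITS=(nvidia-suspend.service nvidia-resume.service nvidia-hibernate.service)

# Hypothetical helper: print "<unit>: <state>" for each NVIDIA sleep unit.
nvidia_sleep_units_status() {
    local unit state
    for unit in "${NVIDIA_SLEEP_UNITS[@]}"; do
        # is-enabled exits non-zero for disabled units, so ignore its exit status.
        state=$(systemctl is-enabled "$unit" 2>/dev/null) || true
        echo "$unit: ${state:-unknown}"
    done
}

# Hypothetical helper: undo "systemctl disable --now ..." after a driver update.
nvidia_sleep_units_reenable() {
    sudo systemctl enable "${NVIDIA_SLEEP_UNITS[@]}"
}
```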

Bad multi-monitor performance

Open nvidia-settings and enable "Force Full Composition Pipeline" on all monitors, then disable "Sync to VBlank" and "Allow Flipping" in the OpenGL settings. Next, edit /etc/environment and set the following variables to your highest supported refresh rate and the output that uses it. For a 144 Hz display on output DP-1, you would set:

CLUTTER_DEFAULT_FPS=144
__GL_SYNC_DISPLAY_DEVICE=DP-1
__GL_SYNC_TO_VBLANK=0
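If you are unsure of the output name or its highest refresh rate, xrandr lists both in an X11 session. This is a minimal bash sketch that reads xrandr --query output from stdin, and max_refresh_per_output is a hypothetical helper. The parsing assumes xrandr's usual layout: output lines start at column 0, and mode lines are indented with the rates after the resolution.

```bash
# Hypothetical helper: for each connected output, print its name and the
# highest refresh rate xrandr lists for it.
max_refresh_per_output() {
    awk '
        # Output header lines start at column 0, e.g. "DP-1 connected primary ..."
        /^[^ \t]/ {
            if (out != "") print out, max
            out = ($2 == "connected") ? $1 : ""
            max = 0
            next
        }
        # Mode lines are indented, e.g. "   2560x1440    59.95 +  143.97*"
        out != "" {
            for (i = 2; i <= NF; i++) {
                rate = $i
                gsub(/[*+]/, "", rate)          # drop current (*) and preferred (+) markers
                if (rate + 0 > max) max = rate + 0
            }
        }
        END { if (out != "") print out, max }
    '
}
```

Usage: xrandr --query | max_refresh_per_output. Round the rate it prints to a whole number for CLUTTER_DEFAULT_FPS; for example, 143.97 becomes 144.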

High energy consumption

By default, powerful graphics cards may favor performance over energy efficiency. You can monitor estimated power draw by running nvidia-smi dmon in a terminal. The pwr column is the driver's estimate of the watts used by the GPU.
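To average the readings over a period, for example while a video plays, capture a fixed number of samples and average the pwr column. The -s p option limits dmon to power and temperature, and -c sets the sample count. This is a minimal bash sketch. The helper average_dmon_power is hypothetical, and it locates the pwr column from dmon's header line, which starts with #.

```bash
# Hypothetical helper: average the "pwr" column (watts) of nvidia-smi dmon output on stdin.
average_dmon_power() {
    awk '
        # Find the pwr column in the header ("# gpu pwr ..." or "#gpu pwr ...").
        /^#/ && col == 0 {
            for (i = 1; i <= NF; i++) if ($i == "pwr") col = ($1 == "#") ? i - 1 : i
            next
        }
        /^#/ { next }                                  # units line and repeated headers
        col > 0 && $col ~ /^[0-9.]+$/ { sum += $col; n++ }  # skip "-" (unsupported)
        END { if (n > 0) printf "%.1f W average over %d samples\n", sum / n, n }
    '
}
```

Usage: nvidia-smi dmon -s p -c 30 | average_dmon_power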

The settings below do not persist across reboots.

To set a power limit of 100 W, run sudo nvidia-smi -pl 100. Use nvidia-smi -q -d POWER to get the minimum and maximum power limits your card accepts.

On my desktop RTX 3080, a 100 W limit dropped power draw while watching a 1080p YouTube video from 110-125 W to 99 W.
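nvidia-smi rejects power limits outside the card's range, so a small wrapper can clamp the requested value first. This is a minimal bash sketch for GPU 0 (-i 0), and set_power_limit is a hypothetical helper. It reads the range from the power.min_limit and power.max_limit query fields instead of parsing nvidia-smi -q output.

```bash
# Hypothetical helper: set a power limit (watts) on GPU 0, clamped to the card's range.
set_power_limit() {
    local want="$1" limits min max target
    limits=$(nvidia-smi -i 0 --query-gpu=power.min_limit,power.max_limit \
        --format=csv,noheader,nounits) || return 1
    IFS=', ' read -r min max <<< "$limits"   # e.g. "100.00, 370.00"
    # Bash cannot compare decimals, so clamp with awk.
    target=$(awk -v w="$want" -v lo="$min" -v hi="$max" \
        'BEGIN { w += 0; lo += 0; hi += 0; if (w < lo) w = lo; if (w > hi) w = hi; print w }')
    echo "Setting power limit to ${target} W (allowed ${min}-${max} W)"
    sudo nvidia-smi -i 0 -pl "$target"
}
```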

To further restrict energy consumption, an upper limit for graphics and memory clocks can be set. Use nvidia-smi -q -d CLOCK to get the maximum clocks. Then set a desired range for graphics clocks with sudo nvidia-smi -lgc {{MIN}},{{MAX}}, and a desired range for memory clocks with sudo nvidia-smi -lmc {{MIN}},{{MAX}}. Note that the NVIDIA driver may not honor the exact values you define.

Capping the clocks at their minimum, as below, brought that same YouTube video down to 46 W with no perceivable difference.

sudo nvidia-smi -lgc 0,210
sudo nvidia-smi -lmc 0,405
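This is a minimal bash sketch that wraps the clock locks with a sanity check and adds a way to undo everything without rebooting. The function names are hypothetical. -rgc and -rmc are nvidia-smi's --reset-gpu-clocks and --reset-memory-clocks options, and power.default_limit is the card's stock power limit. Memory clock locking may not be supported on every GPU.

```bash
# Hypothetical helper: lock graphics and memory clocks on GPU 0 to 0..MAX MHz,
# refusing caps above the maxima the card reports.
lock_clocks() {
    local gr_max="$1" mem_max="$2" limits card_gr card_mem
    limits=$(nvidia-smi -i 0 --query-gpu=clocks.max.graphics,clocks.max.memory \
        --format=csv,noheader,nounits) || return 1
    IFS=', ' read -r card_gr card_mem <<< "$limits"   # e.g. "2100, 9501"
    if [ "$gr_max" -gt "$card_gr" ] || [ "$mem_max" -gt "$card_mem" ]; then
        echo "caps exceed card maxima (${card_gr} / ${card_mem} MHz)" >&2
        return 1
    fi
    sudo nvidia-smi -i 0 -lgc "0,$gr_max"
    sudo nvidia-smi -i 0 -lmc "0,$mem_max"
}

# Hypothetical helper: undo the clock locks and restore the default power limit.
reset_gpu_tuning() {
    local default_limit
    default_limit=$(nvidia-smi -i 0 --query-gpu=power.default_limit \
        --format=csv,noheader,nounits) || return 1
    sudo nvidia-smi -i 0 -rgc    # --reset-gpu-clocks
    sudo nvidia-smi -i 0 -rmc    # --reset-memory-clocks
    sudo nvidia-smi -i 0 -pl "$default_limit"
}
```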

I found a workaround

Do share the solutions you've found for your hardware, along with the affected graphics model. For laptops, it would also help us and others if you share the DMI IDs of the affected system. DMI IDs help people searching the web for issues with their laptop, and system76-power can use them to automatically apply known workarounds to affected systems.

You can run this script in a terminal to print DMI info:

# Print each DMI name/version file path, followed by its indented contents
for dmi_file in /sys/devices/virtual/dmi/id/*_{name,version}; do
    echo "$dmi_file"; echo -n '  '; cat "$dmi_file"
done
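If you prefer one line per field, here is a variant that also includes the *_vendor files, such as sys_vendor, which name the manufacturer. This is a minimal bash sketch. The helper print_dmi_info is hypothetical, and the directory argument exists only for testing.

```bash
# Hypothetical helper: print DMI fields as "file: value", one per line.
print_dmi_info() {
    local dir="${1:-/sys/devices/virtual/dmi/id}"   # overridable for testing
    local f
    for f in "$dir"/*_name "$dir"/*_version "$dir"/*_vendor; do
        [ -r "$f" ] || continue                     # skip unmatched globs and unreadable files
        printf '%s: %s\n' "${f##*/}" "$(cat "$f")"
    done
}
```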
top 4 comments
retiolus@lemmy.cat, 1 year ago

Worked for me!

RickRussell_CA, 1 year ago

My laptop (HP Omen Intel i7 + Nvidia 2060) lost the discrete graphics after the PopOS updater installed Nvidia driver 535.

I was able to fix it by re-installing driver 470. However, I'm still having problems with the discrete graphics going dark after screen blank or suspend.

The above info is interesting, and I'd love to use it to fix my issues, but even as something of a UNIX/Linux power user, I have trouble parsing the jargon.

What does this do?

sudo apt purge ~nnvidia

I don't understand the use of the tilde and double-n notation. I know that tilde is used as a home directory shortcut, but that's not how it's used here? I haven't been able to Google anything on it either, none of the other apt purge examples I found are using this notation.

I definitely have issues with the graphics failing to wake up after suspend. What do the hybrid graphics commands do? With respect to GC6 and Suspend S3, how would I know whether I need to do anything about those? I understand that they are some kind of power saving modes, but how would I know whether they are causing problems?

I'd love to be able to use the latest drivers & for suspend to work right, but I have to admit I'm out of my depth.

RickRussell_CA, 1 year ago (edited)

Obligatory hardware details:

root@pop-os:/home/rickr# for dmi_file in /sys/devices/virtual/dmi/id/*_{name,version}; do echo $dmi_file; echo -n ' '; cat $dmi_file; done

    /sys/devices/virtual/dmi/id/board_name    878A
    /sys/devices/virtual/dmi/id/product_name    OMEN Laptop 15-ek0xxx
    /sys/devices/virtual/dmi/id/bios_version    F.14
    /sys/devices/virtual/dmi/id/board_version    17.29
    /sys/devices/virtual/dmi/id/chassis_version    Chassis Version
    /sys/devices/virtual/dmi/id/product_version