this post was submitted on 23 Jun 2023

Using an AMD GPU for NN training/inference? (self.linux)

submitted 2 years ago* (last edited 2 years ago) by leakybits to c/[email protected]

I'm looking to buy a new GPU. My main use case will be training and running neural nets (TensorFlow and PyTorch); gaming isn't really a priority.

Thing is, I use Wayland (via Sway), so I'd really prefer to get an AMD GPU. Nvidia doesn't seem very Linux-friendly at the moment, especially when it comes to Wayland, unfortunately.

On the other hand, Nvidia seems to be the clear frontrunner right now when it comes to NN acceleration. I'm worried that if I got an AMD GPU to accelerate my NN work, I'd just be wasting my money.

What do you all think?

Edit: I've used GPUs to accelerate NN models in the past, but they weren't my own; they were provided by my uni's research infra and/or Google Colab. So this would be the first time I'd be using my own GPU hardware for this purpose.

[–] [email protected] 11 points 2 years ago* (last edited 2 years ago) (1 children)

Get something new enough and continue getting something new enough when AMD pushes them out. The drivers suck for anything older than an RX580, and things like Blender require even newer GPUs despite the hardware being more than capable.

Run Arch and use the ROCm'd PyTorch from the repos. Those packagers know what they're doing.

Other than that, expect everything premade to be made for CUDA (and therefore unusable). There are some tools like https://github.com/ROCm-Developer-Tools/HIPIFY but they aren't "there".

Source: Been running Stable Diffusion on an RX580.
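To give a rough idea of what HIPIFY-style porting involves: much of the CUDA runtime API has a HIP counterpart with the same shape and a `hip` prefix, so a large part of a port is renaming. The sketch below is a toy illustration only, not the real HIPIFY tool; `CUDA_TO_HIP` and `toy_hipify` are hypothetical names, and the table covers just a handful of well-known runtime calls.

```python
import re

# A few CUDA runtime names and their HIP equivalents.
# Deliberately tiny: the real HIPIFY tools cover far more of the API.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Word boundaries keep us from rewriting the middle of a longer identifier.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(name) for name in CUDA_TO_HIP) + r")\b"
)


def toy_hipify(source: str) -> str:
    """Rename the known CUDA identifiers in `source` to their HIP names."""
    return _PATTERN.sub(lambda match: CUDA_TO_HIP[match.group(0)], source)


if __name__ == "__main__":
    cuda_snippet = (
        "#include <cuda_runtime.h>\n"
        "cudaMalloc(&d_buf, n);\n"
        "cudaMemcpy(d_buf, h_buf, n, cudaMemcpyHostToDevice);\n"
        "cudaFree(d_buf);\n"
    )
    print(toy_hipify(cuda_snippet))
```

Renaming is the easy part. Projects that depend on CUDA-only libraries or inline PTX need real porting work, which is probably why these tools don't feel "there" yet.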

[–] leakybits 3 points 2 years ago (1 children)

Thanks! Sounds doable but definitely frustrating... I'm surprised this is the state of things at the moment. I mean, when you buy a CPU, you don't really think about whether your choice limits you in some ways. But with a GPU, it's a big consideration.

[–] [email protected] 2 points 2 years ago

Yeah, GPUs never got standardized the way x86 did back in the old IBM PC days. GPUs are still operating on the mindset of "specific hardware" rather than something generic. If GPUs could be programmed as easily as CPUs, we could target something like Vulkan for ML.

Even ARM faces a similar, though different, problem: the lack of standard boot methods.

[–] [email protected] 4 points 2 years ago (1 children)

Check out geohot's latest video about AMD GPUs... Not very favorable

[–] leakybits 2 points 2 years ago

Damn, that's a shame. I see he wrote about the messy state of the AMD drivers here, with a link out to the rant video I presume you're referring to. I'll take a look later, thanks.

[–] [email protected] 3 points 2 years ago* (last edited 2 years ago)

The unfortunate truth is that NVidia has a complete stranglehold on the compute market. They recognized the capabilities of massively parallel compute early on and pushed CUDA super super hard to any organization doing compute. And it worked: CUDA is much easier to implement than OpenCL, and was released two years earlier too, so everyone ended up standardizing on it. They are currently reaping the benefits of that monopolization through their now huge enterprise GPGPU market and can basically piss down the backs of consumers and competitors alike without repercussions. OpenCL and AMD's implementation were a day late and harder to implement...

Do not buy AMD if you need to do any kind of compute, whether it be rendering à la Blender/AE, accelerated engineering CAD workflows, or big data handling. No tools are designed around anything but CUDA, and it sucks because Jensen is a greedy asshole, but you gotta pay your dues.

[–] MigratingtoLemmy 2 points 2 years ago (1 children)

Hi, I do not know much about GPUs and ML. My apologies for not being able to answer your question, but I'd like to know what you're trying to achieve running said models. Is ML a hobby of yours?

[–] leakybits 2 points 2 years ago (1 children)

Cheers for the reply. I'm doing a masters in machine intelligence, so I work with various kinds of ML models. And yeah it's a hobby too, I like playing around with LLMs and seeing what I can do with them.

[–] [email protected] 1 points 2 years ago* (last edited 2 years ago)

Edit: Wrote this on mobile. The mobile UI is not always clear about which magazine a post came from, so I missed the Linux in there. Things are not as dire on Linux as on Windows for AMD, so my assessment may be a bit pessimistic. With AMD's focus on the data centre for machine learning, the Linux driver stack seems fairly well supported.

I spent the last few days getting Stable Diffusion and PyTorch working on my Radeon 6800 XT in Windows. The machineml distribution of Stable Diffusion runs at about 1/4 of the speed of raw ROCm when I compare it to the shark tooling, which supports ROCm via Docker on Windows.

Expect tooling to be clunky and that you will need to compile everything yourself on Linux. Prebuilt stuff will all be for Nvidia.

AMD is pushing hard into the AI space, but aiming at datacenter users. They are rumoured to be building ROCm for their Windows drivers, but when that will ship is anyone's guess.

So right now, if you need to hit the ground running for your academic work, I would recommend NVidia, as much as it pains me as a long-time AMD user.

[–] [email protected] 2 points 2 years ago* (last edited 2 years ago)

Stable Diffusion runs great on a 7900 XTX via PyTorch and ROCm 5.5, but you may have to compile PyTorch 2.0.1 manually. With the pytorch/rocm:latest Docker image this is fairly easy. Look for instructions to install automatic1111; they can generally be applied to other stacks.
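Once PyTorch is installed, a quick sanity check helps confirm which backend the build actually targets. The sketch below relies on the fact that ROCm builds of PyTorch expose the GPU through the regular `torch.cuda` API and set `torch.version.hip`; `describe_backend` is a hypothetical helper name, and the script deliberately degrades to a message if PyTorch isn't installed.

```python
def describe_backend() -> str:
    """Return a short description of the GPU backend PyTorch can use."""
    try:
        import torch
    except ImportError:
        # No PyTorch at all: report that instead of crashing.
        return "torch not installed"

    # On ROCm builds the AMD GPU still shows up through torch.cuda.
    if not torch.cuda.is_available():
        return "cpu only"

    # torch.version.hip is a version string on ROCm builds, None on CUDA builds.
    hip_version = getattr(torch.version, "hip", None)
    if hip_version:
        return f"rocm {hip_version}"
    return f"cuda {torch.version.cuda}"


if __name__ == "__main__":
    print(describe_backend())
```

Because the device is still addressed as `"cuda"` on ROCm builds, most PyTorch code that calls `.to("cuda")` should run unchanged; the trouble tends to come from custom CUDA extensions.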

[–] [email protected] 2 points 2 years ago

I was in the same boat as you, i.e. using the GPU during my studies. My premise is to optimise the most frequent use case, i.e., deep learning.

IMO going with NVIDIA will save you so much worry and frustration that it clearly outweighs the downside of worse Wayland support compared with AMD.

When you have tough university assignments/projects, you want to focus on the actual problem instead of debugging/compiling libraries for use with AMD. I am sure that with a bit of work many libraries can be made to work with AMD, but apparently it is still a pain oftentimes.

So I strongly suggest choosing NVIDIA. Disclaimer: I have not used AMD for deep learning yet, but I have monitored the development of AMD support, because I would like to switch to AMD.

Btw, I found Pop!_OS to be very nice for both "regular" university work and all computer science tasks.

[–] mrufrufin 1 points 2 years ago

A note about this: it looks like Hugging Face is partnering with AMD to improve the situation, but they're starting with enterprise first: https://huggingface.co/blog/huggingface-and-amd . No specific platforms are listed on that page, but https://huggingface.co/amd mentions Linux. But yeah, the situation as of right now is mostly about NVIDIA, and I've heard about CUDA going through classes at school.

[–] [email protected] 1 points 2 years ago

https://are-we-gfx1100-yet.github.io
