this post was submitted on 03 Aug 2023
41 points (97.7% liked)


Getting GPU acceleration working is a common task for those of us running Plex or Jellyfin. There is not much documentation on getting the NVIDIA container stack to work with Podman, and even less for Gentoo. On top of that, NVIDIA's container toolkit has changed a lot lately.

I have been fighting with Podman for a while now and just recently got it working 1:1 with my Docker setup. Gentoo may not be the most popular or the easiest distro to use, but I documented the process in case some poor soul runs across it while searching the web.

Feel free to poke holes in it or leave feedback.

top 9 comments
[–] PriorProject 4 points 1 year ago (1 children)

This is a pretty awesome how-to. I knew nothing about containerizing GPU workloads before this, and it seems quite a lot less scary/involved than I feared.

FWIW, I think some of your DNS and general networking woes may be due to the macvlan setup rather than using netavark. Netavark seems like the golden path going forward for a batteries-included experience. Not that I have anything against macvlan, in many ways macvlan feels simplest and nicest for homelab setups and I've used it with LXC and other container runtimes in the past. But for the most docker-like "it just works" experience, I feel like netavark is getting the upstream love.

[–] [email protected] 3 points 1 year ago (1 children)

Thank you!

Agree, I really wanted netavark to work - it definitely seems like the way forward. I enabled it and out of the box none of my containers could resolve DNS, even though aardvark was running. It's so new I wasn't sure where to poke around, so I went with the legacy method.

I'll try again once it stabilizes in Gentoo. Somebody else noticed that netavark should be the default now and opened a bug with the maintainer.

[–] PriorProject 2 points 1 year ago

> I enabled it and out of the box none of my containers could resolve DNS, even though aardvark was running.

I experienced this on Ubuntu as well, and addressed it by opening up a firewall rule on the network interface for my podman network allowing the ip-range of the podman network to issue DNS requests to the gateway-ip (which is where aardvark-dns sets up shop).

I also had to add a firewall rule allowing traffic from all source IPs to the podman network range, on whatever ports I exposed, before exposing hostPorts would work.

Again, not critiquing the very capable macvlan setup, just sharing tips I've picked up on making netavark work.
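
For anyone who wants to reason about those two rules before touching a real firewall, here is a minimal Python sketch of the matching logic. It is not a firewall configuration: the subnet `10.89.0.0/24`, the gateway `10.89.0.1`, and the function names are assumptions for illustration, so substitute the values from your own podman network.

```python
import ipaddress

# Hypothetical values: replace with your own podman network's subnet and gateway.
PODMAN_SUBNET = ipaddress.ip_network("10.89.0.0/24")
PODMAN_GATEWAY = ipaddress.ip_address("10.89.0.1")
DNS_PORT = 53


def dns_request_allowed(src_ip, dst_ip, dst_port):
    """Rule 1: containers in the podman subnet may send DNS requests to the gateway."""
    return (
        ipaddress.ip_address(src_ip) in PODMAN_SUBNET
        and ipaddress.ip_address(dst_ip) == PODMAN_GATEWAY
        and dst_port == DNS_PORT
    )


def published_port_allowed(dst_ip, dst_port, exposed_ports):
    """Rule 2: any source may reach an exposed port inside the podman subnet."""
    return ipaddress.ip_address(dst_ip) in PODMAN_SUBNET and dst_port in exposed_ports


if __name__ == "__main__":
    # A container asking the gateway (where aardvark-dns listens) for DNS.
    print(dns_request_allowed("10.89.0.5", "10.89.0.1", 53))
    # A LAN client reaching an exposed port; 32400 is only an example value.
    print(published_port_allowed("10.89.0.5", 32400, {32400}))
```

If either check fails for traffic you expect to work, that is roughly the kind of packet the host firewall would be dropping.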

[–] [email protected] 2 points 1 year ago

Thank you! Cross-posted this to my tiny Gentoo community at [email protected]

Feel free to share your Gentoo knowledge and experience there; it certainly could use more activity.

[–] [email protected] 2 points 1 year ago (1 children)

Great job! Maybe it would be worth putting it on the Gentoo wiki?

[–] [email protected] 2 points 1 year ago

Good idea, I'll work on bringing it up to the wiki guidelines this weekend.

[–] [email protected] 2 points 1 year ago* (last edited 1 year ago) (1 children)

Absolutely stellar write up. Thank you!

I have a couple of questions.
Imagine I have a powerful consumer GPU to throw at this solution, a 4090ti for the sake of example.
- How many containers can share one physical card, assuming total VRAM is not exceeded?
- What does one virtual GPU look like inside the container? Can I run standard stuff like PyTorch, TensorFlow, and CUDA workloads in general?

[–] [email protected] 1 points 1 year ago (1 children)

Thanks!

As I understand it, the NVIDIA container toolkit bind-mounts the /dev/nvidia devices and the CUDA toolkit binaries inside the container, giving it direct access just as if it were running on the host. It's not virtualized, just running under a different namespace, so the VRAM is still being managed by the host driver. I would think the same restrictions that apply to running CUDA applications normally on the host also apply in containers. Personally, I've had up to 4 containers run GPU processes at the same time on 1 card.
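
A quick way to see this from inside a container is to list the device nodes that made it into the namespace. The following is a minimal Python sketch, assuming the nodes show up under `/dev` with an `nvidia` prefix. The function name `visible_nvidia_devices` is hypothetical and only for illustration.

```python
import glob
import os


def visible_nvidia_devices(dev_dir="/dev"):
    """Return the sorted paths under dev_dir whose names start with 'nvidia'."""
    pattern = os.path.join(dev_dir, "nvidia*")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    nodes = visible_nvidia_devices()
    if nodes:
        print("NVIDIA device nodes visible in this namespace:")
        for node in nodes:
            print("  " + node)
    else:
        print("No nvidia* nodes found under /dev; the GPU is probably not passed through.")
```

Running it on the host and then inside the container should show the same kind of nodes if the passthrough worked, which fits the "not virtualized, just a different namespace" picture.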

And yes, NVIDIA hosts its own GPU-accelerated container images for PyTorch, TensorFlow and a bunch of others on the NGC. They also have images with the full CUDA SDK on their Docker Hub.

[–] [email protected] 2 points 1 year ago

That's wonderful to know! Thank you again.
I'll follow your instructions; this implementation is exactly what I was looking for.