this post was submitted on 26 Jul 2024
28 points (76.9% liked)


Thx in advance.

[–] fhein 15 points 5 months ago* (last edited 5 months ago)

For LLMs it entirely depends on what size models you want to use and how fast you want them to run. Since there are diminishing returns to increasing model sizes, i.e. a 14B model isn't twice as good as a 7B model, the best bang for the buck will be achieved with the smallest model you think has acceptable quality. And if you think generation speeds of around 1 token/second are acceptable, you'll probably get more value for money using partial offloading.

If your answer is "I don't know what models I want to run" then a second-hand RTX3090 is probably your best bet. If you want to run larger models, building a rig with multiple (used) RTX3090 is probably still the cheapest way to do it.
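A rough way to turn "what size models" into a VRAM number is sketched below. It assumes the weights dominate memory use (parameters times bits per weight) and adds a made-up 20% overhead factor for the KV cache and activations; the function names, the overhead value, and the 24 GiB default (one RTX3090) are illustrative assumptions, not measurements.

```python
import math


def estimate_weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def gpus_needed(params_billion: float, bits_per_weight: float,
                vram_gib: float = 24.0, overhead: float = 1.2) -> int:
    """Cards required if the whole model must fit in VRAM.

    `overhead` is an assumed fudge factor for KV cache and activations;
    real usage depends on context length and the runtime you use.
    """
    required = estimate_weights_gib(params_billion, bits_per_weight) * overhead
    return math.ceil(required / vram_gib)


if __name__ == "__main__":
    for size in (7, 14, 70):
        for bits in (4, 8, 16):
            gib = estimate_weights_gib(size, bits)
            cards = gpus_needed(size, bits)
            print(f"{size}B @ {bits}-bit: ~{gib:.1f} GiB weights, {cards} x 24 GiB card(s)")
```

Treat the output as a first guess only; quantization formats and long contexts can shift the numbers noticeably.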

[–] [email protected] 13 points 5 months ago

"Bang for Buck"

Good luck. I would wait for the AI phase to crash

[–] maxwellfire 11 points 5 months ago

I feel like this really depends on what hardware you have access to. What are you interested in doing? How long are you willing to wait for it to generate, and how good do you want it to be?

You can pull off about 0.5 words per second with one of the Mistral models on the CPU with 32GB of RAM. The Stable Diffusion image models work okay with about 8-16GB of VRAM.
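For a feel of why CPU generation is slow, here is a back-of-the-envelope sketch. It assumes a dense model where every generated token requires reading all the weights from memory once, so memory bandwidth divided by model size gives an upper bound on tokens per second. The bandwidth figures in the example are placeholders, not measured values for any particular machine.

```python
def max_tokens_per_second(model_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on generation speed for a dense model at batch size 1.

    Assumes each token needs one full pass over the weights, so throughput
    cannot exceed memory bandwidth divided by model size.
    """
    if model_gb <= 0:
        raise ValueError("model_gb must be positive")
    return bandwidth_gb_s / model_gb


if __name__ == "__main__":
    # Placeholder bandwidths (GB/s); look up the real figure for your hardware.
    for label, bandwidth in (("slow system RAM", 25.0), ("fast system RAM", 50.0)):
        speed = max_tokens_per_second(model_gb=4.0, bandwidth_gb_s=bandwidth)
        print(f"{label}: at most ~{speed:.1f} tokens/s for a 4 GB model")
```

Real speeds land below this bound, and much lower once a model no longer fits in RAM or VRAM.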

[–] [email protected] 8 points 5 months ago (2 children)

Buy the cheapest graphics card with 16 or 24GB of VRAM. In the past people bought used NVIDIA 3090 cards. You can also buy a GPU from AMD; they're cheaper, but ROCm is a bit more difficult to work with. Or if you own a MacBook or any Apple device with an M2 or M3, use that. And hopefully you paid for enough RAM in it.

[–] thirdBreakfast 7 points 5 months ago* (last edited 5 months ago) (1 children)

An M1 MacBook with 16GB cheerfully runs llama3:8b outputting about 5 words a second. A secondhand MacBook like that probably costs a third to half as much as a secondhand RTX3090.

It must suck to be a bargain-hunting gamer. First bitcoin, and now AI.

edit: a letter

[–] [email protected] 5 points 5 months ago (1 children)

Patient gamers at least have the steam deck option now

[–] [email protected] 2 points 5 months ago (1 children)

Ok. I get it now. I've been trying to build something cheap as a Linux gaming setup and I've come to the conclusion that I'm better off buying the steam deck.

[–] [email protected] 1 points 5 months ago (1 children)

I think an older Ryzen and an RX590 can be had for decent prices, no?

[–] [email protected] 1 points 5 months ago

Yeah, but the form factor of the steam deck makes it more appealing if I want to set it up in the living room

[–] [email protected] 5 points 5 months ago (1 children)

I actually use an AMD card for running image generation and LLMs on my PC on Linux. It's actually not hard to set up.

[–] s38b35M5 2 points 5 months ago (2 children)
[–] [email protected] 4 points 5 months ago (1 children)

I'm not the original person you replied to, but I also have a similar setup. I'm using a 6700XT, with both InvokeAI and stable-diffusion-webui-forge set up to run without any issues. While I'm running Arch Linux, I have it set up in Distrobox so it's agnostic to the distro I'm running (since I've hopped between quite a few distros) - the container is actually an Ubuntu-based container.

The only hiccup I ran into is that while ROCm does support this card, you need to set an environment variable for it to be picked up correctly. At the start of both sd-webui and invokeai's launch scripts, I just use:

export HSA_OVERRIDE_GFX_VERSION=10.3.0

That is all it takes to set it up, and it works perfectly. This is the link to the distrobox container file I use to get that up and running.
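If you would rather not edit the launch scripts themselves, the same override can be applied from a small Python wrapper. This is only a sketch: the helper names `rocm_env` and `launch` are made up for illustration, and the command in the example is a stand-in that just prints the variable, not the real sd-webui or InvokeAI launcher.

```python
import os
import subprocess
import sys


def rocm_env(gfx_version: str = "10.3.0", base=None) -> dict:
    """Copy an environment and add the ROCm override from the comment above."""
    env = dict(os.environ if base is None else base)
    env["HSA_OVERRIDE_GFX_VERSION"] = gfx_version
    return env


def launch(command: list, gfx_version: str = "10.3.0") -> subprocess.CompletedProcess:
    """Run `command` with the override set only for that child process."""
    return subprocess.run(command, env=rocm_env(gfx_version), check=True)


if __name__ == "__main__":
    # Stand-in command; replace it with your own launch script.
    launch([sys.executable, "-c",
            "import os; print(os.environ['HSA_OVERRIDE_GFX_VERSION'])"])
```

The value 10.3.0 is the one reported above for a 6700XT; other cards may need a different value or none at all.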

[–] s38b35M5 2 points 5 months ago

Thx. I'm dabbling rn with a 2015 Intel i5 SFF and a low profile 6400 GPU, but it looks like I'll be getting back to all my gear soon, and was curious to see what others are having success running with.

I think I'm looking at upgrading to a 7600 or greater GPU in a ryzen 7, but still on the sidelines watching the ryzen 9k rollout.

I still haven't tried any image generation, have only used llamafile and LM Studio, but would like to dig a little deeper, while accounting for my dreaded ADHD that makes it miserable to learn new skills...

[–] [email protected] 2 points 5 months ago

I have Fedora installed on my system (don't know how the situation is on other distros regarding ROCm) and my GPU is an RX 6700 XT. For image generation I use stable diffusion webui and for LLMs I use text generation webui. Both installed everything they needed by themselves and work perfectly fine on my AMD GPU. I can also give you more info if there's anything else you wanna know.

[–] [email protected] 7 points 5 months ago* (last edited 5 months ago)

Automatic1111 for Stable Diffusion and Ollama for LLMs
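Once Ollama is running, you can check what speed you actually get by talking to its local HTTP API. The sketch below assumes the default endpoint on port 11434, a model tag of llama3:8b that has already been pulled, and the `eval_count` / `eval_duration` fields (the latter in nanoseconds) in the non-streaming response; the helper names are made up for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint


def tokens_per_second(result: dict) -> float:
    """Generation speed from a non-streaming response (eval_duration is in ns)."""
    return result["eval_count"] / (result["eval_duration"] / 1e9)


def generate(prompt: str, model: str = "llama3:8b", url: str = OLLAMA_URL) -> dict:
    """Send one prompt to a local Ollama server and return the parsed JSON reply."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server up, something like `result = generate("Say hi in five words.")` followed by `print(result["response"], tokens_per_second(result))` shows the reply and the measured tokens per second, which you can compare against the speeds quoted in this thread.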

[–] [email protected] 4 points 5 months ago* (last edited 5 months ago) (1 children)

KoboldCPP or LocalAI will probably be the easiest out-of-the-box option that has both image generation and LLMs.

I personally use vllm and HuggingChat, mostly because of vllm's efficiency and speed increase.

[–] [email protected] 3 points 5 months ago

It is probably dead but Easy Diffusion is imo the easiest for image generation.

KoboldCPP can be a bit weird here and there but was the first thing that worked for me for local text gen + gpu support.