Posted to Selfhosted on 28 Nov 2024

Hi all, I'd like to hear some suggestions on self-hosting LLMs on a remote server and accessing said LLM via a client app or a convenient website. Either tell me about your setups or about products you've had a good impression of.

I've hosted Ollama before, but I don't think it's intended for remote use. On the other hand, I'm not really an expert, and maybe there are other things to do, like add-ons.

Thanks in advance!

top 18 comments
[–] just_another_person 16 points 6 days ago (2 children)

Do you have lots of money? Cuz that's going to cost lots of money. Just get a cheap GPU and run it locally.

[–] [email protected] 7 points 6 days ago* (last edited 6 days ago) (1 children)

That depends on the use-case. An hour of RTX 4090 compute is about $0.69, while the graphics card itself is around $1,600, plus the rest of the computer plus the electricity bill. At $1,600 / $0.69 per hour that's roughly 2,300 hours just to pay off the card, so I'd say you need something like 4,000+ hours of use to actually break even. I'm not doing that much gaming and AI stuff, so I'm better off renting a cloud GPU by the hour. Of course you can optimize that: buy an AMD card, use smaller AI models, and pay for less VRAM. But each of those options has a break-even point you need to pass.
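For anyone who wants to plug in their own numbers, here's a rough sketch of that calculation. The card price and rental rate are the figures above; the extra-hardware and electricity numbers are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope break-even: buying an RTX 4090 vs. renting at ~$0.69/h.
# Card price and rental rate are from the comment above; the other figures
# are illustrative assumptions.
gpu_price = 1600.00      # RTX 4090, USD
rest_of_pc = 800.00      # assumed cost of the surrounding machine
rent_per_hour = 0.69     # cloud RTX 4090 rate, USD/h
power_per_hour = 0.15    # assumed ~450 W at ~$0.33/kWh while under load

# Owning pays off once the rental fees you avoid outweigh the up-front cost
# plus the electricity you burn while actually using the card.
break_even_hours = (gpu_price + rest_of_pc) / (rent_per_hour - power_per_hour)
print(f"Break-even after roughly {break_even_hours:,.0f} hours of use")
# -> roughly 4,444 hours with these assumptions
```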

[–] just_another_person -3 points 6 days ago (1 children)

Yes, but running an LLM isn't an on-demand workload; it's always on. You're paying for a 24/7 GPU instance if you go that route instead of CPU.

[–] [email protected] 7 points 6 days ago* (last edited 6 days ago) (1 children)

Well, there's both. I'm with runpod and they bill me for each second I run that cloud instance. I can have it running 24/7, or 30 minutes on-demand, or just 20 seconds if I want to generate a single reply/image. Behind the curtains, it's Docker containers. And one of the services is an API that you can hook into. Upon request, it'll start a container, do the compute, and then, at your option, either shut down immediately, meaning you'd have paid something like 2 cents for that single request, or keep listening for more requests until an arbitrary timeout is reached. Other services offer similar things, or a fixed price per ingested or generated token with some other (ready-made) services.
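To make that concrete, here's a hedged sketch of what calling such an on-demand endpoint looks like from a client. The endpoint ID and payload shape are placeholders/assumptions for illustration, so check your provider's actual API docs rather than treating this as RunPod's exact interface:

```python
import os
import requests

# Hypothetical serverless GPU call: the provider spins up a container for this
# one request, runs the compute, and (depending on the configured idle timeout)
# shuts it down again, so you're billed for seconds of GPU time.
# URL pattern and payload shape are assumptions for illustration only.
API_KEY = os.environ["RUNPOD_API_KEY"]
ENDPOINT_ID = "your-endpoint-id"  # placeholder

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"prompt": "Write a haiku about self-hosting."}},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())
```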

[–] just_another_person 1 points 6 days ago (1 children)

Runpod is a container service. OP asked about a remote server.

[–] [email protected] 3 points 6 days ago* (last edited 6 days ago) (1 children)

What's the difference regarding this task? You can rent it 24/7 as a crude webserver, or run a Linux desktop inside. Pretty much everything you could do with other kinds of servers. I don't think the exact technology matters. It could be a VPS, virtualized with KVM, or a container. And for AI workloads, these containers have several advantages: you can spin them up within seconds, scale them, etc. I mean, you're right, this isn't a bare-metal server that you're renting. But I think it aligns well with OP's requirements?!

[–] just_another_person -2 points 6 days ago (1 children)

Well I think the difference is what they asked about.

[–] [email protected] 1 points 2 days ago

Running an LLM can certainly be an on-demand service. Apart from training, which I don’t think we are discussing, GPU compute is only used while responding to prompts.

[–] [email protected] 1 points 6 days ago (1 children)

No, but I have a free instance on Oracle Cloud and that's where I'll run it. If it's too slow or no good, I'll stop using it, but there's no harm in trying.

[–] [email protected] 2 points 2 days ago (1 children)

I’d be interested to see how it goes. I’ve deployed Ollama plus Open WebUI on a few hosts and small models like Llama3.2 run adequately (at least as fast as I can read) on even an old i5-8500T with no GPU. Oracle Cloud free tier might work OK.
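For what it's worth, once Ollama is reachable over the network (it binds to 127.0.0.1:11434 by default, so exposing it is a deliberate step, ideally behind a VPN or reverse proxy), a client can talk to its HTTP API directly. A minimal sketch, with the hostname as a placeholder:

```python
import requests

# Minimal sketch: query a remote Ollama instance over its HTTP API.
# "my-server" is a placeholder hostname; 11434 is Ollama's default port.
OLLAMA_URL = "http://my-server:11434/api/generate"

resp = requests.post(
    OLLAMA_URL,
    json={
        "model": "llama3.2",  # small model that runs tolerably on CPU
        "prompt": "Summarise why self-hosting matters.",
        "stream": False,      # return one JSON object instead of a stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```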

[–] [email protected] 2 points 2 days ago

Then I'll let you know when I deploy it. I haven't done it yet; might do it today, maybe later.

[–] [email protected] 9 points 6 days ago (1 children)

https://runpod.io

They also offer some templates, instructions and blog posts about this. And since I'm not advertising for a single company, there's also vast.ai and several others.

[–] [email protected] 2 points 6 days ago

thanks. I'll give it a try

[–] PumpkinEscobar 7 points 6 days ago (1 children)

Ollama and openwebui for a nice web interface.

[–] [email protected] 1 points 6 days ago

This looks very good, I'll try it at home.

[–] Bluefruit 5 points 6 days ago (1 children)

Me personally, I use my AMD 7700XT to run Ollama on my main PC. It can be helpful for troubleshooting, as the internet gets worse and worse to search, especially if I don't know what the issue is. That's my main use case for it, but I'd like to set up something with RAG and use it to help me with documentation if I have questions.

I don't think it's super worth it to use a VPS for an LLM if you already have a decent GPU that you can run it on. If you don't already have the hardware, plenty of older GPUs can run the models pretty well. My 1070 Ti still kicks ass all these years later, and you can find them for $100 or less on eBay used. I've used it for Ollama as well and it does just fine.

Will it be super fast or run a really big model? No, but if it's for personal use, I don't see any benefit to paying a monthly subscription for it, and like I said, it works well enough for me.

It's also more secure and private to host it yourself, if that's worth anything to you. That's one of the biggest reasons I self-host.

[–] [email protected] 2 points 2 days ago (1 children)

If you do set up a RAG store, please post the tech stack you use as I’m in a similar situation. The inbuilt document store management in ollama+openwebui is a bit clunky.

[–] Bluefruit 2 points 2 days ago* (last edited 2 days ago)

If I can figure it out, I'll be sure to post something lol.

So far, I found a Python project that is supposed to enable RAG, but I have yet to try it, and after reinstalling my Linux PC with Pop!_OS, I'm not having much success getting Ollama to run.
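For anyone following along, the basic RAG flow being talked about boils down to something like this sketch. It assumes a local Ollama with the llama3.2 and nomic-embed-text models pulled, and uses a toy in-memory store in place of a real vector database and chunking pipeline:

```python
import requests

OLLAMA = "http://localhost:11434"

def embed(text: str) -> list[float]:
    # Uses Ollama's embeddings endpoint; assumes nomic-embed-text is pulled.
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    r.raise_for_status()
    return r.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(x * x for x in b) ** 0.5)
    return dot / norm

# Toy "document store": in practice you'd chunk real docs and use a vector DB.
docs = [
    "Restart the reverse proxy with `systemctl restart caddy`.",
    "Backups run nightly at 02:00 via a cron job on the NAS.",
]
index = [(d, embed(d)) for d in docs]

question = "How do I restart the proxy?"
q_vec = embed(question)
# Retrieve the most similar document and feed it to the model as context.
context = max(index, key=lambda pair: cosine(q_vec, pair[1]))[0]

r = requests.post(f"{OLLAMA}/api/generate", json={
    "model": "llama3.2",
    "prompt": f"Answer using this context:\n{context}\n\nQuestion: {question}",
    "stream": False,
})
r.raise_for_status()
print(r.json()["response"])
```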