Thing is, for your average user with no dedicated GPU who never thinks about RAM, running a local LLM sounds intimidating. But it shouldn't be. Any system with an integrated GPU, and the more RAM the better, can run simple models locally.
The not-so-dirty secret is that ChatGPT 3 vs 4 isn't that big a difference, and neither is leaps and bounds ahead of the publicly available models for about 99% of tasks. For that 1% people will ooh and aah, but 99% of use cases only see marginal gains with 4o.
And the simplified models that run "only" 95% as well? They can use 90% fewer resources and give pretty much identical answers outside of hyperspecific use cases.
Running a "smol" model, as some are called, gets you all the bang for none of the buck, and your data stays on your system and never leaves.
I've been yelling from the rooftops to some stupid corporate types that once the model is trained, it's trained. Unless you are training models yourself, you don't need massive AI clusters just to run one. Run it locally on your own hardware at a fraction of the cost.
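For anyone wondering what that actually looks like in practice, here's a rough sketch using llama-cpp-python with a small quantized GGUF model. The library choice, model file, and path are my own assumptions for illustration, not something from the comment above; the point is just that the whole thing runs on your own box.

```python
# Minimal sketch: running a small quantized model locally with llama-cpp-python.
# Assumes `pip install llama-cpp-python` and a small GGUF model downloaded
# somewhere on disk; the model path below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/small-model-q4_k_m.gguf",  # placeholder path to a quantized "smol" model
    n_ctx=2048,        # modest context window keeps RAM use down
    n_gpu_layers=0,    # 0 = pure CPU; raise it if your integrated GPU is supported
)

# Everything runs on your own machine; the prompt never leaves your system.
result = llm("Explain what a local LLM is in one sentence.", max_tokens=64)
print(result["choices"][0]["text"].strip())
```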
Idk, I've noticed pretty significant differences between models of various sizes. I mean, there are lots of metrics on this:
https://www.vellum.ai/llm-leaderboard