this post was submitted on 21 Oct 2024
27 points (100.0% liked)
Free Open-Source Artificial Intelligence
A 2-bit or 3-bit quantization is quite a trade-off. At 2-bit, it'll probably be worse than a smaller model with a lighter quantization at the same effective size.
There is a sweet spot somewhere between 4 and 8 bit. And more than 8-bit seems to be a waste; it's indistinguishable from full precision.
General advice seems to be: take the largest model you can fit at somewhere around 4-bit or 5-bit.
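As a rough back-of-the-envelope check (my own sketch, not an official formula), the weight file size is roughly parameters × bits-per-weight / 8, ignoring overhead like the KV cache and context buffers:

```python
def approx_model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough weight size in GB: parameters * bits per weight / 8.

    Ignores KV cache and runtime overhead, so treat it as a lower bound.
    """
    return params_billion * bits_per_weight / 8

# An 8B model at ~4.5 bits per weight (in the ballpark of common 4-bit
# quant formats) needs roughly 4.5 GB for the weights alone:
print(round(approx_model_size_gb(8, 4.5), 1))  # 4.5
```

So an 8B model at a 4-bit-ish quant fits comfortably in 6-8 GB of VRAM once you add some headroom for context.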
The official way to compare such things is to calculate the perplexity for all of the options and choose the one with the smallest perplexity that fits.
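Perplexity is just the exponential of the average per-token negative log-likelihood, so the comparison boils down to something like this (a minimal sketch with made-up NLL values; in practice you'd run a tool like llama.cpp's perplexity utility over a real text corpus):

```python
import math

def perplexity(neg_log_likelihoods: list[float]) -> float:
    """exp(mean per-token negative log-likelihood): lower is better."""
    return math.exp(sum(neg_log_likelihoods) / len(neg_log_likelihoods))

# Hypothetical per-token NLLs for two quantizations of the same model:
nll_q4 = [2.1, 1.8, 2.4, 2.0]  # 4-bit quant
nll_q2 = [2.6, 2.3, 2.9, 2.5]  # 2-bit quant

# The 4-bit quant scores the lower perplexity, so it's the better pick if it fits.
print(perplexity(nll_q4) < perplexity(nll_q2))  # True
```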
And by the way: I don't really use the tiny models like 3B parameters. They write text, but they don't seem to be able to store a lot of knowledge. In turn, they can't handle any complex questions and they generally make up a lot of things. I usually use 7B to 14B parameter models; that's a proper small model. And I stick to 4-bit or 5-bit quants for llama.cpp.
Your graphics card should be able to run an 8B parameter LLM (4-bit quantized). I'd prefer that to a 3B one; it'll be way more intelligent.