this post was submitted on 21 Oct 2024
27 points (100.0% liked)
Free Open-Source Artificial Intelligence
you are viewing a single comment's thread
I used to have a 6GB GPU, and around 7B was the sweet spot. That's still the case with newer models; you just have to pick the right one.
Try an IQ4 quantization of Qwen 2.5 Coder 7B.
Below 3 bpw is where it starts to not be worth it, since we have so many open-weight models available these days. A lot of people are really stubborn and run 2-3 bpw 70B quants, but those are objectively worse than a similarly trained 32B model in the same space, even with exotic, expensive quantization like VPTQ or AQLM: https://huggingface.co/VPTQ-community
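As a back-of-the-envelope check (the bits-per-weight figures below are illustrative, not measured from specific GGUF files), weight memory is roughly parameters × bits-per-weight ÷ 8, which shows both why 7B fits a 6GB card and why a 2.5 bpw 70B occupies about the same VRAM as a 32B at a much healthier ~5.5 bpw:

```python
def weight_gb(params_billion: float, bpw: float) -> float:
    """Approximate VRAM for model weights alone (ignores KV cache and runtime overhead)."""
    return params_billion * bpw / 8  # (params_billion * 1e9) * bpw bits / 8 / 1e9 bytes-per-GB

# A 7B model at ~4.25 bpw (roughly IQ4-class) leaves headroom on a 6GB card:
print(f"7B  @ 4.25 bpw ~ {weight_gb(7, 4.25):.1f} GB")   # ~3.7 GB

# A 70B squeezed to 2.5 bpw costs about the same as a 32B at ~5.5 bpw:
print(f"70B @ 2.50 bpw ~ {weight_gb(70, 2.5):.1f} GB")   # ~21.9 GB
print(f"32B @ 5.50 bpw ~ {weight_gb(32, 5.5):.1f} GB")   # ~22.0 GB
```

Same memory budget, but the 32B keeps far more of its precision, which is the point about aggressive 70B quants.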
Is this VPTQ similar to that 1.58-bit thing I've heard about? Where they quantized Llama 8B down to just 1.58 bits and it somehow still was rather coherent?
No, from what I've seen it falls off below 4 bpw (just more gracefully than other methods) and makes ~2.25-bit quants somewhat usable instead of totally impractical, much like AQLM.
You are thinking of BitNet, which (so far, and not for lack of trying) requires models to be trained from scratch that way to be effective.
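For reference, the "b1.58" quantizer itself is simple: scale each weight matrix by its mean absolute value and round to {-1, 0, +1}. A rough NumPy sketch of that absmean step (the hard part BitNet solves is keeping this in the loop during training, not the quantizer, and this helper name is my own):

```python
import numpy as np

def absmean_ternary(w: np.ndarray, eps: float = 1e-8):
    """BitNet b1.58-style ternarization: scale by mean |w|, round to {-1, 0, +1}.

    Returns the ternary matrix and the scale; dequantize as q * scale.
    Applying this post-hoc to a pretrained model wrecks it, which is why
    BitNet models have to be trained from scratch with this in the forward pass.
    """
    scale = np.mean(np.abs(w)) + eps          # absmean scaling factor
    q = np.clip(np.round(w / scale), -1, 1)   # ternary values {-1, 0, +1}
    return q, scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = absmean_ternary(w)
print(np.unique(q))  # subset of [-1. 0. 1.]
```

Each weight then needs only ~1.58 bits (log2 of 3 states), hence the name.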