this post was submitted on 02 Feb 2025
248 points (79.5% liked)

Technology

[–] [email protected] 380 points 1 day ago (6 children)

Edward Snowden doing GPU reviews? This timeline is becoming weirder every day.

[–] GamingChairModel 53 points 20 hours ago

"Whistleblows" as if he's some kind of NVIDIA insider.

[–] [email protected] 10 points 15 hours ago

I'll keep believing this is a The Onion post

[–] [email protected] 11 points 16 hours ago (1 children)
[–] newcockroach 8 points 14 hours ago (1 children)

"Some hentai games are good" -Edward Snowden

[–] Siegfried 1 points 6 hours ago

Note that this is from 2003

[–] Winged_Hussar 90 points 1 day ago

Legitimately thought this was a hard-drive.net post

[–] eager_eagle 48 points 1 day ago (1 children)

I bet he just wants a card to self-host models without giving companies his data, but the amount of VRAM is indeed ridiculous.

[–] [email protected] 24 points 1 day ago (2 children)

Exactly, I'm in the same situation now, and the 8 GB in those cheaper cards doesn't even let you run a 13B model. I'm trying to research whether I can run a 13B one on a 3060 with 12 GB.
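For anyone doing the same sizing math, here's a rough back-of-the-envelope sketch in Python. The ~4.5 bits/weight figure for q4_K_M-style quantization is an approximation, and real usage adds KV cache and framework overhead on top of the raw weights:

```python
# Rough estimate of the VRAM needed just for a quantized model's weights.
# bits_per_weight ~4.5 approximates a q4_K_M quantization (an assumption,
# not an exact figure); context/KV cache needs extra memory on top.
def approx_model_size_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of the model weights in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

print(round(approx_model_size_gb(13), 1))      # ~7.3 GB quantized: fits in 12 GB
print(round(approx_model_size_gb(13, 16), 1))  # 26.0 GB at fp16: hopeless on 8 GB
```

So a 13B model at 4-bit should fit on a 12 GB card with room for context, while an 8 GB card is tight even before the KV cache.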

[–] [email protected] 14 points 1 day ago (3 children)

You can. I'm running a 14B deepseek model on mine. It achieves 28 t/s.

[–] levzzz 4 points 23 hours ago

You need a pretty large context window to fit all the reasoning; Ollama forces 2048 tokens by default, and a larger window uses more memory.
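A minimal sketch of raising that default via an Ollama Modelfile (the 8192 value is just an example; a bigger `num_ctx` means a bigger KV cache, so watch your VRAM):

```
# Sketch: Modelfile that raises Ollama's default 2048-token context window.
FROM deepseek-r1:14b
PARAMETER num_ctx 8192
```

Build it with `ollama create deepseek-r1-8k -f Modelfile` and run the new model name as usual.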

[–] [email protected] 6 points 1 day ago

Oh nice, that's faster than I imagined.

[–] [email protected] 2 points 1 day ago (1 children)

I also have a 3060. Can you detail which framework (SGLang, Ollama, etc.) you are using and how you got that speed? I'm having trouble reaching that level of performance. Thanks!

[–] [email protected] 4 points 1 day ago* (last edited 17 hours ago) (2 children)

Ollama, latest version. I have it set up with Open-WebUI (though that shouldn't matter). The 14B is around 9 GB, which easily fits in the 12 GB.

I'm repeating the 28 t/s from memory, but even if I'm wrong it's easily above 20.

Specifically, I'm running this model: https://ollama.com/library/deepseek-r1:14b-qwen-distill-q4_K_M

Edit: I confirmed I do get 27.9 t/s, using default ollama settings.

[–] [email protected] 2 points 17 hours ago

Thanks. I'll try Ollama with the q4_K_M quantization. I wouldn't expect to see a difference between Ollama and SGLang.

[–] [email protected] 2 points 22 hours ago

Thanks for the additional information; that helped me decide to get the 3060 12GB instead of the 4060 8GB. They're almost the same price, but for my use cases the 3060 12GB seems the better fit even though it's a generation older: the memory bus is wider and it has more VRAM. Both video editing and the smaller LLMs should work well enough.

[–] [email protected] 4 points 23 hours ago

I'm running deepseek-r1:14b on a 12GB RX 6700. It just about fits in memory and is pretty fast.

[–] [email protected] 1 points 13 hours ago

Does he work for Nvidia? Seems out of character for him.