Changed the title because there's no need for YouTube clickbait here.

[email protected] 2 points 1 day ago

> but we had the same thing with Alpaca, Llama2, Llama3, 3.2, Mistral, Phi…

I don't believe so. Or at least, the point isn't that the models keep getting smaller and/or more intelligent; it's how DeepSeek did it:

> I noted above that if DeepSeek had access to H100s they probably would have used a larger cluster to train their model, simply because that would have been the easier option; the fact they didn’t, and were bandwidth constrained, drove a lot of their decisions in terms of both model architecture and their training infrastructure. Just look at the U.S. labs: they haven’t spent much time on optimization because Nvidia has been aggressively shipping ever more capable systems that accommodate their needs. The route of least resistance has simply been to pay Nvidia. DeepSeek, however, just demonstrated that another route is available: heavy optimization can produce remarkable results on weaker hardware and with lower memory bandwidth; simply paying Nvidia more isn’t the only way to make better models.

https://stratechery.com/2025/deepseek-faq/
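
For a sense of why the bandwidth constraint matters so much, here's a rough back-of-the-envelope sketch. The interconnect figures are from public spec sheets (H100 NVLink ~900 GB/s; the export-restricted H800 cuts that to ~400 GB/s), but everything else is my assumption for illustration: an idealized ring all-reduce cost model, 8 GPUs, and a BF16 gradient payload sized off DeepSeek-V3's ~37B activated parameters. This is not DeepSeek's actual training setup.

```python
# Back-of-the-envelope: why interconnect bandwidth dominates at training scale.

def allreduce_seconds(payload_gb: float, n_gpus: int, bw_gb_s: float) -> float:
    """Idealized ring all-reduce time for `payload_gb` GB across `n_gpus`
    devices with per-link bandwidth `bw_gb_s` GB/s (latency ignored)."""
    # A ring all-reduce moves 2 * (n - 1) / n of the payload over each link.
    return 2 * (n_gpus - 1) / n_gpus * payload_gb / bw_gb_s

# Illustrative assumption, not DeepSeek's real configuration: ~37B active
# parameters in BF16 gradients ≈ 74 GB to synchronize per step.
grad_gb = 37 * 2

for name, bw in [("H100 NVLink, ~900 GB/s", 900),
                 ("H800 NVLink, ~400 GB/s", 400)]:
    t = allreduce_seconds(grad_gb, n_gpus=8, bw_gb_s=bw)
    print(f"{name}: {t * 1000:.0f} ms per gradient sync")

# Halving link bandwidth roughly doubles the time spent waiting on
# communication. Hiding that idle time (overlapping compute with
# communication, cutting cross-node traffic) is exactly the kind of
# optimization the quote is pointing at.
```

Under these toy numbers, the H800 spends roughly twice as long per gradient sync as the H100, so a bandwidth-starved lab either eats that idle time at cluster scale or engineers around it, which is the argument Stratechery is making.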