[email protected] 3 points 1 day ago

Note that s1 is quite openly a distilled model rather than one trained from scratch: it inherits knowledge from an existing model (Gemini 2.0 in this case), so it doesn't have to relearn nearly as much as a from-scratch training run would. It's still an important result, but the training resources aren't directly comparable.
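
For anyone who hasn't seen what distillation looks like in practice, here's a minimal sketch of the general recipe: have a strong teacher generate reasoning traces, then fine-tune an already-pretrained student on them with ordinary supervised loss. To be clear, this is not the s1 authors' actual pipeline; the model names below are small open-weight stand-ins (s1's real teacher was Gemini, which is API-only), and a real run would use a curated dataset and a proper training loop.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Phase 1: a "teacher" model generates reasoning traces for a set of
# prompts. (s1 used Gemini 2.0 as the teacher via its API; a small open
# model stands in here so the sketch is self-contained.)
teacher_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder teacher
teacher_tok = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForCausalLM.from_pretrained(teacher_name)

prompts = ["What is 17 * 24? Think step by step."]  # s1 used ~1k curated questions
traces = []
for p in prompts:
    inputs = teacher_tok(p, return_tensors="pt")
    out = teacher.generate(**inputs, max_new_tokens=128)
    traces.append(teacher_tok.decode(out[0], skip_special_tokens=True))

# Phase 2: fine-tune an already-pretrained "student" on the traces with
# plain next-token cross-entropy. The student keeps its pretrained
# knowledge; only this comparatively cheap supervised pass is new.
student_name = "sshleifer/tiny-gpt2"  # placeholder student
student_tok = AutoTokenizer.from_pretrained(student_name)
student = AutoModelForCausalLM.from_pretrained(student_name)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

student.train()
for text in traces:
    batch = student_tok(text, return_tensors="pt")
    loss = student(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The point is that phase 2 touches a tiny amount of data compared to pretraining, which is exactly why the resource comparison breaks down.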

[email protected] 0 points 1 day ago

True, but I think we'll see a continuation of the existing trend of building on and improving existing models rather than always starting entirely from scratch. For instance, nearly every newly released model also reports results for its Llama-based variant, because fine-tuning on top of Llama's existing quality simply produces better results.

I think we'll see a similar trend now, just with R1 variants replacing Llama variants as the primary base. It's fundamentally inefficient to start over from scratch every time, so it makes sense that newer iterations are built directly on previous ones.