this post was submitted on 20 Aug 2024
[–] [email protected] 2 points 3 months ago

It's a bit more complicated than that.

New models are sometimes targeting architecture improvements instead of pure size increases. Any truly new model still needs training time; it's just that the training time isn't going up as much as it used to. This means that open-weights and open-source models can start to catch up to large proprietary models like ChatGPT.

From my understanding, GPT-4 is still a huge model and the best performing. The other models are starting to get close though, and can already exceed GPT-3.5 Turbo, which was the previous standard to beat and is still what a lot of free chatbots are using. Some of these models are still absolutely huge though. For example, Goliath is 120 billion parameters: still pretty chonky and intensive to run, even if it's not quite GPT-4 sized. Not that anyone actually knows how big GPT-4 is. Word on the street is it's a MoE (mixture of experts) model like Mixtral, which runs faster than a normal model of the same size because only some of the experts are used for each token, but again no one outside OpenAI can actually say with certainty.
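To put rough numbers on "chonky", here's a back-of-the-envelope sketch. It only counts memory for the weights themselves (no KV cache, activations, or runtime overhead), and the MoE figures at the bottom are made-up illustrative numbers, not the real sizes of Mixtral or GPT-4. The function names are just mine.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9


def moe_param_counts(shared: float, per_expert: float,
                     n_experts: int, top_k: int) -> tuple[float, float]:
    """Return (total, active-per-token) parameter counts for a toy MoE.

    Assumes every token goes through the shared parameters plus
    exactly top_k of the n_experts experts.
    """
    total = shared + per_expert * n_experts
    active = shared + per_expert * top_k
    return total, active


if __name__ == "__main__":
    # A 120-billion-parameter dense model at common precisions.
    for bits in (16, 8, 4):
        gb = weight_memory_gb(120e9, bits)
        print(f"120B params at {bits}-bit: ~{gb:.0f} GB of weights")

    # Hypothetical MoE: 2B shared params, 8 experts of 5B each, 2 used per token.
    total, active = moe_param_counts(2e9, 5e9, n_experts=8, top_k=2)
    print(f"toy MoE: {total / 1e9:.0f}B total, {active / 1e9:.0f}B active per token")
```

So even at 4-bit, a 120B dense model needs on the order of 60 GB just for weights. The toy MoE shows why those models run faster than their total size suggests: you still have to store every expert, but each token only pays the compute cost of the active ones.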

You generally find that OpenAI models are larger and slower, whereas the other models focus more on giving the best performance at a given size, since training and running huge models is much more demanding. So far the larger OpenAI models have done better, but this could change as open-source models see faster improvement in the techniques they use. You could say open-weights models rely on cunning architectures and fine-tuning, while OpenAI relies on brute strength.