this post was submitted on 05 Feb 2025
516 points (97.6% liked)

Technology

66093 readers
8599 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related content.
  3. Be excellent to each other!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
  9. Check for duplicates before posting, duplicates may be removed
  10. Accounts 7 days and younger will have their posts automatically removed.

Approved Bots


founded 2 years ago
MODERATORS
[–] Kompressor 27 points 1 month ago (15 children)

Desperately trying to tap into the general trust/safety reputation that open source software typically has. Trying to muddy the waters, because they've proven they cannot be trusted whatsoever

[–] kava 6 points 1 month ago* (last edited 1 month ago) (14 children)

when the data used to train the AI is copyrighted, how do you make it open source? it's a valid question.

one thing is the model and the code that trains the AI. the other is the data that produces the weights, which determine how the model makes its predictions

of course, the obligatory fuck meta and the zuck and all that, but there is a legal conundrum here we need to address that doesn't fit into our current IP legal framework

my preferred solution is just to eliminate IP entirely

[–] jacksilver 9 points 1 month ago (3 children)

I mean, you can have open source weights, training data, and code/model architecture. If you've opened all three, it's an open model; otherwise you state which components are open. Seems pretty straightforward to me.
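That labeling rule could be sketched roughly like this (hypothetical function and names, just to illustrate the idea, not any official definition):

```python
# Hypothetical sketch: label a model release by which of the three
# components (weights, training data, code/architecture) are open.
def openness_label(weights_open: bool, data_open: bool, code_open: bool) -> str:
    components = {
        "weights": weights_open,
        "data": data_open,
        "code": code_open,
    }
    # All three open -> a fully open model.
    if all(components.values()):
        return "open model"
    # Otherwise name only the components that are actually open.
    open_parts = [name for name, is_open in components.items() if is_open]
    if open_parts:
        return "open " + " + ".join(open_parts)
    return "closed model"
```

So a release like Llama (open weights and code, closed training data) would get a partial label rather than "open model".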

[–] kava 3 points 1 month ago (1 children)

Yes, but that model would never compete with the models that use copyrighted data.

There is an unfathomably large ocean of copyrighted data that goes into modern LLMs. From scraping the internet to transcripts of movies and TV shows to tens of thousands of novels, etc.

That's the reason they are useful. If it weren't for that data, it would be a novelty.

So do we want public access to AI or not? How do we want to do it? Zuck's quote from the article, "our legal framework isn't equipped for this new generation of AI", has some truth to it, I think.

[–] jacksilver 3 points 1 month ago (1 children)

I mean, using proprietary data has been an issue with models for as long as I've worked in the space. It's always been a mixture of open weights, open data, and open architecture.

I admit that it became more obvious when images/videos/audio became more accessible, but everything from facial recognition to pose estimation has used proprietary datasets to build models.

So this isn't a new issue, and from my perspective not an issue at all. We just need to acknowledge that not all elements of a model may be open.

[–] kava 1 points 1 month ago

> So this isn't a new issue, and from my perspective not an issue at all. We just need to acknowledge that not all elements of a model may be open.

This is more or less what Zuckerberg is asking of the EU: to acknowledge that parts of the model cannot be opened, but that because the code is open, it should still qualify for certain benefits that open source products receive.
