this post was submitted on 12 Jun 2023
11 points (92.3% liked)

Lemmy.World Announcements

One of the arguments made for Reddit's API changes is that Reddit is now the go-to place for LLM training data (e.g. for ChatGPT).

https://www.reddit.com/r/reddit/comments/145bram/addressing_the_community_about_changes_to_our_api/jnk9izp/?context=3

I haven't seen a whole lot of discussion around this and would like to hear people's opinions. Are you concerned about your posts being used for LLM training? Do you not care? Do you prefer that your comments are available to train open source LLMs?

(I will post my personal opinion in a comment so it can be up/down voted separately)

[email protected] 5 points 1 year ago (2 children)

I don't really care, to be honest. If something's public on social media, it's public, and it's no longer up to you to decide how it will be used. I really like the Stack Exchange policy that all posts are published under a Creative Commons license. Though they seem hell-bent on killing that, too.

FearTheCron 3 points 1 year ago (1 child)

Yeah, I think a Creative Commons-style license makes sense, and that was always my intent when posting things. However, when you post Creative Commons content, you do get to decide the restrictions (e.g. commercial or noncommercial).

I think it's currently an open question how this applies to generative AI and LLMs. Perhaps the output of generative AI should retain the license of the training data? Or perhaps that is overly restrictive? There are those who believe that training commercial generative AI on data under permissive licenses is a problem.

https://www.theregister.com/2023/05/12/github_microsoft_openai_copilot/

https://slate.com/technology/2022/12/lensas-a-i-avatars-the-uncomfortable-places-their-magic-comes-from.html

I am not really sure where I stand on the overall issue. But the worst-case scenario, in my opinion, is one where open source generative AI is hobbled by regulation, paving the way for corporate control. My biggest fear about the Reddit API changes is that they prevent anyone except Google, Facebook, Microsoft, Amazon, etc. from using user comments as a training set.

[email protected] 3 points 1 year ago

I don't know either. I'll agree with you, though, that restricting AI so that only big tech companies with lots of lawyers can research it (and not release it) is the worst-case scenario. And I fear that it's either that or complete deregulation. OpenAI etc. just have too much money for lobbying, and given that this is all happening in the US, which seems to be quite susceptible to monetary influence in politics, I doubt any laws are gonna be passed to restrict them. Besides, there's the national interest in not letting China take the lead.