this post was submitted on 20 Aug 2023
32 points (94.4% liked)

lemmy.ml meta

Anything about the lemmy.ml instance and its moderation.

For discussion about the Lemmy software project, go to [email protected].

Some context about this here: https://arstechnica.com/information-technology/2023/08/openai-details-how-to-keep-chatgpt-from-gobbling-up-website-data/

The robots.txt would be updated with this entry:

User-agent: GPTBot
Disallow: /

Obviously this is meaningless against non-OpenAI scrapers or anyone who just doesn't give a shit, since robots.txt is only a request that a crawler can choose to ignore.
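For anyone curious how a well-behaved crawler would read that entry, here is a minimal sketch using Python's standard `urllib.robotparser`. The `is_allowed` helper, the `SomeOtherBot` agent name, and the example URL are made up for illustration. This only models a crawler that chooses to honor robots.txt; it says nothing about how GPTBot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# The entry proposed in the post.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots.txt may fetch the URL.

    Hypothetical helper: it parses the rules from a string instead of
    fetching them over the network.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://lemmy.ml/post/123"  # hypothetical URL, for illustration only
    for agent in ("GPTBot", "SomeOtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

With only this entry in the file, a compliant GPTBot is refused everywhere on the site, while any agent the file does not name is still allowed by default.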

[email protected] -2 points 1 year ago* (last edited 1 year ago)

I can understand privacy concerns, but I feel like it's inevitable that LLMs will be used to make lots of decisions, some possibly important, so wouldn't you want some content included in their training? For instance, would you want an LLM to be ignorant of FOSS because all the FOSS sites blocked it, and then a child asks an LLM for advice on software and gets recommended Microsoft and Apple products only?

Geist_ 1 points 1 year ago* (last edited 1 year ago)

... It's probably going to recommend paid and non-FOSS apps and programs just on the basis that those companies will probably pay to be the top suggestions. Just like Google Ads. So no, I don't think that's a good enough reason. They can still scrape wikis if they need info on FOSS, imo. Those shouldn't (?) block AIs and other aggregators.