this post was submitted on 18 Jun 2023
1318 points (98.5% liked)

Lemmy.World Announcements

incognito_mode 3 points 1 year ago

Yeah, I understand that screen scraping is a thing, and that when a robot simply reads an entire website, there's nothing you can do to stop it short of taking the website offline.

I was talking about something more structured and proactive: "We know that AI will read our site and ingest it for LLM training. Instead of simply accepting that as an inevitability, we're extending this offer: for a nominal fee, we will provide the entirety of our site's information, with all screen names redacted to protect the identity of the content creators, in exchange for them not simply using AI to read our site."

Or something to that effect. Accept that it will happen and that there's nothing you can really do to stop it, but package the data in a clean way so that they don't have to scrape it, and can simply ingest it into the LLM data sets directly.
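
To make the "package it cleanly" idea a bit more concrete, here is a rough sketch of what such an export could look like. Everything in it is an assumption for illustration: the `author` and `body` field names, the salted-hash pseudonyms, and the JSON Lines output are not how Lemmy actually stores or exports anything. It swaps each screen name for a stable pseudonym rather than blanking it, so replies by the same person still line up, and it does not catch screen names mentioned inside the comment text itself.

```python
import hashlib
import json


def redact_author(name: str, salt: str) -> str:
    """Replace a screen name with a stable, salted pseudonym (hypothetical scheme)."""
    digest = hashlib.sha256((salt + name).encode("utf-8")).hexdigest()
    return "user_" + digest[:12]


def package_comments(comments, salt):
    """Return JSON Lines text with every author field redacted.

    `comments` is assumed to be a list of dicts with "author" and "body" keys.
    """
    lines = []
    for c in comments:
        record = {"author": redact_author(c["author"], salt), "body": c["body"]}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


# Made-up sample data, just to show the shape of the export.
sample = [
    {"author": "alice", "body": "First comment."},
    {"author": "bob", "body": "A reply."},
    {"author": "alice", "body": "Another comment."},
]
export = package_comments(sample, salt="example-salt")
print(export)
```

The salt would have to stay private to the site, otherwise anyone could hash a known screen name and match it against the export.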