this post was submitted on 09 Oct 2024

7fb2adfb45bafcc01c80 0 points 1 month ago* (last edited 1 month ago)

Again, isn't that the site's prerogative?

I think there should at least be a recognized way to opt out that archive.org actually follows. For years they told people to put

User-agent: ia_archiver
Disallow: /

in robots.txt, but they still archived content from those sites. They refuse to publish the IP addresses they pull content from, even though that would be trivial to do. They also refuse to use a User-Agent that you can filter on.
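
For anyone who wants to check what a rule like that actually says, here is a rough sketch using Python's standard `urllib.robotparser`. The helper name `is_allowed` and the example.com URL are made up for illustration. It only tells you what a compliant crawler should do with the rules; it says nothing about what any particular crawler really does. It also shows that `Disallow: /` blocks everything for that agent, while an empty `Disallow:` blocks nothing.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration; it parses the rules from a
    string instead of fetching a live robots.txt.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Opt-out rule: the trailing slash disallows every path for ia_archiver.
OPT_OUT = "User-agent: ia_archiver\nDisallow: /\n"

# Empty Disallow: by convention this disallows nothing, so all paths are allowed.
EMPTY_DISALLOW = "User-agent: ia_archiver\nDisallow:\n"

if __name__ == "__main__":
    url = "https://example.com/some/page"  # placeholder URL
    print("opt-out, ia_archiver:", is_allowed(OPT_OUT, "ia_archiver", url))
    print("opt-out, other bot:  ", is_allowed(OPT_OUT, "SomeOtherBot", url))
    print("empty Disallow:      ", is_allowed(EMPTY_DISALLOW, "ia_archiver", url))
```

Of course this only works if the crawler announces itself with a User-Agent that matches the rule, which is the whole complaint.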

If you want to be a library, be open and honest about it. There's no need to sneak around.