this post was submitted on 14 Jan 2025

Fuck AI

"We did it, Patrick! We made a technological breakthrough!"

A place for all those who loathe AI to discuss things, post articles, and ridicule the AI hype. Proud supporter of working people. And proud booer of SXSW 2024.

I was scouring the indie web earlier and found a pretty useful list of bots to add to your robots.txt. Since I'm not convinced that this is enough to keep them away, I also figured out a simple way to potentially block them from accessing your websites altogether.

[–] [email protected] 15 points 6 days ago (4 children)

Again, it must always be stressed that this gives a false sense of security. You can only block crawlers that identify themselves, or pull an IP blocklist of known offenders, which means they have already offended in order to be identified, and they can just change their IP address.
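
To make the limitation concrete, here is a minimal sketch of user-agent blocking in Python. The token list and the function name `is_declared_bot` are just illustrative assumptions, not the list from the original post. Anything that sends a browser-like user agent walks straight past it.

```python
# Example tokens only; this is an assumed, incomplete list for illustration.
DECLARED_BOT_TOKENS = ("GPTBot", "CCBot", "ClaudeBot")


def is_declared_bot(user_agent: str) -> bool:
    """Return True only if the client names itself as a listed crawler."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in DECLARED_BOT_TOKENS)


# A crawler that announces itself is caught...
print(is_declared_bot("Mozilla/5.0 (compatible; GPTBot/1.0)"))
# ...but the same crawler pretending to be a browser is not.
print(is_declared_bot("Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0"))
```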

You can't block them, but you can make their life harder. Return 200 OK where you would normally return 404 Not Found, so malicious bots trying to drive-by you for random URLs like /admin or whatever will think they found something. Make honeypots that redirect and loop, filled with bait wordlists and forms that go nowhere. Poison the well. Deliberately serve incorrect, broken or AI-generated data to known bots.

Waste their time, instead of wasting your own time.

[–] ultrahamster64 3 points 6 days ago (1 children)

Hmm, how would one attempt to actually do this in practice?

[–] [email protected] 4 points 6 days ago* (last edited 6 days ago)

Eventually I'm gonna make a proper article about it, but what I'm doing right now boils down to this:

  • Intercept 404
  • Redirect to error-hole.php
  • error-hole.php returns 200 and spits out a bunch of bot-targets
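
The steps above could be sketched roughly like this. It is not the actual error-hole.php, just a Python stand-in using the standard library's `http.server`, and it serves the bait directly instead of redirecting. `BAIT_PATHS`, `REAL_PAGES`, `bait_page` and `ErrorHoleHandler` are all made-up names for the example.

```python
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical bait wordlist; a real setup would use its own.
BAIT_PATHS = ["/admin", "/wp-login.php", "/backup.zip", "/.env", "/phpmyadmin"]
# Hypothetical stand-in for the pages that really exist on the site.
REAL_PAGES = {"/": b"<html><body><h1>Real home page</h1></body></html>"}


def bait_page(rng: random.Random, links: int = 20) -> bytes:
    """Build an HTML page of links that all lead back into the trap."""
    items = []
    for _ in range(links):
        path = f"{rng.choice(BAIT_PATHS)}/{rng.getrandbits(32):08x}"
        items.append(f'<li><a href="{path}">{path}</a></li>')
    return ("<html><body><ul>" + "".join(items) + "</ul></body></html>").encode()


class ErrorHoleHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = REAL_PAGES.get(self.path)
        if body is None:
            # Would have been a 404: answer 200 with bait instead.
            body = bait_page(random.Random())
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the trap on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), ErrorHoleHandler).serve_forever()
```

Calling `serve()` and requesting any unknown path should give a 200 with a page of links, none of which exist either, so a bot following them just loops.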

The next iteration of this will include a lot of uncompressed filler data, so hopefully the bots have to download half a gigabyte of data every time they do this. I'm not paying for bandwidth, so it doesn't matter to me.
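
One way the filler could be produced, as a sketch only: a Python generator that yields random bytes in chunks so the server never holds the whole payload in memory. Using `os.urandom` is my assumption, picked because random bytes barely compress even if the web server applies gzip. `filler_chunks` is a hypothetical helper name.

```python
import os
from typing import Iterator


def filler_chunks(total_bytes: int, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield random bytes in chunks until total_bytes have been produced."""
    remaining = total_bytes
    while remaining > 0:
        size = min(chunk_size, remaining)
        yield os.urandom(size)
        remaining -= size


# Half a gigabyte would be filler_chunks(512 * 1024 * 1024); each chunk
# would be written to the response as it is generated.
```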

See for yourself: https://drkt.eu/fdhasklfh

I can see that it works by just looking at my access logs.
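
If you'd rather count than eyeball, something like the following sketch tallies trap hits per client. It assumes the logs are in the usual Common/Combined Log Format, which may not match your server. `trap_hits` and the sample lines are invented for the example.

```python
import re
from collections import Counter

# Matches the start of a Common/Combined Log Format line (assumed format).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def trap_hits(lines, real_paths):
    """Count 200 responses per client IP for paths that are not real pages."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m["status"] == "200" and m["path"] not in real_paths:
            hits[m["ip"]] += 1
    return hits


# Invented sample lines using documentation-only IP ranges.
sample = [
    '203.0.113.5 - - [14/Jan/2025:10:00:00 +0000] "GET /admin HTTP/1.1" 200 512',
    '203.0.113.5 - - [14/Jan/2025:10:00:01 +0000] "GET /fdhasklfh HTTP/1.1" 200 512',
    '198.51.100.7 - - [14/Jan/2025:10:00:02 +0000] "GET / HTTP/1.1" 200 1024',
]
print(trap_hits(sample, real_paths={"/"}))
```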
