this post was submitted on 05 Jul 2023
43 points (68.7% liked)

Fediverse

A community to talk about the Fediverse and all its related services that use ActivityPub (Mastodon, Lemmy, KBin, etc.).


The fediverse is discussing if we should defederate from Meta's new Threads app. Here's why I probably won't (for now).

(Federation between Plume and my Lemmy instance doesn't work correctly at the moment; otherwise I would have made this a proper crosspost.)

[–] leraje 5 points 1 year ago (1 children)

Sure, Zuck isn't going to give two shits that you and I might defederate from Threads, and maybe it is just a gesture, but I still think it's one worth making. The crux of it is: do they care enough about getting the data from .world or .helios42.de to go to the trouble of building a scraper? If they don't, then defederating is the right thing to do, in my opinion. If they do, then you're right, it's pointless.

[–] [email protected] -3 points 1 year ago* (last edited 1 year ago) (1 children)

I'll tell you a secret: they care enough to scrape everything. Not only the fediverse, but every single website that's accessible. And that's not a thing for the future; it has been a reality at least since Google became popular. Do yourself a favor and look into the server logs of an average webhost and you will find a whole bunch of crawlers. Some are for search engines, some are for other purposes.
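For anyone who wants to check this on their own server, here is a minimal sketch of what "looking into the server logs" could look like. It assumes the access log uses the common "combined" format, where the user agent is the last quoted field on each line. The names `count_crawlers`, `UA_PATTERN`, `BOT_HINT` and the sample lines are made up for illustration, and the keyword match is only a rough heuristic: crawlers that disguise themselves as ordinary browsers will not show up.

```python
import re
from collections import Counter

# In the combined log format, each line ends with: "referer" "user-agent"
UA_PATTERN = re.compile(r'"[^"]*" "([^"]*)"\s*$')

# Rough heuristic (an assumption): many crawlers announce themselves
# with one of these words somewhere in their user agent string.
BOT_HINT = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def count_crawlers(lines):
    """Count requests per user agent for agents that look like crawlers."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match and BOT_HINT.search(match.group(1)):
            counts[match.group(1)] += 1
    return counts


# Invented sample lines, not taken from a real server.
SAMPLE_LOG = [
    '203.0.113.5 - - [05/Jul/2023:10:00:00 +0000] "GET /post/1 HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [05/Jul/2023:10:00:05 +0000] "GET /post/1 HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"',
    '203.0.113.7 - - [05/Jul/2023:10:00:09 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" '
    '"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
]

if __name__ == "__main__":
    for agent, hits in count_crawlers(SAMPLE_LOG).most_common():
        print(hits, agent)
```

To run it against a real log, pass an open file instead of `SAMPLE_LOG`, for example `count_crawlers(open(path))`, where `path` is wherever your web server writes its access log.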

I wrote my M. Sc. thesis on specialized crawlers (back in 2015) and you wouldn't believe how much research has gone into that and how effective modern crawlers are at finding every single thing that ever got uploaded to the net. The only thing needed is enough hardware to throw at the problem and that's exactly what Meta, Google, Microsoft, Amazon and all the others have. As a rule of thumb, if archive.org or your favorite search engine has indexed it, everyone else has it as well or has access to someone they can buy it from. There is no such thing as unscraped content on the internet (unless you lock it behind access restrictions and those would apply just the same to federation).

Edit: I don't have access logs enabled on my instance and obviously can't see what happens on other instances, but I would bet that this very thread will be picked up by at least five different crawlers before the day is over.

[–] leraje 1 points 1 year ago

Yeah, I know. My own access logs on all the VS I have control over are disabled. I still feel something, even if that something is purely symbolic, is better than nothing.