this post was submitted on 26 Jun 2023
15 points (94.1% liked)

Lemmy


I recently came across a torrent that seems to be an archive of Reddit. It got me wondering whether it would be possible to make it locally browsable. It also occurred to me that someone might have already done this by creating a public Lemmy instance, which would make the content accessible from any federated instance.

top 6 comments
[–] V4uban 5 points 1 year ago (1 children)
[–] AlmightySnoo 1 points 1 year ago (1 children)

That thing is just hideous. I prefer letting Lemmy grow organically and make its own content without having bots mirror thousands of threads from Reddit. If individual Redditors want to bring their own content that's cool, but to have a bot automatically scrape Reddit like that is a bad idea.

[–] V4uban 2 points 1 year ago

I agree with you, but I guess it still fits OP's request

[–] T156 1 points 1 year ago

I don't think so. There are a lot of Reddit posts, and mirroring them all would take quite a while.

If you felt like it, you could probably import that into an instance of your own, but it would take a fair amount of effort. Comments are also a problem, since on top of all that you would have to create local replicas of the relevant accounts.

None of which is a small amount of work, and might be more work than it's worth, when you could either mirror new content live with bots, or simply create new content over on Lemmy for less. It'll take some time to get established, but it is much less work and troubleshooting, by comparison.

[–] [email protected] -4 points 1 year ago (1 children)

Honestly it upsets me enough when I see people or bots mirroring new Reddit posts to fedi without the original author's permission. A full archive - whether in the form of a torrent or a fedi instance - also makes me feel icky.

I know it's not possible and it's entirely against reddit's interests, but I wish there were a way for subreddits or people or posts to be marked somehow as not for copying or use elsewhere.

It has always weirded me out when I found /r/relationships posts copy-pasted to like BuzzFeed knock-off sites. Then yesterday I saw and blocked a Lemmy bot mirroring like a dozen reddit subs (including gonewild) to its instance.

It may be fine, good, and useful to archive like how-to content or technical support questions and stuff like that as there is a clear utility there. But seeing the more personal stuff that people might not want to see copied around or searchable makes me feel bad.

Yes, yes, I know it's the internet, and these people should know better, and if they really want to opt out they should submit a request to the Wayback Machine and set a robots.txt, plus there's no way to stop it, and we really, really need all of this valuable information preserved for historical purposes, and as we all know information wants to be free and you can't stop the signal. And all the myriad excuses that the less well-behaved digital preservationists will lean on.

But at some point, and in a lot of circumstances, you're copying people's personal information and using it in ways they didn't intend when they posted it. I don't know your personal opinion on the reports of reddit admins undeleting posts that people have been deleting before they delete their accounts, but people who are upset about that should consider that "preserving" reddit data also takes away people's agency over their data and their right to be forgotten in much the same way.

[–] [email protected] 1 points 1 year ago

Are you a reddit employee?