this post was submitted on 24 Jun 2023
33 points (100.0% liked)

datahoarder


Who are we?

We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.

We are one. We are legion. And we're trying really hard not to forget.

-- 5-4-3-2-1-bang from this thread

founded 4 years ago

It would be a shame to lose the wealth of knowledge, with easy-ish search, that subreddits like datahoarder provide if the subreddit is taken down or stays locked forever. Sure, it is currently accessible, but will it stay that way?

I know it is being archived, but the accessibility part is the problem.

top 13 comments
[–] yakabuff 6 points 1 year ago* (last edited 1 year ago) (1 children)

If you're just interested in searching:

http://redarc.basedbin.org/search

/r/datahoarder is indexed and searchable

[–] [email protected] 3 points 1 year ago

Good to know that exists, thanks! Although I still personally believe that having it in a forum like Lemmy is the best way to preserve it in its original format.

[–] [email protected] 6 points 1 year ago (1 children)

There’s actually already a Reddit-to-Lemmy importer that lets you bring over threads, including comments: https://github.com/rileynull/RedditLemmyImporter

[–] [email protected] 2 points 1 year ago

Yeah, that appears to be what I had in mind. Good find!

[–] [email protected] 5 points 1 year ago (1 children)

I have wondered: is there an easy way to search archived Reddit data through the Wayback Machine?

And for comments that people back up to CSV with stuff like Power Delete Suite, is there a nicer way to display them than Excel?
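
For the CSV part, a few lines of Python may already be nicer than a spreadsheet. This is only a rough sketch: the column names `date`, `subreddit` and `body` are assumptions, so check the header row of your own export and adjust them.

```python
import csv
import io
import textwrap


def render_comments(csv_text, width=80):
    """Render exported comments as readable plain-text blocks."""
    reader = csv.DictReader(io.StringIO(csv_text))
    blocks = []
    for row in reader:
        # "date", "subreddit" and "body" are assumed column names,
        # not taken from any particular export tool.
        header = f"[{row.get('date') or '?'}] r/{row.get('subreddit') or '?'}"
        body = textwrap.fill(row.get("body") or "", width=width)
        blocks.append(f"{header}\n{body}")
    return "\n\n".join(blocks)


sample = 'date,subreddit,body\n2023-06-01,datahoarder,"Use ZFS, scrub monthly."\n'
print(render_comments(sample))
```

For a real file, read it with `open(path, newline="", encoding="utf-8").read()` and pass the text in.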

[–] [email protected] 7 points 1 year ago* (last edited 1 year ago) (2 children)

It could be done, but that really isn't the best possible solution in my opinion. What I was thinking was having a bot migrate all the comments and posts here (or to another instance). Instead of trying to create new users on Lemmy, the bot would post everything under its own name and put the original username in each comment, like "Bread commented" followed by their comment, so we still know who said it.

If the bot maker had control of the instance, we might be able to put everything in chronological order by timestamp, so it would look like the comments were all made here originally. The only indicator that they weren't would be the bot's name as the username. Search algorithms would then be able to search it just like Reddit.

I believe the best way to archive a forum-style website is on a forum, where everything has a one-to-one equivalent.
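
Roughly, the attribution and ordering part could look like the sketch below. It is not a working bot: `ArchivedComment` and both helper functions are made-up names for illustration, and actually posting to Lemmy is left out entirely.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ArchivedComment:
    author: str        # original Reddit username
    body: str          # original comment text
    created_utc: int   # Unix timestamp taken from the archive


def as_bot_comment(comment):
    """Prefix the body with the original author and date, since the bot posts it."""
    when = datetime.fromtimestamp(comment.created_utc, tz=timezone.utc)
    return f"{comment.author} commented on {when:%Y-%m-%d}:\n\n{comment.body}"


def in_posting_order(comments):
    """Oldest first, so the bot replays the thread chronologically."""
    return sorted(comments, key=lambda c: c.created_utc)


thread = [
    ArchivedComment("Bread", "RAID is not a backup.", 1687600000),
    ArchivedComment("Toast", "How do I back up 40 TB?", 1687500000),
]
for c in in_posting_order(thread):
    print(as_bot_comment(c))
    print("---")
```

Keeping the original date in the comment text means the history survives even if the instance can't backdate posts.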

As for moving Datahoarder to a new instance, that sure would make backups a lot nicer if a datahoarder ran it. I am surprised that it isn't on its own already considering the topic. Same thing with self-hosted.

[–] thawed_caveman 1 points 1 year ago

I'm not sure if it's possible to backdate posts, not even if it's your own instance. But otherwise I think this might be the way.

[–] OutrageousUmpire 1 points 1 year ago (1 children)

I love this idea. It raises some issues to think about, too. Like, who “owns” that data? Would Reddit file a lawsuit against the Lemmy instance arguing that the data belongs to Reddit? Does the data belong to the users who posted? What TOS do we agree to when signing up for a Reddit account? Are we giving them ownership of all content we post?

[–] [email protected] 1 points 1 year ago

I think it would be very hard to argue in court that someone's thoughts and ideas belong to Reddit just because they were posted there. That is also why you can request that Reddit delete all your data and they must comply.

As for the legality of taking those comments and posts, I don't know for certain. The Internet Archive already does it, though. If anything, they would have to remove any content that its author wants removed, like with a DMCA request.

Like with most things on the internet, if it is illegal and nobody is enforcing it, it might as well be legal.

[–] venoft 4 points 1 year ago

The-eye has a nice archive of Reddit: https://the-eye.eu/redarcs/

[–] nydas 1 points 1 year ago (1 children)

While it’s easy enough to get the sub contents in json, I don’t believe there is a post API for lemmy yet, so no way to easily push it back up.
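
The "get it in JSON" half might look something like this. It is a sketch that assumes the usual Reddit listing shape (`data` → `children` → `data`, with `title`, `selftext`, `author` and `created_utc` fields); verify that against a real dump before relying on it. `posts_from_listing` is just an illustrative name, and the sample data is invented.

```python
import json


def posts_from_listing(listing_json):
    """Pull the fields a re-poster would need out of a Reddit-style listing."""
    listing = json.loads(listing_json)
    posts = []
    for child in listing.get("data", {}).get("children", []):
        data = child.get("data", {})
        posts.append({
            "title": data.get("title", ""),
            "body": data.get("selftext", ""),
            "author": data.get("author", "[deleted]"),
            "created_utc": data.get("created_utc"),
        })
    return posts


# Invented sample in the assumed listing shape.
sample = json.dumps({
    "kind": "Listing",
    "data": {"children": [
        {"kind": "t3", "data": {
            "title": "Backup strategy for 40 TB?",
            "selftext": "Looking for advice.",
            "author": "Toast",
            "created_utc": 1687500000,
        }},
    ]},
})
for post in posts_from_listing(sample):
    print(post["author"], "-", post["title"])
```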

[–] [email protected] 1 points 1 year ago (1 children)

But it is still possible. The question now is should it be done?

[–] nydas 1 points 1 year ago

Hmmm…. I’d be more inclined to ensure it was archived with the wayback machine, and then refer back to that.
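
To check whether a page is already in there, something like the sketch below could build the lookup URL. The endpoint and the `url`/`timestamp` parameters are how I understand the Internet Archive's availability API, so treat them as assumptions and check the current docs. `wayback_lookup_url` is a made-up helper name, and the code only builds the URL without making a request.

```python
from urllib.parse import urlencode


def wayback_lookup_url(page_url, timestamp=None):
    """Build an availability-lookup URL for one page (no network access here)."""
    params = {"url": page_url}
    if timestamp:
        # Assumed format: YYYYMMDD, asking for the snapshot closest to that date.
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


print(wayback_lookup_url("https://www.reddit.com/r/DataHoarder/", "20230624"))
```

Fetching that URL should return JSON describing the closest snapshot, if one exists.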
