I'm not super familiar with the Archive Team - what distinguishes this specific archiving effort from the dataset that PushShift archived? Is this primarily focused on archiving media (video, images), or comments/submissions in the time period since PushShift closed, or everything from the entire time period from 2005 onward?
datahoarder
Who are we?
We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.
We are one. We are legion. And we're trying really hard not to forget.
-- 5-4-3-2-1-bang from this thread
Bump. We have the PushShift archive—what is this project archiving that we don’t already have?
This is such an awesome project and something I had been worried about with reddit dying a slow death. Thank you for the post and bringing awareness to the project!
Only 15 million items left out of over 11.5 billion!
I'm now seeing reports over on [email protected] that people who have deleted all the comments from their accounts - even those who did it years ago, not just in the past few weeks out of protest - are having all their comments reappear. This apparently also includes comments that were overwritten with edits.
Scummy behaviour from Reddit, but a potential boon for archivists. People who are running backups or maintaining archives of Reddit comments might want to take this opportunity to re-check historical deleted comments to see if they can be collected now, in this remaining window of API accessibility.
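For anyone who wants to try that re-check, here's a rough sketch of the first step: going through an existing dump and pulling out the comment IDs that were stored as deleted. It assumes a PushShift-style newline-delimited JSON dump where each record has `id` and `body` fields, and that deleted comments were saved with a body of `[deleted]` or `[removed]`. The function names and the batch size of 100 are my own choices, not part of any existing tool, and the actual re-fetching is left out.

```python
import json

# Assumption: deleted/removed comments were archived with one of these bodies.
DELETED_MARKERS = {"[deleted]", "[removed]"}


def ids_to_recheck(lines):
    """Yield comment fullnames (t1_<id>) whose archived body is a deletion marker.

    `lines` is any iterable of NDJSON strings, e.g. an open dump file.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("body") in DELETED_MARKERS:
            yield "t1_" + record["id"]


def batches(items, size=100):
    """Group items into lists of at most `size` (hypothetical batch size)."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    # Made-up sample records, only to show the shape of the input.
    sample = [
        '{"id": "abc", "body": "[deleted]"}',
        '{"id": "def", "body": "still here"}',
        '{"id": "ghi", "body": "[removed]"}',
    ]
    for batch in batches(ids_to_recheck(sample)):
        print(",".join(batch))
```

Note this only catches comments that were archived with a deletion marker. Comments that were overwritten with edits have ordinary-looking bodies in the dump, so they would need a different check, such as comparing against an earlier copy.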
I run it using Docker on my Unraid server. It is available on Community Applications, so it is easy to set up.