this post was submitted on 09 Aug 2023
61 points (95.5% liked)
13435 readers
1 users here now
founded 5 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
It’s hard to say that without knowing what their infrastructure’s like, even if we think it’s expensive. And if they built their stack with OLAP being an important part of it, I don’t see why they wouldn’t have our comment edit histories stored somewhere that’s not a backup, and maybe they just toss dated database partitions into some cheap cold storage that allows for occasional, slow reads. They’re not gonna make a backup of their entire fleet of databases for every change that happens. That would be literally insane.
Also, tracking individual edit and delete rates over time isn’t expensive at all, especially if they just keep an incremental day-by-day, maybe more or less frequent, change over time. Or, just slap a counter for edits and deletes in a cache, reset that every day, and if either one goes higher than some threshold, look into it. There are probably many ways to achieve something similar in a cheap way.
And ChatGPT is just an example. I’m sure there already are other out-of-fashion-but-totally-usable language models or heuristics that are cheap to run and easy to use. Anything that can give a decent amount of confidence is probably good enough.
At the end of the day, the actual impact of their business from the API fiasco is just on a subset of power users and tech enthusiasts, which is vanishingly small. I know many that still use Reddit, some begrudgingly, despite knowing the news pretty well. Why? Cause the contents are already there. Restoring valuable content is important for Reddit, so I don’t see why they wouldn’t want to sink some money into ensuring that they keep what makes em future money. It’s basically an investment. There are some risks, but the chances to earn em back with returns on top of the cost is high.
what we can do is apply some common sense, however, and realize the amount of work to do this is ridiculous. and, yes, tacking the changes isn’t that complex, but tracking that many changes and storing them for tens of millions of users’ comments for 18 years IS. Then doing what you proposed with ChatGPT is beyond absurd with regards to cost, too, considering the scale of computing work required to process so many deleted comments.
so, despite how many theoreticals you propose regarding the possibility of it, the fact remains that it’s unlikely in the extreme such an effort would have been made because of the resource, time, and cost involved.
Kinda don’t like how my handwavy idea is just taken for the most naive turn. I’m not even trying to give precise solutions. I’ve never worked with software at scale, and I expect the playing ground to be pretty different, but I think you’re exaggerating.
Look buddy, all I want to say is that I don’t think your method against Reddit would work. It’s basically gamble though, so I’m definitely not against attempt at it. I just want to point out the possibility of it not working. I don’t think there are surefire ways against their attempt at restoring content.
I’m sorry you don’t like that I think you’re being ridiculous, but getting upset and doubling-down every time I say so isn’t likely to change my mind.
move on.