this post was submitted on 09 Aug 2023
61 points (95.5% liked)
it was randomly-generated letters and numbers. it would be impossible to divine what the original comment was. I then did this over and over, 10 times, so the edit history was overwritten with blocks of randomized text.
what you suggest would just spit out more garbage, or, at best, completely fake comments.
You misunderstood my comment. Reddit probably has every version of your edits, so all they need to do is run each comment’s past versions through ChatGPT or something, newest first. The first sensible one gets accepted. In some sense, that’s just like how a person would do it. This way, they don’t have to deal with each individual approach to obfuscating or messing with their data.
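To make the "newest first, accept the first sensible one" idea concrete, here's a rough sketch. Everything in it is made up for illustration: `restore_comment`, `looks_sensible`, and the `(timestamp, text)` shape of the edit history are assumptions, not anything Reddit is known to run. The heuristic is a deliberately crude stand-in for a language model, and random letters that happen to look word-like would fool it.

```python
import re
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical heuristic: a token counts as "word-like" if it is made only of
# letters and basic punctuation and contains at least one vowel.
_WORD = re.compile(r"[A-Za-z'’.,!?-]+")
_VOWEL = re.compile(r"[aeiouAEIOU]")


def looks_sensible(text: str) -> bool:
    """Crude stand-in for a language model: mostly word-like tokens."""
    words = text.split()
    if len(words) < 3:
        return False
    wordlike = sum(
        1 for w in words if _WORD.fullmatch(w) and _VOWEL.search(w)
    )
    return wordlike / len(words) >= 0.8


def restore_comment(
    versions: Iterable[Tuple[int, str]],
    is_sensible: Callable[[str], bool] = looks_sensible,
) -> Optional[str]:
    """Walk the edit history newest first; return the first sensible text.

    `versions` is an assumed format: (timestamp, text) pairs in any order.
    Returns None if no version passes the check.
    """
    for _, text in sorted(versions, key=lambda v: v[0], reverse=True):
        if is_sensible(text):
            return text
    return None


history = [
    (1, "I think this is a useful answer."),
    (2, "x9f3k2 q8z7 p0o1"),
    (3, "a8s7d6f5g4h3j2k1"),
]
restored = restore_comment(history)
```

With the sample `history` above, the two scrambled versions get skipped and the original text from timestamp 1 comes back. Swapping `is_sensible` for a real model call is where the actual cost question would come in.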
I was gonna just wait till this whole fiasco dies down, let it sit for a couple of months to a year, before going ahead and slowly removing my comments over time. It’s easy to build per-user triggers that detect attempts at mass edits or mass deletions of comments, after all, which may kick off some process in their systems. Doing it the low-profile way is likely the best way to go.
the cost and resources for all of that would be profound. when they’re already complaining about profitability, I doubt they’d dump huge amounts of additional funds into a project like that. they clearly have at least one level of backups, and I wouldn’t be shocked if they had 2 or 3 revision backups, but anything past that - let alone what you’re suggesting - would be too much to be a manageable cost.
It’s hard to say that without knowing what their infrastructure’s like, even if we think it’s expensive. And if they built their stack with OLAP being an important part of it, I don’t see why they wouldn’t have our comment edit histories stored somewhere that’s not a backup, and maybe they just toss dated database partitions into some cheap cold storage that allows for occasional, slow reads. They’re not gonna make a backup of their entire fleet of databases for every change that happens. That would be literally insane.
Also, tracking individual edit and delete rates over time isn’t expensive at all, especially if they just keep an incremental tally of changes, day by day or at whatever interval suits them. Or, just slap a counter for edits and deletes in a cache, reset that every day, and if either one goes higher than some threshold, look into it. There are probably many ways to achieve something similar in a cheap way.
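A minimal sketch of that counter idea, just to show how little it takes. The class name `DailyActionCounter`, the threshold value, and the in-memory dict standing in for a cache are all assumptions for illustration; a real setup would presumably use a shared cache with expiring keys.

```python
from collections import defaultdict
from datetime import date


class DailyActionCounter:
    """Counts per-user actions and forgets everything when the day changes."""

    def __init__(self, threshold: int):
        self.threshold = threshold  # hypothetical limit per user per action per day
        self._day = None
        self._counts = defaultdict(int)

    def record(self, user: str, action: str, today: date) -> bool:
        """Record one action; return True if the user is now over the threshold."""
        if today != self._day:
            # New day: reset all counters, like expiring the cache.
            self._counts.clear()
            self._day = today
        self._counts[(user, action)] += 1
        return self._counts[(user, action)] > self.threshold


counter = DailyActionCounter(threshold=3)
day_one = date(2023, 8, 9)
flags = [counter.record("alice", "edit", day_one) for _ in range(5)]
next_day_flag = counter.record("alice", "edit", date(2023, 8, 10))
```

Here `flags` comes out as `[False, False, False, True, True]`: the fourth edit in a day trips the threshold, and the count starts over the next day. Edits and deletes are tracked separately because the action name is part of the key.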
And ChatGPT is just an example. I’m sure there already are other out-of-fashion-but-totally-usable language models or heuristics that are cheap to run and easy to use. Anything that can give a decent amount of confidence is probably good enough.
At the end of the day, the actual impact of the API fiasco on their business is limited to a subset of power users and tech enthusiasts, which is vanishingly small. I know many that still use Reddit, some begrudgingly, despite knowing the news pretty well. Why? Cause the content is already there. Restoring valuable content is important for Reddit, so I don’t see why they wouldn’t want to sink some money into ensuring that they keep what makes em future money. It’s basically an investment. There are some risks, but the chances of earning it back, with returns on top of the cost, are high.
what we can do is apply some common sense, however, and realize the amount of work to do this is ridiculous. and, yes, tracking the changes isn’t that complex, but tracking that many changes and storing them for tens of millions of users’ comments for 18 years IS. Then doing what you proposed with ChatGPT is beyond absurd with regards to cost, too, considering the scale of computing work required to process so many deleted comments.
so, despite how many theoreticals you propose regarding the possibility of it, the fact remains that it’s unlikely in the extreme such an effort would have been made because of the resource, time, and cost involved.
Kinda don’t like how my handwavy idea keeps getting read in the most naive way possible. I’m not even trying to give precise solutions. I’ve never worked with software at scale, and I expect the playing field to be pretty different, but I think you’re exaggerating.
Look buddy, all I want to say is that I don’t think your method against Reddit would work. It’s basically a gamble though, so I’m definitely not against attempting it. I just want to point out the possibility of it not working. I don’t think there are surefire ways to defeat their attempts at restoring content.
I’m sorry you don’t like that I think you’re being ridiculous, but getting upset and doubling-down every time I say so isn’t likely to change my mind.
move on.