this post was submitted on 25 Jun 2023

165 points (100.0% liked)

What kind of data do Lemmy instances store about their users? (self.nostupidquestions)

submitted 2 years ago by klappscheinwerfer to c/nostupidquestions

[–] solrize 1 points 2 years ago* (last edited 2 years ago) (1 children)

Tracking of reads = when you read someone's post, there is a permanent record made, e.g. in a db row associated with the user, that @aski3252 read that post. That is somewhat different from normal httpd access logs, which associate only with IP addresses and which typically get distilled down to aggregate data and preferably discarded after a short period. Where I worked, we kept logs around for 30 days for stuff like abuse investigations but deleted them after that. In fact, with a little careful design of the log data, or if the query is sent by HTTP POST instead of GET, the parameters that identify what you were reading will usually not be logged at all.
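To illustrate the GET vs POST point, here is a minimal sketch, assuming a server that logs only the request line in something like Common Log Format, which is what typical httpd setups do by default. The function name, the `/post` path, and the `post_id` parameter are made up for illustration and are not Lemmy's actual API.

```python
from datetime import datetime, timezone


def access_log_line(ip, method, target, status, when):
    """Format one access-log entry in a Common Log Format-like style.

    Only the request line (method + target) is recorded. The request
    body never appears, which is the point of the example.
    """
    stamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    return f'{ip} - - [{stamp}] "{method} {target} HTTP/1.1" {status} -'


when = datetime(2023, 6, 25, 12, 0, 0, tzinfo=timezone.utc)

# Hypothetical endpoint: with GET, the post ID travels in the URL,
# so it ends up in the log line.
get_line = access_log_line("203.0.113.7", "GET", "/post?post_id=12345", 200, when)

# With POST, the same parameter travels in the body, which this
# logger never sees.
post_body = "post_id=12345"
post_line = access_log_line("203.0.113.7", "POST", "/post", 200, when)

print(get_line)
print(post_line)
```

The GET line carries `post_id=12345` and the POST line does not, so a log built this way can't tell you which post was requested unless the server goes out of its way to log bodies.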

It's not mostly an issue of selling data for marketing purposes. The data could also be extracted by cyber attackers, seized by law enforcement, subpoenaed in a lawsuit, or whatever. The only way to stop that from happening is to not retain the data in the first place. "Marketing purposes" is a smoke screen anyway. E.g. if you are a regular lurker on a community about workplace organizing or job hunting, that info will be more valuable to your boss than it will be to some advertiser or marketer. So the real customers of internet usage data (and phone records etc.) are far less benign than "marketing" organizations.

It is not necessary to record voting data except to prevent you from voting twice on a particular topic. So if voting closes (say on a poll), all the data about who voted in it can be deleted. There is also no need to remember HOW anyone voted. It's enough to remember that you voted on a particular topic, and increment the relevant vote counter. That is also how real-world elections work. See also the topic of "receipt-free voting" in cryptography.
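Here is a rough sketch of that idea, assuming a simple in-memory poll. The `Poll` class and its method names are hypothetical, not how Lemmy actually stores votes. It keeps a set of who has voted and a counter per option, never links the two, and throws the voter set away when the poll closes.

```python
class Poll:
    """Counts votes without storing how any individual voted."""

    def __init__(self, options):
        self.counts = {option: 0 for option in options}
        self.voters = set()  # who has voted, never what they chose
        self.is_open = True

    def vote(self, user_id, option):
        """Record one vote; return False if this user already voted."""
        if not self.is_open:
            raise ValueError("poll is closed")
        if option not in self.counts:
            raise ValueError(f"unknown option: {option!r}")
        if user_id in self.voters:
            return False  # duplicate vote, ignored
        self.voters.add(user_id)
        self.counts[option] += 1
        return True

    def close(self):
        """Close the poll and forget who took part."""
        self.is_open = False
        self.voters.clear()


poll = Poll(["yes", "no"])
poll.vote("alice", "yes")
poll.vote("alice", "no")  # rejected: alice already voted
poll.vote("bob", "no")
poll.close()
print(poll.counts)  # {'yes': 1, 'no': 1}
```

The tradeoff of this design is that nobody can change or undo a vote afterwards, since the server no longer knows which counter to decrement.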

I agree with you that if you actually publish something on the site, there is a certain amount of disclosure unavoidably associated with that.

[–] aski3252 1 points 2 years ago (1 children)

First of all, just to be clear, I'm not at all an expert on this topic for those who haven't noticed. My questions are mostly because I want to learn how it works, not because I want to tell you that you are wrong or anything like that. You seem to know a lot more than me anyway.

Tracking of reads = when you read someone’s post, there is a permanent log record made

When you read someone's post, you first need to access that information from the server. In order to do that, your client tells the server which post you want to see and the server sends you that post. Those interactions are most likely logged on the server, along with which IP address requested the information, etc. There is no absolutely sure way to make sure that the admin does not use those logs to extract that information. At the end of the day, it comes down to whether you trust the admin.

But there is also a "show read posts" option which seems to hide read posts overall, which does indeed suggest that read posts are saved and used, and it seems to work independently of the client.

It’s not mostly an issue of selling data for marketing purposes. The data could also be extracted by cyber attackers, seized by law enforcement, subpoenaed in a lawsuit, or whatever.

Sure, I do get the issue to some extent, but I don't see how it is quite as bad as you seem to imply. For example, I worry more about personal data, such as my e-mail address, being leaked, which is why I generally use a throwaway email. I don't really see why I, or some attacker, should care about which posts I have "read", but maybe I don't understand the full implications of someone getting this information.

“Marketing purposes” is a smoke screen anyway.

Of course it is, but I don't think there are any lemmy instances that use lemmy data for marketing purposes. Data seems to be used only to improve the user experience, at least that's how it's intended.

It is not necessary to record voting data except to prevent you from voting twice on a particular topic.

If it wasn't logged or only logged client side you could upvote/downvote infinitely, no?

There is also no need to remember HOW anyone voted. It’s enough to remember that you voted on a particular topic, and increment the relevant vote counter. That is also how real-world elections work. See also the topic of “receipt-free voting” in cryptography.

That does seem to be a good point.

[–] solrize 1 points 2 years ago* (last edited 2 years ago) (1 children)

Yes, I understand how web servers work (I have implemented them) ;-). I've also been involved in abuse investigations that involved crunching 100s of GB of raw logs. If I wanted to figure out what posts you had read based on raw http logs, it would be a big pain in the neck involving matching your user ID with IP addresses, and trying to match HTTP queries with posts, where the relevant log entries were scattered through billions of similar entries from other people. Last time I did something like that, the analysis took about 15 hours on a fairly big server, though that particular task also had to find groups of queries corresponding to login sessions. By contrast, if there's a database table that identifies every post that has been read by every user, all I have to do is type some SQL and the info comes up immediately.
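To show what "type some SQL" means in practice, here is a minimal sketch using an in-memory SQLite database. The `post_read` table, its columns, and the sample rows are all invented for illustration. I'm not claiming this is Lemmy's real schema.

```python
import sqlite3

# Hypothetical schema: one row per (user, post) read event.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post_read (user_name TEXT, post_id INTEGER, read_at TEXT)"
)
conn.executemany(
    "INSERT INTO post_read VALUES (?, ?, ?)",
    [
        ("alice", 101, "2023-06-25T10:00:00"),
        ("bob", 101, "2023-06-25T10:05:00"),
        ("alice", 202, "2023-06-25T11:00:00"),
    ],
)

# One query gives a user's complete reading history, in order.
rows = conn.execute(
    "SELECT post_id, read_at FROM post_read "
    "WHERE user_name = ? ORDER BY read_at",
    ("alice",),
).fetchall()

print(rows)  # [(101, '2023-06-25T10:00:00'), (202, '2023-06-25T11:00:00')]
```

No IP matching or session reconstruction is needed, which is exactly why a table like this is so much more sensitive than raw access logs.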

As for the invasiveness of that info, don't you have any private life at all? Are you pro-XYZ about some political question while your boss is rabidly anti-XYZ? You probably don't want him to know what you're reading. Same if you're getting sued by someone trying to dig up dirt on you, or if you are running for some kind of office (look at all the NSFW content aski3252 reads on Lemmy! Sinner!!!, etc). Or say you are in a country where some dictator gets into power and decides to round up all the Star Trek fans. You suspected something like that was coming, so you carefully avoided posting in the Star Trek communities, but unfortunately you were reading them and now you have been found out. Just use your imagination ;).

Re voting, let's say there is a poll "Is Spez an idiot? Vote yes/no, poll closes on July 1", and you vote in it. To stop you from voting twice, the server must remember until July 1 that you voted, but not how you voted. After July 1, it is impossible to vote again, so the info that you voted at all can be deleted. What currently happens instead seems to be that "aski3252 voted yes" is retained forever. There are some minor UI benefits to that, so I described it as iffy rather than outright evil. If it were up to me though, I would minimize the amount of info kept.

[–] faltuuser 1 points 2 years ago

Very Informative 🧵