I made a robot moderator. It models trust flow through a network that's made of voting patterns, and detects people and posts/comments that are accumulating a large amount of "negative trust," so to speak.

In its current form, it is supposed to run autonomously. In practice, I have to step in and fix some of its boo-boos when it makes them, which happens sometimes but not very often.

I think it's working well enough at this point that I'd like to experiment with a mode where it acts as an assistant to an existing moderation team, instead of taking its own actions. I'm thinking about making it auto-report suspect comments, instead of autonomously deleting them. There are other modes that might be useful, but that seems like a good place to start. Is anyone interested in trying the experiment in one of your communities? I'm pretty confident that at this point it can ease moderation load without causing many problems.
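To make the difference between the two modes concrete, here is a minimal sketch. It is an illustration under assumptions, not the bot's real code: the names `Mode`, `Action`, `decide`, and the `threshold` value are all hypothetical, and a real bot would hand the resulting action to the platform's API.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    AUTONOMOUS = "autonomous"  # bot removes content itself
    ASSIST = "assist"          # bot only files a report for human moderators


@dataclass
class Action:
    kind: str    # "none", "report", or "remove"
    target: str  # id of the comment the action applies to
    reason: str  # human-readable explanation for the mod team


def decide(comment_id: str, trust_score: float, mode: Mode,
           threshold: float = -1.0) -> Action:
    """Map a (hypothetical) trust score to an action, depending on the mode."""
    if trust_score > threshold:
        return Action("none", comment_id, "")
    reason = f"trust score {trust_score:.2f} at or below threshold {threshold:.2f}"
    kind = "remove" if mode is Mode.AUTONOMOUS else "report"
    return Action(kind, comment_id, reason)


if __name__ == "__main__":
    # Same comment, same score: only the mode changes what happens.
    print(decide("comment-123", -2.5, Mode.AUTONOMOUS))
    print(decide("comment-123", -2.5, Mode.ASSIST))
```

In assist mode the final call stays with the human moderators, so a wrong score costs them a dismissed report rather than costing a user a deleted comment.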

[–] [email protected] 2 points 3 weeks ago* (last edited 3 weeks ago) (1 children)

(edit: I accidentally a word and didn't realize you wrote 'auto-report instead of deleting them'. Read the following with a grain of salt)

I've played (briefly) with automated moderation bots on forums, and the main thing stopping me from going much past known-bad profiles (e.g. visited the site from a literal spamlist) is not just false positives but malicious abuse. I wanted to add a feature which would censor an image immediately with a warning if it was reported for (say) porn, shock imagery or other extreme content, but if a user noticed this, they could falsely report content to censor it until a staff member dismisses the report.

Could an external brigade of trolls get legitimate users banned or their posts hidden just by gaming your bot? That's a serious issue which could make real users have their work deleted, and in my experience, users can take that very personally.

[–] [email protected] 1 points 3 weeks ago (1 children)

It's possible. I think it's more difficult than people think. You have to do it on a scale which is blatantly obvious to anyone who's looking, so you're just inviting a ban.

One person swore to me that it would be really easy, so I invited them to try, and they made a gang of bots which farmed karma and then mass-downvoted me, trying to get me banned from my own place. If you look at my profile you'll see some things which have -300 score because of it. I welcomed the effort, since I'm interested in how well it will resist that kind of attack. Their first effort did exactly nothing, because none of the downvote bots had any rank within the algorithm. I gave them some pointers on how they could improve for a second attempt, and they went radio silent and I haven't heard from them since.
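For anyone wondering why votes from unranked accounts can do "exactly nothing," here is a minimal sketch of the general idea. It is not the bot's actual algorithm: it is a simplified, PageRank-style propagation in which each vote is weighted by the voter's own trust, and the function name, the `seeds` set, the `damping` value, and the user names are all assumptions made up for the illustration.

```python
from collections import defaultdict


def propagate_trust(votes, seeds, rounds=20, damping=0.85):
    """Toy trust propagation over a voting graph.

    votes: list of (voter, author, value) tuples, value is +1 or -1.
    seeds: users trusted a priori (hypothetical; e.g. known-good regulars).
    Returns a dict mapping each user to a trust score.
    """
    users = {u for voter, author, _ in votes for u in (voter, author)} | set(seeds)
    base = {u: (1.0 if u in seeds else 0.0) for u in users}

    # Each voter's influence is split evenly across all the votes they cast.
    out_count = defaultdict(int)
    for voter, _, _ in votes:
        out_count[voter] += 1

    trust = dict(base)
    for _ in range(rounds):
        incoming = defaultdict(float)
        for voter, author, value in votes:
            # Voters with zero or negative trust carry no weight at all.
            weight = max(trust[voter], 0.0)
            incoming[author] += value * weight / out_count[voter]
        trust = {u: (1 - damping) * base[u] + damping * incoming[u] for u in users}
    return trust


if __name__ == "__main__":
    organic = [
        ("alice", "bob", +1),
        ("bob", "carol", +1),
        ("bob", "spammer", -1),
    ]
    bots = [f"bot{i}" for i in range(50)]
    # Bots upvote each other to farm karma, then all downvote alice.
    attack = [(b, bots[(i + 1) % len(bots)], +1) for i, b in enumerate(bots)]
    attack += [(b, "alice", -1) for b in bots]

    before = propagate_trust(organic, seeds={"alice"})
    after = propagate_trust(organic + attack, seeds={"alice"})
    print("alice before/after:", before["alice"], after["alice"])
    print("bot0 trust:", after["bot0"])
    print("spammer trust:", after["spammer"])
```

In this toy model the bots only ever receive votes from each other, so none of them picks up any trust and their 50 downvotes have zero weight. An attacker would first need upvotes from accounts that already have rank, which is the part that is hard to do quietly.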

[–] [email protected] 1 points 2 weeks ago

Haha they thought it was too easy and were proven wrong!

Honestly, if a place is obscure enough, even smaller barriers to entry help, like forums that don't let you post on important boards until you build a reputation. There's only so much effort an adversary is willing to put in, and if there isn't a financial incentive or a huge political incentive, that bar can be pretty low.