Since, everything done on behalf of your instance is logged, detecting if you have a large number of bots, or invalid users isn't that challenging.
These queries can be executed via docker exec -it
, via remoting into the container, via pg query tools, or via pgadmin.
For listing all comments performed by users on your instance (This includes comments made remotely):
SELECT
p.actor_id
, p.name
, c.content as comment
FROM public.comment c
JOIN public.person p on p.id = c.creator_id
WHERE
p.local = 'true'
AND p.admin = 'false' -- Exclude Admins
;
For listing all posts created, by users, from your instance-
SELECT
p.actor_id
, c.name AS title
, c.body as body
FROM public.post c
JOIN public.person p on p.id = c.creator_id
WHERE
p.local = 'true'
AND p.admin = 'false' -- Exclude Admins
;
Lastly, here is a query to identify users who consistently upvotes or downvotes the same user over and over.
SELECT
p.id
, p.name
, p.actor_id
, cr.name as creator
, count(1)
FROM public.comment_like l
JOIN public.comment c on c.id = l.comment_id
JOIN public.person p on p.id = l.person_id
JOIN public.person cr on cr.id = c.creator_id
WHERE
p.id != cr.id
AND p.local = 'true'
AND p.admin = 'false' -- Exclude Admins
GROUP BY p.id, p.name, p.actor_id, cr.name
ORDER BY count(1) desc
;
If- anyone has idea of other queries which can be created for detecting suspicious activity, please LMK.
Edit- added where clause to exclude admins. If your admins are spambots, you have bigger issues to worry about.
Yes, person table is top level. Delete from person table and it cascades down and deletes from other tables. User count also automatically updates. Just be careful because person table also contains federated users. There is a "local" column to determine if they are local users or not.
I had about 6k bot accounts, but they were all unverified, so I just deleted all local unverified accounts from the person table.
Just don't go messing with the database without backups. My host supports snapshots so I did a quick snapshot before messing with anything.
Ty. I have full disk, and literally one real user with one comment. Re-subbing would be the only annoying thing lol
Remember- any federated content is also stored on your database- Part of how this platform works.
If- you don't have much disk space- you might consider joining a larger instance.
(Also, you CAN clean up the activity table daily too)
Edit- I do have a kubernetes CRD which handles automatically purging the activity table, for data older than a couple days.
Apologies for the confusion, I meant I have full disk access.