#2 seems to require #3 by definition: the model can't know what spam is without also knowing what ham is. In general a DSPAM-style model seems like the right one: all posts are used to train ham, individual posts marked as spam are removed from the ham set and added to the spam set, and a separate spam feed can be monitored for false positives.
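To make the ham-by-default idea concrete, here is a rough sketch of how that training loop could look. It is not DSPAM's actual code or API, and it is not tied to any Mastodon interface. The class name `HamByDefaultFilter`, its methods, the tokenizer, and the smoothed naive Bayes score (which ignores class priors) are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (deliberately simplistic)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class HamByDefaultFilter:
    """Hypothetical filter: every post trains ham until it is marked spam."""

    def __init__(self):
        self.ham_tokens = Counter()
        self.spam_tokens = Counter()
        self.posts = {}  # post_id -> (label, tokens)

    def observe(self, post_id, text):
        """Train a newly seen post as ham."""
        tokens = tokenize(text)
        self.posts[post_id] = ("ham", tokens)
        self.ham_tokens.update(tokens)

    def _move(self, post_id, new_label, source, target):
        """Move one post's token counts from one set to the other."""
        label, tokens = self.posts[post_id]
        if label == new_label:
            return source, target  # already in the requested set
        source.subtract(tokens)
        source = +source  # unary plus drops zero and negative counts
        target.update(tokens)
        self.posts[post_id] = (new_label, tokens)
        return source, target

    def mark_spam(self, post_id):
        """Remove a post from the ham set and add it to the spam set."""
        self.ham_tokens, self.spam_tokens = self._move(
            post_id, "spam", self.ham_tokens, self.spam_tokens
        )

    def mark_ham(self, post_id):
        """Undo a false positive found while reviewing the spam feed."""
        self.spam_tokens, self.ham_tokens = self._move(
            post_id, "ham", self.spam_tokens, self.ham_tokens
        )

    def spam_feed(self):
        """Post ids currently labelled spam, for someone to review."""
        return sorted(pid for pid, (label, _) in self.posts.items() if label == "spam")

    def spam_score(self, text):
        """Sum of per-token log-odds; positive leans spam, negative leans ham."""
        ham_total = sum(self.ham_tokens.values())
        spam_total = sum(self.spam_tokens.values())
        if ham_total == 0 or spam_total == 0:
            return 0.0  # no verdict without examples of both classes
        vocab = len(set(self.ham_tokens) | set(self.spam_tokens))
        score = 0.0
        for token in tokenize(text):
            # Add-one smoothing so unseen tokens do not zero out a class.
            p_spam = (self.spam_tokens[token] + 1) / (spam_total + vocab)
            p_ham = (self.ham_tokens[token] + 1) / (ham_total + vocab)
            score += math.log(p_spam / p_ham)
        return score
```

Something like this would exercise it:

```python
f = HamByDefaultFilter()
f.observe(1, "lunch photos from the park today")
f.observe(2, "buy cheap pills now cheap cheap")
f.mark_spam(2)      # moves post 2 from the ham set to the spam set
f.spam_feed()       # lists post 2 for review
f.mark_ham(2)       # restores it if it turns out to be a false positive
```

Returning a neutral score until both sets are non-empty is one way to reflect the point that the model needs ham as well as spam before it can say anything useful.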
All of these approaches sound fine to me. I hope that Mastodon can develop a built-in spam suppression system, but for now we have to rely on these bespoke approaches.