One of the most deeply entrenched norms of the social media era is that content moderation is a black box: users get no visibility into its decision-making process and no feedback until a post is taken down or an account is suspended. Social platforms argue they cannot provide feedback because it would help bad actors learn how close to the red line they can get. Yet this raises the question of whether moderation should remain today’s strictly punitive process or whether a better model would be a more educational one, in which users are given feedback on each questionable post, allowing them to better understand where the limits are and to conform their speech and behavior to a platform’s rules before it is too late. Could social platforms start publishing public compliance scores for posts?
Today’s content moderation operates as a strictly punitive process, in which users who violate the rules see those posts deleted or their accounts suspended. In many ways it is the digital equivalent of the physical world’s justice system, with content moderators filling in for the courts and algorithms acting as the police that enforce those orders.
Yet the physical world is not strictly punitive. It features a lengthy sequence of educational processes, stretching at least a dozen years in the United States, in which future members of society are taught its norms and rules and are given continual feedback to help them comport with its expectations.
This educational system is entirely missing in today’s digital world.
Instead, we have a far more Orwellian world – one in which the rules are in constant flux, are crafted and enforced in secret, and in which users have no idea they are even approaching a violation until they are plucked from digital existence.
Social platforms argue they must enforce their rules in secret, as otherwise bad actors would learn how close to the line they can get without stepping over it. Yet this fails to acknowledge that these lines are constantly changing, with platforms moving their thresholds almost in real time as they adapt to shifting societal and political pressures. A perfectly allowable post this week might become prohibited content next week, while a post removed this week as a rules violation might be deemed a model example of a “good” post next week.
After all, the US court system operates entirely in the public eye, with trials and judicial sentences publicly accessible in order to ensure transparency. Few seriously argue that having a published legal code and open trials encourages criminals by showing them what they can get away with. So why do we tolerate such secrecy from our social media companies?
Given Silicon Valley’s obsession with gamification, one could actually argue that gamifying proper online conduct might have a very real impact.
Instead, what if every social media post that triggers a human or machine review is assigned a “compliance score” that assesses how well it comports with that platform’s acceptable speech guidelines?
Any time an algorithmic or human moderator flags a post as borderline, it would receive a score indicating how close it came to being rejected and which area of the platform’s policy it encroaches upon. This process could be largely automated: a categorical model could determine the score, with the justification for human moderator decisions listed beside it. This information is already being captured, so why shouldn’t it be displayed to users?
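The mechanics of such a score are simple enough to sketch. The snippet below is a minimal, hypothetical illustration of a scoring pipeline; the policy categories, keyword heuristics, and thresholds are invented for the example and stand in for whatever classifier a platform actually runs:

```python
# Hypothetical sketch of how a flagged post might receive a compliance
# score. The policy categories, keyword heuristics, and thresholds below
# are illustrative stand-ins for a platform's real moderation model.
from dataclasses import dataclass
from typing import Optional

POLICY_KEYWORDS = {
    "harassment": ["idiot", "loser"],
    "brand_usage": ["logo", "trademark"],
}

REMOVAL_THRESHOLD = 0.8  # at or above: the post would be removed
WARNING_THRESHOLD = 0.5  # at or above: the user sees a warning score

@dataclass
class ComplianceReport:
    score: float             # 0.0 = fully compliant, 1.0 = certain violation
    category: Optional[str]  # policy area the post encroaches upon, if any
    action: str              # "allow", "warn", or "remove"

def score_post(text: str) -> ComplianceReport:
    """Toy scorer: fraction of a policy area's keywords found in the post."""
    lowered = text.lower()
    best_category, best_score = None, 0.0
    for category, keywords in POLICY_KEYWORDS.items():
        score = sum(kw in lowered for kw in keywords) / len(keywords)
        if score > best_score:
            best_category, best_score = category, score
    if best_score >= REMOVAL_THRESHOLD:
        action = "remove"
    elif best_score >= WARNING_THRESHOLD:
        action = "warn"
    else:
        action = "allow"
    return ComplianceReport(best_score, best_category, action)

# A post protesting the platform that happens to include its logo would
# be warned about brand usage rather than silently queued for removal.
print(score_post("Break up this platform! (includes their logo)"))
```

The point of the sketch is the shape of the output, not the toy scoring logic: a numeric score, the policy area implicated, and the resulting action are exactly the pieces of information platforms already generate internally and could surface to users.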
Adding such a score to reviewed posts would allow users to understand which of their posts generated concerns and how close they came to being rejected, helping them adjust their speech or raise awareness of speech guidelines that need reforming. For example, a post calling for breaking up a platform over its privacy policies might be flagged as just shy of a logo-usage policy violation for including the platform’s logo, helping the user avoid using the logo in the future or allowing them to band together with others who have received similar warnings to launch a formal protest demanding a rules change.
The end result for platforms would be a more informed public that has far greater transparency into the editorial decisions that affect them. For many users this feedback may help them be better communicators, while for others it could help push them away from destructive behaviors, much as our societal educational and intervention initiatives attempt to prevent people from entering the justice system. While there will always be bad actors, such feedback will also help others understand why a particular post was not removed.
Putting this all together, today’s content moderation is largely a black box built upon a secret and ever-changing list of rules, with companies arguing that any transparency risks helping bad actors. This argument rings hollow, given that a founding tenet of democracies is the notion of an open and transparent court system. Creating a new “compliance score” that helps users understand when their posts are creeping close to the edge of acceptability would help restore the educational component of traditional society that is missing from today’s digital society.
In the end, perhaps giving users greater feedback would actually help reduce digital toxicity rather than encourage it.