Our Feedback Aide API includes AI-powered content moderation, which helps identify inappropriate content in essays and short responses written by learners.
Note: AI moderation is not a substitute for human review. Given the wide diversity of writing patterns, colloquialisms, regional nuances, and personalities among learners, it may flag instances incorrectly, both producing false alarms and missing critical content.
Types of content that will be flagged
AI-driven moderation helps ensure that sensitive, inappropriate, or crisis content is flagged and managed efficiently. This reduces manual workload, improves accuracy, and enhances the overall user experience.
The content flagged includes:
- Expressions or promotion of hate towards any target.
- Threats of violence or harm towards any target.
- Depictions of violent or sexual acts.
- Depictions, promotion, or encouragement of acts of self-harm.
- Disclosures that the learner is engaging or intends to engage in acts of self-harm, such as suicide, cutting, or eating disorders.
Enabling content moderation
AI content moderation is not enabled by default. To activate it, developers must explicitly request it by passing the moderation option when creating the feedback session with the feedbackApp.feedbackSession() method:
```javascript
const feedbackSession = await feedbackApp.feedbackSession(
  // security
  {
    ...
  },
  // feedback session options
  {
    state: 'grade',
    session_uuid: '36eebda5-b6fd-4e74-ad06-8e69dfb89e3e',
    stimulus: 'Write an essay about obesity and its impact on society',
    response: 'Obesity is ...',
    rubric: {
      ...
    },
    options: {
      moderation: {
        inappropriate_content: true,
        critical_safety_content: true
      }
    }
  }
);
```

The moderation toggle is split into two flags that can be enabled independently: critical_safety_content looks for high-risk content such as self-harm, while inappropriate_content covers more general moderation such as harassment and hate.
This ensures that moderation is only applied when needed, giving developers control over when to leverage AI moderation for their specific use case.
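For example, an integration that only needs to surface high-risk disclosures could enable the critical-safety flag on its own. This sketch shows just the options object from the call above with that configuration (the surrounding session parameters are unchanged):

```javascript
// Enable only high-risk (critical safety) detection; general
// inappropriate-content moderation stays off.
const options = {
  moderation: {
    inappropriate_content: false,
    critical_safety_content: true
  }
};
```

The reverse configuration (general moderation on, critical-safety off) works the same way; each flag is independent of the other.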
Moderation workflow example
When the essay is first graded by Feedback Aide, the user interface will display a warning message to the grader indicating it has detected content that may be of concern. The grader needs to acknowledge the message in order to continue.
Screenshot 1: Warning Message Shown to the Grader
When the grader has finished marking the essay and the feedback is ready for learner review, the grader will click ‘Submit to student.’ They will then be shown a second dialog window, asking for their acknowledgement before proceeding.
Screenshot 2: The Acknowledgement Window Shown to the Grader
Moderation Types
This is the full list of possible moderation return types, sorted into two main categories.
Critical safety content
- sexual_minors
- self_harm
- self_harm_intent
- self_harm_instructions
Inappropriate content
- hate
- hate_threatening
- harassment
- harassment_threatening
- violence
- violence_graphic
- sexual
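When handling moderation results in application code, it can be useful to map a list of flagged category names back to the two groups above, for instance to decide which acknowledgement dialog to show. The helper below is an illustrative sketch, not part of the Feedback Aide API; the function name and input format are assumptions:

```javascript
// Category names in the "critical safety content" group, as listed above.
const CRITICAL_CATEGORIES = [
  'sexual_minors',
  'self_harm',
  'self_harm_intent',
  'self_harm_instructions'
];

// Given an array of flagged category names, return 'critical' if any
// critical-safety category is present, 'inappropriate' if only general
// categories were flagged, or 'none' if nothing was flagged.
function classifyFlags(flaggedCategories) {
  if (flaggedCategories.some(c => CRITICAL_CATEGORIES.includes(c))) {
    return 'critical';
  }
  return flaggedCategories.length > 0 ? 'inappropriate' : 'none';
}

console.log(classifyFlags(['self_harm_intent'])); // 'critical'
console.log(classifyFlags(['harassment']));       // 'inappropriate'
console.log(classifyFlags([]));                   // 'none'
```

Because a single response can trigger several categories at once, checking for the critical group first ensures the higher-severity workflow always takes precedence.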