Our Feedback Aide API includes AI-powered content moderation, which helps identify inappropriate content in essays and short responses written by learners.
Note: AI moderation is not a substitute for human review. Given the wide diversity of writing patterns, colloquialisms, regional nuances, and personalities among learners, it may flag instances incorrectly, both producing false alarms and missing critical content.
Types of content that will be flagged
AI-driven moderation helps ensure that sensitive, inappropriate, or crisis content is flagged and managed efficiently. This reduces manual workload, improves accuracy, and enhances the overall user experience.
The content flagged includes:
- Expressions or promotion of hate towards any target.
- Threats of violence or harm towards any target.
- Depictions of violent or sexual acts.
- Depictions, promotion, or encouragement of acts of self-harm.
- Disclosures that the learner is engaging or intends to engage in acts of self-harm, such as suicide, cutting, or eating disorders.
Enabling content moderation
AI content moderation is not enabled by default. To activate it, developers must explicitly request it by passing the moderation option when creating the feedback session with the feedbackApp.feedbackSession() method:
```javascript
const feedbackSession = await feedbackApp.feedbackSession(
  // security
  {
    ...
  },
  // feedback session options
  {
    state: 'grade',
    session_uuid: '36eebda5-b6fd-4e74-ad06-8e69dfb89e3e',
    stimulus: 'Write an essay about obesity and its impact on society',
    response: 'Obesity is ...',
    rubric: {
      ...
    },
    options: {
      moderation: {
        inappropriate_content: true,
        critical_safety_content: true
      }
    }
  }
);
```

The moderation toggle is split into two flags that can be enabled independently: critical_safety_content looks for high-risk content such as self-harm, while inappropriate_content covers more general moderation such as harassment and hate.
This ensures that moderation is only applied when needed, giving developers control over when to leverage AI moderation for their specific use case.
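For example, an integration that only needs to surface high-risk disclosures could enable the critical-safety flag on its own. This sketch shows just the options object from the call above with that configuration (the surrounding session parameters are unchanged):

```javascript
// Enable only high-risk (critical safety) detection; general
// inappropriate-content moderation stays off.
const options = {
  moderation: {
    inappropriate_content: false,
    critical_safety_content: true
  }
};
```

The reverse configuration (general moderation on, critical-safety off) works the same way; each flag is independent of the other.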
Moderation workflow example
When the essay is first graded by Feedback Aide, the user interface will display a warning message to the grader indicating it has detected content that may be of concern. The grader needs to acknowledge the message in order to continue.
Screenshot 1: Warning Message Shown to the Grader
When the grader has finished marking the essay and the feedback is ready for learner review, the grader will click ‘Submit to student.’ They will then be shown a second dialog window, asking for their acknowledgement before proceeding.
Screenshot 2: The Acknowledgement Window Shown to the Grader
Moderation Types
This is the full list of possible moderation return types, sorted into two main categories.
Critical safety content
- sexual_minors
- self_harm
- self_harm_intent
- self_harm_instructions
Inappropriate content
- hate
- hate_threatening
- harassment
- harassment_threatening
- violence
- violence_graphic
- sexual
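When handling moderation results in application code, it can be useful to map a list of flagged category names back to the two groups above, for instance to decide which acknowledgement dialog to show. The helper below is an illustrative sketch, not part of the Feedback Aide API; the function name and input format are assumptions:

```javascript
// Category names in the "critical safety content" group, as listed above.
const CRITICAL_CATEGORIES = [
  'sexual_minors',
  'self_harm',
  'self_harm_intent',
  'self_harm_instructions'
];

// Given an array of flagged category names, return 'critical' if any
// critical-safety category is present, 'inappropriate' if only general
// categories were flagged, or 'none' if nothing was flagged.
function classifyFlags(flaggedCategories) {
  if (flaggedCategories.some(c => CRITICAL_CATEGORIES.includes(c))) {
    return 'critical';
  }
  return flaggedCategories.length > 0 ? 'inappropriate' : 'none';
}

console.log(classifyFlags(['self_harm_intent'])); // 'critical'
console.log(classifyFlags(['harassment']));       // 'inappropriate'
console.log(classifyFlags([]));                   // 'none'
```

Because a single response can trigger several categories at once, checking for the critical group first ensures the higher-severity workflow always takes precedence.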