Kids vs AI: building real-time content moderation

Eliška Houdková

Jul 3, 2026

A teacher and three primary-school children looking at a laptop screen together in a classroom.

Children are excellent at finding loopholes. Give them a shared screen, the power to put any word on it, and ten seconds without supervision, and they will immediately start testing the limits of the system.

We saw this play out firsthand while working on Skoala, a financial education platform developed together with Nadace České spořitelny. One of Skoala’s classroom features lets students join lesson activities through a QR code and send answers directly into a live presentation projected by the teacher.

The feature worked great. The children worked even better.

The challenge: Keeping the screen clean

In a typical classroom session using Skoala, teachers ask questions, students answer from their phones, and the responses appear in real time on the classroom screen. Most of the answers are thoughtful, funny, and smart. Others are not exactly presentation-friendly.

As Nadace České spořitelny started preparing for a nationwide roadshow with large live audiences, it became clear that the moderation system needed an upgrade. The existing GPT-based profanity filter was built during the MVP phase and caught roughly 69% of inappropriate words. Good enough for internal testing, but not nearly enough for live events with hundreds of people in the audience.

We also had to rethink how we displayed these answers. We wanted to group similar responses into a live word cloud, so that if 70 students shared their ideas, the teacher would see a clean summary of the 20 most common ones. In the MVP phase, we relied on another LLM prompt for this, but it struggled with consistency. Similar answers often ended up split apart, while completely unrelated ones occasionally landed in the same group.

We had three weeks to improve both systems before the roadshow started.

Illustration of a hand holding a phone on skoala.com, showing the question "What is the most common cashless payment method?" with three joke answers a student has entered: "Your mum", a poop emoji, and "Greed is good".

How do you make an AI profanity filter reliable enough?

Our goal was to reach a point where teachers could trust the system completely. We realised that relying on a single large language model wasn't enough. Instead, we built a multi-layered defence.

The dictionary and the first line of defence

Every answer first passes through a dictionary filter. This is a massive database containing nearly 90,000 terms identified by our content team.

To catch anyone trying to bypass the filter, the system automatically creates several variations of each response. It collapses repeated characters, ignores punctuation, and decodes “leetspeak”, where children use numbers to replace letters. By checking multiple variations, we ensure that a word isn't simply hidden behind a slight modification. If any of these variations matches a term in our database, the answer is discarded immediately.
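The variation step described above could look roughly like the sketch below. It is not the production code: the leetspeak table, the two-term dictionary, and the function names `variants` and `is_blocked` are all assumptions made for illustration.

```python
import re

# Hypothetical leetspeak table; the real mapping is not described in the article.
LEET_MAP = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                          "5": "s", "7": "t", "@": "a", "$": "s"})

# Tiny placeholder standing in for the real dictionary of nearly 90,000 terms.
BLOCKED_TERMS = {"poop", "stupid"}


def variants(answer: str) -> set[str]:
    """Return normalised spellings of an answer to check against the dictionary."""
    base = answer.lower().strip()
    decoded = base.translate(LEET_MAP)  # "p00p" -> "poop"
    out = set()
    for text in (base, decoded):
        no_punct = re.sub(r"[^\w\s]|_", "", text)  # "s.t.u.p.i.d" -> "stupid"
        for candidate in (text, no_punct):
            out.add(candidate)
            # Runs of three or more characters shrink to two: "pooop" -> "poop"
            out.add(re.sub(r"(.)\1{2,}", r"\1\1", candidate))
            # Every run shrinks to a single character: "stuupid" -> "stupid"
            out.add(re.sub(r"(.)\1+", r"\1", candidate))
    return out


def is_blocked(answer: str) -> bool:
    """True if any variant, or any single word inside a variant, is in the dictionary."""
    for variant in variants(answer):
        if variant in BLOCKED_TERMS:
            return True
        if any(token in BLOCKED_TERMS for token in variant.split()):
            return True
    return False
```

Checking both the raw and the decoded text matters in this sketch: decoding digits can mangle a legitimate numeric answer, so the untouched spelling is always kept among the variants.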

Gemini as the context expert

Answers that pass the dictionary move on to Gemini Flash. We chose this model after testing several options with Promptfoo, a tool that let us run thousands of real student answers against different models. Gemini Flash reached a 92% success rate in identifying inappropriate content. It performed slightly worse than high-end models (such as Gemini Pro or the latest GPT), but it was roughly ten times more cost-effective at this scale.

The model works with a specific set of instructions to keep it as objective as possible. It understands that context matters: if a teacher asks what harms a person’s life, a word like “drugs” is a perfectly valid answer. If the question is about positive motivation, that same word would be flagged.
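One way to give a model that context is to send the teacher's question together with the answer. The sketch below only builds the prompt text and reads back a one-word verdict; the instruction wording and the names `build_moderation_prompt` and `parse_verdict` are hypothetical, and the actual model call is deliberately left out.

```python
# Hypothetical instruction text; the article does not publish the real prompt.
PROMPT_TEMPLATE = """You are moderating answers shown on a classroom screen.
Judge the answer only in the context of the teacher's question.
Reply with exactly one word: ALLOW or BLOCK.

Teacher's question: {question}
Student's answer: {answer}
"""


def build_moderation_prompt(question: str, answer: str) -> str:
    """Combine the question and the answer so the model can judge them together."""
    return PROMPT_TEMPLATE.format(question=question.strip(), answer=answer.strip())


def parse_verdict(model_reply: str) -> bool:
    """Return True only for an explicit ALLOW; any other reply counts as BLOCK."""
    return model_reply.strip().upper() == "ALLOW"
```

Treating every unexpected reply as a block is a design choice in this sketch: on a projected screen, a wrongly hidden answer costs less than a wrongly shown one.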

Organising the thoughts with K-means clustering

Once we had clean answers, we needed to display them through clusters to avoid cluttering the screen with repetitive phrases.

Diagram showing three similar text answers — "card", "debit card", "card payment" — on the left, an arrow, and a clustered word cloud of payment methods on the right, with "Debit card" and "Bank transfer" largest among options like PayPal, Apple Pay, Google Pay and QR payment.

To achieve this with more precision than an LLM could provide, we implemented an algorithm called K-means clustering.

First, the system turns every answer into a long list of numbers (a vector), a process called semantic embedding. This lets the computer represent the meaning of an answer mathematically. In this numerical space, answers with similar meanings land close to each other. The K-means algorithm then finds the centre of each group, called a centroid, and one representative answer per group is displayed on the slide.
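The clustering step can be sketched in plain Python. Several things here are assumptions: the two-dimensional vectors are toy stand-ins for real embeddings, the farthest-point initialisation was picked only to keep the example deterministic, and choosing the answer nearest to each centroid as the representative is one plausible reading of the description above.

```python
import math


def mean(points):
    """Component-wise average of a list of vectors."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def init_centroids(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from all chosen."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iterations=50):
    """Lloyd's algorithm: assign points to the nearest centroid, then move the centroids."""
    k = min(k, len(points))
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iterations):
        new_labels = [min(range(k), key=lambda i: math.dist(p, centroids[i]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                centroids[i] = mean(members)
    return labels, centroids


def summarise(answers, embeddings, k, top_n=20):
    """Return (representative answer, cluster size) pairs, largest clusters first."""
    labels, centroids = kmeans(embeddings, k)
    clusters = []
    for i, centroid in enumerate(centroids):
        members = [j for j, label in enumerate(labels) if label == i]
        if not members:
            continue
        nearest = min(members, key=lambda j: math.dist(embeddings[j], centroid))
        clusters.append((answers[nearest], len(members)))
    clusters.sort(key=lambda cluster: cluster[1], reverse=True)
    return clusters[:top_n]


# Toy data: made-up 2-D "embeddings" in place of real semantic vectors.
ANSWERS = ["card", "debit card", "card payment", "by card",
           "bank transfer", "transfer", "wire transfer", "cash"]
EMBEDDINGS = [(1.0, 0.0), (1.1, 0.1), (0.9, 0.1), (1.0, -0.1),
              (0.0, 1.0), (0.1, 0.9), (-0.1, 1.2), (-1.0, -1.0)]
```

With these toy vectors, `summarise(ANSWERS, EMBEDDINGS, k=3)` groups the four card answers, the three transfer answers, and "cash" separately. The cluster size is returned alongside each representative so a word cloud could scale the text by it.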

What happens when the internet drops mid-lesson?

A classroom doesn't wait for a slow API or a lost internet connection. We built in several safety nets. If Gemini is unavailable, the system automatically switches to GPT. If even that fails, we fall back to the dictionary filter alone to ensure the presentation never stops. We also added rate limiting to protect the system from being overwhelmed.
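A fallback chain of this kind might be wired up as in the sketch below. The model checks are passed in as plain functions, so no real API is called; the function and class names are hypothetical, and the token bucket is just one common way to rate-limit, since the article does not say which approach is used.

```python
import time
from typing import Callable, Sequence

# A model check takes (question, answer) and returns True if the answer may be shown.
ModelCheck = Callable[[str, str], bool]


def moderate_with_fallbacks(question: str, answer: str,
                            dictionary_allows: Callable[[str], bool],
                            model_checks: Sequence[ModelCheck]) -> bool:
    """Dictionary first, then each model in order; dictionary alone if all models fail."""
    if not dictionary_allows(answer):
        return False
    for check in model_checks:
        try:
            return check(question, answer)
        except Exception:
            continue  # this model is unavailable, try the next one
    return True  # no model reachable: rely on the dictionary verdict alone


class TokenBucket:
    """Simple rate limiter: holds up to `capacity` tokens, refilled at a steady rate."""

    def __init__(self, capacity: int, refill_per_second: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Stand-in checks for demonstration; real ones would call the model APIs.
def unavailable(question: str, answer: str) -> bool:
    raise TimeoutError("model unavailable")


def backup_model(question: str, answer: str) -> bool:
    return answer != "rude"
```

For example, `moderate_with_fallbacks("q", "rude", lambda a: True, [unavailable, backup_model])` skips the failing first check and returns the backup model's verdict. When every model check raises, the answer is shown as long as the dictionary allowed it, which mirrors the dictionary-only fallback described above.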

The result of those three weeks of intense work is a production-grade solution that Nadace České spořitelny has showcased on a roadshow in eight cities and that is used every day in classrooms across the country. Teachers in hundreds of classrooms can now focus on teaching, while the technology handles the creative attempts of young testers in the background.

If you're shipping AI features into a high-stakes environment and want a partner who builds the safety nets as well as the model calls, talk to us, or read how we built Skoala end-to-end.

Quick answers

How accurate can an AI profanity filter be?

Our layered approach reaches over 92% accuracy on inappropriate content. A single LLM prompt managed only about 69%; the gain comes from combining a deterministic dictionary, a context-aware model, and fallbacks.

Why pair a dictionary filter with an LLM?

The dictionary catches known profanities almost instantly, with negligible latency and no per-request model cost, including disguised variants like leetspeak. The LLM handles the harder, context-dependent cases the dictionary can't judge.

Which AI model did you use for content moderation?

We benchmarked several models in Promptfoo and chose Gemini Flash: it delivered the accuracy we needed while being roughly ten times more cost-effective than higher-end models at this scale.

How do you group open-text answers in real time?

We use K-means clustering on semantic embeddings, which collapses up to 180 answers into the 20 most common themes and surfaces one representative word per cluster.

How do you keep an AI moderation system running during live events?

Through fallbacks and rate limiting: if the primary model is unavailable, the system falls back to a second model, then to the dictionary alone, so the presentation never stops.