Intelligent Grading and Feedback Systems
Our Research Focus

AI Precision

Solving the unreliability and bias of generative AI for assessment applications.

AI Safety

Contributing to the field of AI safety and discussions around ethics and regulation.

Teacher Workload

Determining teachers' biggest pain points and measuring the impact of interventions.

Student Feedback

Applying foundational pedagogical literature with new regularity and consistency.

Metrics We Use

Small-Scale Learning

Conventional AI systems require hundreds to thousands of datapoints to achieve reliability. However, such data volumes are rarely available at the class or school level.

To be viable in real educational settings, AI grading systems must aim to adapt from as few as 5 samples and become reliable after 50.

Coming Soon
Teaching AI Like We Teach Humans

Teachers learn to grade by discussing exemplars with colleagues, comparing submissions to only a handful of anchors, and updating their understanding of the rubric as they go. LLMs can do the same.

Human Versus Machine

All machine learning systems require human data as the 'ground truth' for training and evaluation. But what happens when that ground truth is flawed, or there is no reliable ground truth?

Exploring the inter-rater reliability of both humans and machines raises fundamental questions about what 'accuracy' truly means in grading.

Coming Soon
Taking Each At Their Best

Under ideal conditions, expert human raters can reach a quadratic weighted kappa (QWK) of 0.95, yet on other datasets, modern systems now exceed the inter-rater reliability of humans.
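
For readers unfamiliar with the metric, QWK compares two sets of scores and penalises each disagreement by the square of its size, relative to the disagreement expected by chance. The sketch below is a minimal illustration using only the standard library. The function name, the sample scores, and the choice to return 1.0 when chance disagreement is zero are assumptions made for this example, not a description of any particular system.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Quadratic weighted kappa between two equal-length lists of integer scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    n_categories = max_score - min_score + 1
    if n_categories < 2:
        raise ValueError("the score scale needs at least two categories")

    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts of (a, b) score pairs
    hist_a = Counter(rater_a)                  # marginal counts for rater A
    hist_b = Counter(rater_b)                  # marginal counts for rater B

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            # Quadratic penalty: 0 for agreement, 1 for the largest possible gap.
            weight = (i - j) ** 2 / (n_categories - 1) ** 2
            weighted_observed += weight * observed[(i, j)]
            # Count expected by chance if the two raters were independent.
            weighted_expected += weight * hist_a[i] * hist_b[j] / n

    if weighted_expected == 0:
        # Both raters gave one identical score to everything; kappa is undefined.
        # Returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores on a 1-4 scale, for illustration only.
teacher = [1, 2, 3, 4, 4]
model = [1, 2, 3, 4, 3]
print(quadratic_weighted_kappa(teacher, model, 1, 4))
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement.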

November 11, 2025
The Reliability of Human Judgement

When two trained raters disagree slightly on 35% of essays, which dataset should AI learn from? When the goal is to match a single teacher's grades, how can we tell if the teacher was consistent?

Explore a Partnership

We partner with K-12 educational institutions across the globe. If you think your institution would be a good fit, please submit an expression of interest.