Our Research

State of the Art Small-Scale Learning

QWK (quadratic weighted kappa) measures agreement between two graders, accounting for the magnitude of disagreements: being off by one point is penalized far less than being off by three. QWK also distinguishes meaningful agreement from coincidental agreement. If 70% of essays were graded 4/6, a model that always predicts 4/6 would match the human grade 70% of the time, yet its QWK would be near zero.
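As a minimal illustration (not Edexia's implementation), QWK can be computed from an observed agreement matrix and an expected matrix built from each grader's marginal distribution, with a quadratic penalty on disagreement distance:

```python
import numpy as np

def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Quadratic weighted kappa between two integer rating vectors."""
    rater_a = np.asarray(rater_a)
    rater_b = np.asarray(rater_b)
    n = max_rating - min_rating + 1
    # Observed agreement matrix: counts of (grade_a, grade_b) pairs
    O = np.zeros((n, n))
    for a, b in zip(rater_a, rater_b):
        O[a - min_rating, b - min_rating] += 1
    # Expected matrix under independence of the two raters' marginals
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / len(rater_a)
    # Quadratic penalty: cost grows with the squared grade distance
    idx = np.arange(n)
    W = (idx[:, None] - idx[None, :]) ** 2
    return 1.0 - (W * O).sum() / (W * E).sum()

# A constant predictor on a skewed distribution scores ~0,
# even though it "agrees" with the human grade 70% of the time.
human = [4] * 70 + [3] * 15 + [5] * 15
model = [4] * 100
print(quadratic_weighted_kappa(human, model, 3, 5))  # ~0.0
```

Because the expected matrix is built from the marginals, a degenerate predictor gets no credit for matching the majority grade, which is exactly the coincidental-agreement case described above.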

The chart compares performance on the AES 2.0 Kaggle Competition. The winning solution achieved 0.84 QWK after training on 1,700+ essays. Edexia achieved 0.81 QWK while training on only 20 essays.

85× Less Training Data

Traditional AI grading systems require thousands of pre-graded essays to train. Edexia's approach achieves comparable accuracy with just 20 examples, making AI grading practical for individual teachers and small schools.

[Chart: QWK on the AES 2.0 benchmark — Standard ML (1,700 essays): 0.84 vs. Edexia (20 essays): 0.81]

Our Background

Our team brings research experience from leading institutions in education, machine learning, and assessment science.

Harvard University
University of Cambridge
University of Queensland
University of Technology Sydney
International Olympiad in Informatics

Small-Scale Learning

Conventional AI systems require hundreds to thousands of data points to achieve reliability. But that volume of data rarely exists at the class or school level.

To work in real educational settings, AI grading systems must adapt from as few as 5 samples and become reliable after 50.

November 2, 2025

Comparison is Key

Saying ‘Essay A is better than Essay B’ is easier than assigning exact grades, for both humans and machines. This unlocks higher reliability when training data is scarce.

Read article
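One standard way to turn pairwise judgments into scores is the Bradley–Terry model. The sketch below (an illustrative assumption, not Edexia's method) fits a strength for each essay from "A beat B" comparisons using the classic minorize–maximize update:

```python
import numpy as np

def bradley_terry(n_items, comparisons, n_iter=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairs
    using the standard MM (iterative scaling) update."""
    # wins[i, j] = number of times item i beat item j
    wins = np.zeros((n_items, n_items))
    for winner, loser in comparisons:
        wins[winner, loser] += 1

    p = np.ones(n_items)  # strength parameters, initialized uniformly
    for _ in range(n_iter):
        for i in range(n_items):
            total_wins = wins[i].sum()
            # Denominator: comparisons involving i, weighted by current strengths
            denom = sum((wins[i, j] + wins[j, i]) / (p[i] + p[j])
                        for j in range(n_items) if j != i)
            if denom > 0:
                p[i] = total_wins / denom
        p /= p.sum()  # fix the scale; only ratios are identified
    return p

# Three essays: essay 2 consistently beats 1, and 1 beats 0.
comparisons = [(2, 1), (2, 0), (1, 0)] * 3
strengths = bradley_terry(3, comparisons)
```

The fitted strengths recover the ranking (essay 2 strongest, essay 0 weakest), which can then be mapped onto a grade scale using a few anchor essays with known grades.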
February 27, 2026

Teaching AI Like We Teach Humans

Teachers learn to grade from a handful of exemplars, discussion, and ongoing calibration. LLMs can do the same.

Read article