EURO 2024 Copenhagen
Abstract Submission

1137. Guiding labeling efforts in question difficulty estimation using active learning

Invited abstract in session WB-31: Learning Analytics and other Text Analytics tasks, stream Analytics.

Wednesday, 10:30-12:00
Room: 046 (building: 208)

Authors (first author is the speaker)

1. Arthur Thuy
Data Analytics, Ghent University
2. Dries Benoit
Ghent University

Abstract

Estimating the difficulty of exam questions is crucial for effectively evaluating students’ knowledge and enabling personalized exercise recommendations. Obtaining labels for the training dataset typically requires time-consuming and expensive pretesting and manual calibration. Although recently fine-tuned Transformer-based models have outperformed traditional machine learning approaches, the labeling expenses remain considerable. Our study addresses this labeling challenge by leveraging active learning to drive the annotation process, directing human expert attention toward the most informative data points. Because standard regression neural networks do not quantify predictive uncertainty, we employ Monte Carlo Dropout to capture model uncertainty in predictions on the unlabeled set. Model uncertainty tends to be high on data points in underrepresented areas of the input space, precisely the observations we aim to label. Fine-tuning a DistilBERT model with Monte Carlo Dropout on a dataset comprising science and math multiple-choice questions yields promising results.
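The acquisition loop described above can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: the paper fine-tunes DistilBERT, while here a tiny NumPy regression network with inverted dropout stands in, and all names, layer sizes, and the pool data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the difficulty model: one hidden layer, random weights.
W1 = rng.normal(size=(1, 32))
W2 = rng.normal(size=(32, 1))

def stochastic_forward(x, p_drop=0.5):
    """One forward pass with dropout kept ACTIVE (the core of MC Dropout)."""
    h = np.maximum(x @ W1, 0.0)              # ReLU hidden layer
    mask = rng.random(h.shape) > p_drop      # dropout is not disabled at inference
    h = h * mask / (1.0 - p_drop)            # inverted-dropout scaling
    return h @ W2

def mc_dropout_predict(x, T=100):
    """T stochastic passes -> predictive mean and uncertainty per input."""
    samples = np.stack([stochastic_forward(x) for _ in range(T)])
    return samples.mean(axis=0), samples.std(axis=0)

# Active-learning query step: rank the unlabeled pool by uncertainty
# and send the top-k questions to human annotators for difficulty labels.
pool = rng.uniform(-3, 3, size=(50, 1))      # 50 unlabeled "questions"
mean, std = mc_dropout_predict(pool)
query_idx = np.argsort(-std.ravel())[:5]     # 5 most uncertain points to label
```

High-variance predictions tend to come from underrepresented regions of the input space, so labeling those points first gives the model the most information per annotation.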

Status: accepted
